OFA-Sys / ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Apache License 2.0

How to segment random image files using a pretrained model? #15

Closed. hotohoto closed this issue 1 year ago

hotohoto commented 1 year ago

I'm trying to use the pretrained segmentation model just for inference, so I'm running a command like this:

cd one_peace_vision/seg
python test.py configs/ade20k/mask2former_onepeace_adapter_g_896_40k_ade20k_ms.py ../one_peace_vision.pt --show

(I'm not sure if this is the right script that I need to use, but at least I want to check I'm using the right checkpoint. If I need to write an inference script I might be able to write one following https://github.com/open-mmlab/mmsegmentation/blob/0.x/docs/en/get_started.md#verify-the-installation and also referring to one_peace_vision/seg/test.py.)

Anyway I need some help and here are the questions.

What is https://one-peace-shanghai.oss-accelerate.aliyuncs.com/one_peace_checkpoints/one_peace_vision.pt for? It looks like almost all the keys are missing when loading the checkpoint. So I tried to convert it and load it again, but ended up with "Invalid magic number; corrupt file?". Which one am I supposed to use?

In the meantime, I found the comment below in #14.

Note that the one_peace_vision.pth is different from one-peace-vision.pkl in model names.

But I don't understand what exactly that means. I'm not entirely clear on what's going on with all the checkpoints and their conversion. I believe onepeace_seg_cocostuff2ade20k.pth is not a file I can download.

Thanks in advance. 🙂

logicwong commented 1 year ago

What is https://one-peace-shanghai.oss-accelerate.aliyuncs.com/one_peace_checkpoints/one_peace_vision.pt for? It looks like almost all the keys are missing when loading the checkpoint. So I tried to convert it and load it again, but ended up with "Invalid magic number; corrupt file?". Which one am I supposed to use?

Since ONE-PEACE is a sparse model with different parts dedicated to different modalities, it can be disassembled into separate branches that each handle specific tasks. https://one-peace-shanghai.oss-accelerate.aliyuncs.com/one_peace_checkpoints/one_peace_vision.pt is the Vision-Branch, which contains only the vision adapter, self-attention layers, and vision FFNs. (The language and audio parts, such as the language FFNs, have been manually removed from http://one-peace-shanghai.oss-accelerate.aliyuncs.com/one-peace.pt.) Since these branch checkpoints seem to cause confusion, we plan to remove them.
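The disassembly described above can be pictured as filtering a state dict by parameter name. This is only a minimal sketch: the key names and the `NON_VISION_MARKERS` substrings are hypothetical and are not taken from the actual ONE-PEACE checkpoints, and plain integers stand in for tensors.

```python
# Hypothetical substrings marking language/audio parameters; the real
# ONE-PEACE key names may look different.
NON_VISION_MARKERS = ("text_", "audio_")


def extract_vision_branch(state_dict):
    """Return a copy of state_dict without language/audio-specific entries."""
    return {
        name: value
        for name, value in state_dict.items()
        if not any(marker in name for marker in NON_VISION_MARKERS)
    }


# Toy stand-in for a full checkpoint (values would be tensors in practice).
full = {
    "layers.0.self_attn.weight": 1,  # shared across modalities
    "layers.0.image_ffn.weight": 2,  # vision-specific
    "layers.0.text_ffn.weight": 3,   # language-specific
    "layers.0.audio_ffn.weight": 4,  # audio-specific
    "image_adapter.weight": 5,       # vision adapter
}

vision = extract_vision_branch(full)
print(sorted(vision))
```

A branch produced this way keeps the shared self-attention weights, so it is smaller than the full checkpoint but is not interchangeable with a task-specific checkpoint such as the segmentation one.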

For the semantic segmentation task, you can refer to one_peace_vision.md and seg.md.

simonJJJ commented 1 year ago

Hi @hotohoto, we have released the segmentation checkpoint here and also one-peace-vision.pkl here.

You can use the segmentation checkpoint to segment random image files.

Also, one-peace-vision.pkl is only different from one_peace_vision.pth in model names.
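If "model names" here refers to the parameter (key) names stored in the checkpoint, then loading one file into a model that expects the other's naming would report nearly every key as missing. A rough sketch for spotting and undoing such a mismatch follows. The `backbone.` prefix and the example keys are hypothetical, not the real names in either file; in practice the two name sets would come from the loaded checkpoint and from `model.state_dict()`.

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names the checkpoint has but the model does not use
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


def add_prefix(state_dict, prefix):
    """Return a copy of state_dict with prefix prepended to every key."""
    return {prefix + name: value for name, value in state_dict.items()}


# Hypothetical names: the model expects a "backbone." prefix, the checkpoint has none.
model_keys = ["backbone.blocks.0.attn.weight", "backbone.blocks.0.ffn.weight"]
ckpt = {"blocks.0.attn.weight": 1, "blocks.0.ffn.weight": 2}

missing, unexpected = diff_keys(model_keys, ckpt)
print(len(missing), len(unexpected))  # every key mismatches before renaming

renamed = add_prefix(ckpt, "backbone.")
missing_after, unexpected_after = diff_keys(model_keys, renamed)
print(len(missing_after), len(unexpected_after))
```

If both lists come back empty after renaming, the two files differ only in naming and either can be used once the keys are mapped.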

hotohoto commented 1 year ago

Thank you for the support! I'll take a look at them sooner or later.