facebookresearch / ov-seg

This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.

Which feature is used for classification in the demo? #12

Closed: synsin0 closed this issue 1 year ago

synsin0 commented 1 year ago

Thanks for your great work. I see that in the demo config the masks come from ov-seg, but the classification depends entirely on CLIP (L486: `# only clip model predictions are used`). From Table 5, I understand that either feature (from ov-seg or from CLIP) can be used for classification. However, when I set clip_ensemble to False, the predicted image becomes completely wrong. Does ov-seg only produce mask proposals for the CLIP adapter in the demo? How can I use only the ov-seg features for mask classification?

Jeff-LiangF commented 1 year ago

Hi @synsin0,

Yes, in the demo we use ov-seg (MaskFormer) only to produce mask proposals and leave its class predictions unused. The reason is, as you also noticed, that performance gets worse if we use them. We conjecture this is because the open-vocabulary classifier of ov-seg (MaskFormer) is trained on COCO-171, so it fits to these 171 classes and cannot handle the diverse cases in the demo.

If you want to use only the ov-seg (MaskFormer) class predictions (the MaskFormer-only results in Table 5), you can set CLIP_ENSEMBLE to False, as in here.
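To make the effect of the flag concrete, here is a minimal sketch of the selection logic as described in this thread. It is not the repository's code: `pick_mask_class_probs`, `argmax_labels`, and the toy probabilities are hypothetical names and values chosen for illustration, and it assumes only the two behaviours discussed above (CLIP predictions only when the flag is on, MaskFormer predictions only when it is off).

```python
from typing import List

# One row of class probabilities per mask proposal.
Probs = List[List[float]]


def pick_mask_class_probs(maskformer_probs: Probs, clip_probs: Probs,
                          clip_ensemble: bool) -> Probs:
    """Hypothetical helper: choose which classifier labels each mask proposal."""
    if len(maskformer_probs) != len(clip_probs):
        raise ValueError("both classifiers must score the same mask proposals")
    if clip_ensemble:
        # Demo behaviour described above: MaskFormer class predictions are unused.
        return clip_probs
    # Flag off: fall back to the MaskFormer classifier (trained on COCO-171).
    return maskformer_probs


def argmax_labels(probs: Probs) -> List[int]:
    """Return the index of the highest-scoring class for each proposal."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Toy scores for two mask proposals over three classes (made-up numbers).
maskformer_probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
clip_probs = [[0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]

demo_labels = argmax_labels(pick_mask_class_probs(maskformer_probs, clip_probs, True))
maskformer_only_labels = argmax_labels(pick_mask_class_probs(maskformer_probs, clip_probs, False))
print(demo_labels, maskformer_only_labels)
```

In both cases the mask proposals are the same; only the source of the per-mask class scores changes, which is why the two settings can label the same region differently.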

I am closing this issue; feel free to reopen it if you have further questions.