MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License

Zero-shot setting #11

Closed NanAlbert closed 1 year ago

NanAlbert commented 1 year ago

Hello, and thanks for your fantastic work. I noticed that your ECCV 2022 work SimSeg considered both the zero-shot setting and the cross-dataset setting. How does SAN perform in the zero-shot setting?

MendelXu commented 1 year ago

We didn't evaluate SAN under the zero-shot setting. I think the zero-shot setting defined in previous papers is not the proper way to extend and explore the power of foundation models like CLIP.

NanAlbert commented 1 year ago

Thanks for the answer. Could you please explain in more detail? The zero-shot setting evaluates on unknown categories; doesn't that better reflect the generalization ability of large language models?

MendelXu commented 1 year ago

Yes, I agree that a sufficiently fair zero-shot benchmark would better reflect the generalization ability of foundation models. However, in the previous zero-shot setting, the 'unknown categories' may not actually be unknown to the foundation model: many of them were already exposed to the model during pretraining. Besides, because of the intra-dataset split design, it is easy for information about the unknown categories to leak to the model, which I think is the main reason why metric learning and self-training help so much. Actually, I think it would be very helpful if someone made a clear analysis of this kind of generalization evaluation.
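
To make the distinction concrete, here is a minimal sketch (not taken from the SAN codebase) contrasting the intra-dataset zero-shot protocol with cross-dataset evaluation. The `model` interface, the dataset names, and the seen/unseen class split below are illustrative assumptions only; the point is that under the zero-shot protocol the training images still contain the 'unseen' objects (only their labels are ignored), which is one channel for the leakage described above.

```python
# Illustrative sketch of the two evaluation protocols discussed above.
# The `model.train` / `model.evaluate` interface is hypothetical.

PASCAL_VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant",
    "sheep", "sofa", "train", "tvmonitor",
]
# Example seen/unseen partition; the exact split varies across papers.
UNSEEN = {"pottedplant", "sheep", "sofa", "train", "tvmonitor"}
SEEN = [c for c in PASCAL_VOC_CLASSES if c not in UNSEEN]


def zero_shot_protocol(model, voc_train, voc_val):
    # Intra-dataset split: train with annotations of SEEN classes only.
    # Pixels of UNSEEN classes are relabeled as "ignore", but the images
    # containing those objects are still observed during training.
    model.train(voc_train, class_names=SEEN, ignore_classes=UNSEEN)
    # Evaluate on the full label set, including the held-out classes.
    return model.evaluate(voc_val, class_names=PASCAL_VOC_CLASSES)


def cross_dataset_protocol(model, coco_stuff_train, ade20k_val, ade20k_classes):
    # Cross-dataset setting: train on one dataset's full label set and
    # evaluate on a different dataset whose class vocabulary is supplied
    # only at test time.
    model.train(coco_stuff_train)
    return model.evaluate(ade20k_val, class_names=ade20k_classes)
```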