Closed NanAlbert closed 1 year ago
Hello. Thanks for your fantastic work. I noticed that your work SimSeg (ECCV 2022) considered both the zero-shot setting and the cross-dataset setting. How does SAN perform in the zero-shot setting?

We didn't evaluate SAN under the zero-shot setting. I don't think the zero-shot setting defined in previous papers is the proper way to extend and explore the power of foundation models like CLIP.

Thanks for the answer. Could you please explain in more detail? The zero-shot setting evaluates on unknown categories; isn't that more reflective of the generalization ability of foundation models?

Yes, I agree that a sufficiently fair zero-shot benchmark would better reflect the generalization ability of foundation models. However, in the previous zero-shot setting, the 'unknown' categories may not actually be unknown to the foundation model: many of those categories were already exposed to the model during pretraining. Besides, due to the intra-dataset split design, information about the unknown categories easily leaks to the model, which I think is the main reason why metric learning and self-training help so much. It would be helpful if someone made a clear analysis of generalization evaluation.
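The leakage point above can be illustrated with a toy check: treat the split's "unknown" classes as a set and intersect them with the names a web-scale pretrained model has plausibly seen. The class lists below are purely illustrative, not the actual benchmark splits or CLIP's real pretraining vocabulary.

```python
# Illustrative sketch only: are the "unknown" classes of a zero-shot
# split really unseen by a web-scale pretrained model?
seen = {"person", "car", "dog", "chair", "tree"}      # hypothetical seen classes
unseen = {"cat", "bus", "sofa"}                       # hypothetical "unknown" classes

# Web-scale image-text pretraining (CLIP-style) covers most common
# category names, so the nominal "unknown" set usually overlaps it.
pretraining_vocab = {"cat", "bus", "sofa", "person", "car", "dog"}

leaked = unseen & pretraining_vocab
print(sorted(leaked))  # classes that are "unknown" only in name
```

With these toy sets, every "unknown" class is already in the pretraining vocabulary, which is the sense in which the previous zero-shot setting may overstate generalization.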