ZiqinZhou66 / ZegCLIP

Official implementation of CVPR2023 ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
MIT License

Specification request about the definition of the inductive setting #17

Open cuttle-fish-my opened 11 months ago

cuttle-fish-my commented 11 months ago

Thanks for your marvelous work on open-vocabulary segmentation; I'm very interested in this project. However, I am confused about the setting of inductive open-vocabulary segmentation.

In "A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future", "inductive" is defined as "training images do not contain any unseen objects even if they are unannotated", meaning both the pixels and the text of unseen objects are forbidden during training. In this work, however, "inductive" means "the names of unseen classes in inference are unavailable while training", and I cannot find the corresponding code that filters out the unseen pixels, so I am a little confused.

To summarize my question with an example: if "Human" is defined as a seen class and "Dog" is an unseen class, can an image containing a man and a dog be used for training? Thanks in advance, and I hope for your reply!
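To make the question concrete, here is a minimal sketch (hypothetical; not taken from the ZegCLIP codebase) of the kind of pixel filtering I could not find in the code, where pixels annotated with unseen classes are remapped to an ignore index so they contribute nothing to the training loss:

```python
import numpy as np

IGNORE_INDEX = 255  # common mmseg/PyTorch convention for pixels excluded from the loss


def mask_unseen_pixels(label_map, unseen_class_ids):
    """Return a copy of the label map with unseen-class pixels set to IGNORE_INDEX.

    This is a hypothetical illustration of what "sieving off unseen pixels"
    could look like, not code from this repository.
    """
    masked = label_map.copy()
    masked[np.isin(masked, unseen_class_ids)] = IGNORE_INDEX
    return masked


# Toy example: class 0 = "Human" (seen), class 1 = "Dog" (unseen)
labels = np.array([[0, 0, 1],
                   [1, 0, 0]])
print(mask_unseen_pixels(labels, unseen_class_ids=[1]))
# The "Dog" pixels become 255 and would be ignored by a loss with ignore_index=255
```

Is something equivalent to this applied during training, or are such images dropped entirely?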