cvlab-kaist / CAT-Seg

Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"
https://ku-cvlab.github.io/CAT-Seg/
MIT License

report bug #20

Closed SuleBai closed 7 months ago

SuleBai commented 7 months ago

Hi, thanks for your great work!

While looking through the code, I noticed that in the inference phase the CLIP image encoder is forwarded twice. Is this a bug, or is there a reason it is forwarded twice?

https://github.com/KU-CVLAB/CAT-Seg/blob/3062d4abda7884f35ff8650784c882b225783978/cat_seg/cat_seg_model.py#L202

https://github.com/KU-CVLAB/CAT-Seg/blob/3062d4abda7884f35ff8650784c882b225783978/cat_seg/cat_seg_model.py#L205

Also, the main difference between the CVPR version and the previous arXiv version is that you removed the additional backbone (Swin) and managed to fine-tune the CLIP text encoder, am I right?

hsshin98 commented 7 months ago

Yes, this is a bug, and we have fixed it. It seems we missed it while refactoring our code. You are also right about the main differences. Other differences include the fine-tuning methodology: we selectively fine-tune a few layers within the attention layer rather than the full module.
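
For readers following along, here is a minimal sketch of the kind of duplicated forward pass discussed above and one way to spot it. It is not the repository's code: `CountingEncoder`, `inference_double_forward`, and `inference_single_forward` are hypothetical names, and the "encoder" is a plain-Python stand-in so the example runs without any dependencies. In a PyTorch model, a similar call count could be collected with a hook registered through `torch.nn.Module.register_forward_hook`.

```python
class CountingEncoder:
    """Hypothetical stand-in for an expensive image encoder; counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        # Placeholder "features"; a real encoder would return a tensor.
        return [2 * pixel for pixel in image]


def inference_double_forward(encoder, image):
    # Redundant pattern: the same input goes through the encoder twice.
    dense_features = encoder(image)
    guidance_features = encoder(image)
    return dense_features, guidance_features


def inference_single_forward(encoder, image):
    # Deduplicated pattern: run the encoder once and reuse the result.
    features = encoder(image)
    return features, features


image = [1, 2, 3]

buggy = CountingEncoder()
out_buggy = inference_double_forward(buggy, image)

fixed = CountingEncoder()
out_fixed = inference_single_forward(fixed, image)

# Both variants yield the same features; only the number of forwards differs.
print(buggy.calls, fixed.calls)
```

Reusing the result is only safe when the encoder is deterministic at inference time (for example, with dropout disabled), which is an assumption of this sketch.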