KU-CVLAB / CAT-Seg

Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"
https://ku-cvlab.github.io/CAT-Seg/
MIT License

Unable to reproduce the results with CLIP frozen #25

Closed (Gabrysse closed this issue 4 months ago)

Gabrysse commented 4 months ago

Hi, I tried to train your model with CLIP frozen. Table 6 of your paper reports 10.4 on ADE847 for this setting, but after training with CLIP frozen I only obtain ~4.6 mIoU on ADE847. I'm using the following command for training:

sh run.sh configs/vitb_384.yaml 2 output_noFT/ MODEL.SEM_SEG_HEAD.CLIP_FINETUNE None

Am I missing something?

hsshin98 commented 4 months ago

Hi, the scores in the paper are based on ViT-L/14@336px; unless otherwise specified, all other experiments in the paper also use the ViT-L variant. You might want to try ViT-L with its config to reproduce the results in the paper. It's hard for us to validate your results, as we haven't tried this setting with ViT-B.

Thanks!