MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License

What's the difference between the data processed by prepare_voc_sem_seg.py and the data before processing? #48

Closed. ChunmengLiu1 closed this issue 8 months ago.

ChunmengLiu1 commented 8 months ago

Hi! Sorry for bothering you again. Does prepare_voc_sem_seg.py process the data differently from before? It looks like the label values were changed. Is this because the background class is not counted during testing? Thank you!

ChunmengLiu1 commented 8 months ago

Another question: you don't use a pre-trained model in the training stage, so do you only train the side adapter network during training and then use the frozen CLIP for inference? I want to know how the frozen CLIP is used in the training and inference stages. Thanks for your response!

MendelXu commented 8 months ago

> Hi! Sorry for bothering you again. Does prepare_voc_sem_seg.py process the data differently from before? It looks like the label values were changed. Is this because the background class is not counted during testing? Thank you!

The main difference is:

* All class ids are 1 lower.

* The background class id is converted to 255, which is the ignored label during training and testing.
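
For concreteness, a minimal sketch of that remapping (illustrative only; the function name and paths here are hypothetical, and the actual prepare_voc_sem_seg.py may handle I/O differently):

```python
import numpy as np
from PIL import Image

def convert_voc_label(src_path: str, dst_path: str) -> None:
    """Shift PASCAL VOC class ids down by 1 and map background to the ignore label."""
    label = np.asarray(Image.open(src_path), dtype=np.uint8).copy()
    # Background (id 0) becomes 255, which is ignored during training and testing.
    label[label == 0] = 255
    # Foreground classes 1..20 become 0..19; existing 255 "void" pixels stay 255.
    label[label != 255] -= 1
    Image.fromarray(label).save(dst_path)

# Hypothetical usage:
# convert_voc_label("SegmentationClass/2007_000032.png", "annotations_detectron2/2007_000032.png")
```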

MendelXu commented 8 months ago

> Another question: you don't use a pre-trained model in the training stage, so do you only train the side adapter network during training and then use the frozen CLIP for inference? I want to know how the frozen CLIP is used in the training and inference stages. Thanks for your response!

There are two parts in the model: a CLIP model and a side adapter network. The CLIP model always loads pretrained weights, while the side adapter is randomly initialized.
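
A simplified sketch of that setup (not the actual SAN code; the adapter architecture below is a placeholder, and open_clip is assumed here only for illustration):

```python
import torch
import torch.nn as nn
import open_clip  # assumed here only to obtain a pretrained CLIP model

# CLIP always loads pretrained weights and stays frozen in both training and inference.
clip_model, _, _ = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
clip_model.eval()
for p in clip_model.parameters():
    p.requires_grad_(False)

# The side adapter network is randomly initialized and is the only part that is trained.
side_adapter = nn.Sequential(            # placeholder architecture, not SAN's real adapter
    nn.Conv2d(3, 192, kernel_size=16, stride=16),
    nn.Flatten(2),
)
optimizer = torch.optim.AdamW(side_adapter.parameters(), lr=1e-4)

def forward_pass(images: torch.Tensor):
    # The frozen CLIP features are computed without gradients at train and test time alike.
    with torch.no_grad():
        clip_features = clip_model.encode_image(images)
    # Only the side adapter receives gradients during training.
    adapter_tokens = side_adapter(images)
    return clip_features, adapter_tokens
```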

ChunmengLiu1 commented 8 months ago

> > Hi! Sorry for bothering you again. Does prepare_voc_sem_seg.py process the data differently from before? It looks like the label values were changed. Is this because the background class is not counted during testing? Thank you!
>
> The main difference is:
>
> * All class ids are 1 lower.
>
> * The background class id is converted to 255, which is the ignored label during training and testing.

Got it! Thanks!

ChunmengLiu1 commented 8 months ago

> > Another question: you don't use a pre-trained model in the training stage, so do you only train the side adapter network during training and then use the frozen CLIP for inference? I want to know how the frozen CLIP is used in the training and inference stages. Thanks for your response!
>
> There are two parts in the model: a CLIP model and a side adapter network. The CLIP model always loads pretrained weights, while the side adapter is randomly initialized.

But I see this message:

WARNING:timm.models._builder:No pretrained configuration specified for vit_tiny_patch16_224_in21k model. Using a default. Please add a config to the model pretrained_cfg registry or pass explicitly.
No checkpoint found. Initializing model from scratch

Is this because I didn't use pretrained weights in the side adapter or in the CLIP model? Also, I didn't download the CLIP pretrained weights. Are they downloaded automatically, and where are they stored? Thanks!

MendelXu commented 8 months ago

Yes, it is because you didn't use pretrained weights for the side adapter. The CLIP pretrained weights are downloaded automatically by the setting at https://github.com/MendelXu/SAN/blob/81a9a2bd79d433292d46cfa0597caea5005e0116/san/model/san.py#L102
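
A hedged illustration of where the two messages come from (the exact calls inside SAN may differ; this only reproduces the general behaviour):

```python
import timm
import open_clip

# The side adapter backbone is built with pretrained=False, so timm warns that no
# pretrained configuration is specified and initializes the weights from scratch.
side_adapter_backbone = timm.create_model("vit_tiny_patch16_224_in21k", pretrained=False)

# The CLIP branch requests pretrained weights, so open_clip downloads the OpenAI
# checkpoint to a local cache (e.g. ~/.cache/clip/ViT-B-16.pt) on first use.
clip_model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
```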

ChunmengLiu1 commented 8 months ago

> Yes, it is because you didn't use pretrained weights for the side adapter. The CLIP pretrained weights are downloaded automatically by the setting at https://github.com/MendelXu/SAN/blob/81a9a2bd79d433292d46cfa0597caea5005e0116/san/model/san.py#L102

Thank you for your help! I found the pretrained weights at /home/user/.cache/clip/ViT-B-16.pt. Happy new year!