MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License

CLIP image encoder #40

Closed scm-later closed 8 months ago

scm-later commented 8 months ago

I have two questions. (1) At present, I have ViT-B/32 weights that I trained myself, but SAN's pre-trained weights use ViT-B/16. How can I use my pre-trained ViT-B/32 weights with SAN? As we all know, the official CLIP models are trained with ViT-B/32, ViT-B/16, and ResNet backbones. (2) Why doesn't the CLIP image encoder use ResNet? Thanks!!

MendelXu commented 8 months ago

1) You may have to edit the config to set the CLIP model name and re-train your own model. 2) It is easier to evaluate the scaling law of ViT-based models, as more ViT-based models of different sizes are available.
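For (1), the rough recipe would be: change the CLIP model name in the SAN config to ViT-B/32, load your own checkpoint into that architecture, and then re-train SAN, since the side adapter was trained against ViT-B/16 features (ViT-B/32 yields a coarser feature grid, e.g. 7x7 instead of 14x14 at 224px input). A minimal sketch of loading self-trained weights with OpenAI's `clip` package; the checkpoint path `my_vit_b32.pt` is a placeholder, and the exact SAN config key for the model name is not shown here:

```python
import torch
import clip

# Build the ViT-B/32 architecture (this downloads OpenAI's released
# weights, which we only use as a template for the module structure).
model, preprocess = clip.load("ViT-B/32", device="cpu")

# Overwrite with the self-trained checkpoint; strict loading will surface
# any mismatch between the checkpoint and the ViT-B/32 architecture.
state_dict = torch.load("my_vit_b32.pt", map_location="cpu")
model.load_state_dict(state_dict)
```

If the checkpoint was saved as a full training state rather than a bare `state_dict`, you would first pull the model weights out of it (e.g. `state_dict["model"]`), which varies by training framework.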

SCM556 commented 8 months ago

Thank you for your reply.