fine-tuning clip text encoder?

KU-CVLAB / CAT-Seg

Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"

https://ku-cvlab.github.io/CAT-Seg/

MIT License

247 stars, 25 forks

fine-tuning clip text encoder? #19

Closed lchen1019 closed 5 months ago

lchen1019 commented 5 months ago

The framework figure shows the text encoder being fine-tuned, but the code does not fine-tune it. Is the figure drawn incorrectly?

Seokju-Cho commented 5 months ago

The code is outdated. Please stay tuned for updates!

hsshin98 commented 5 months ago

Hi, we just updated the code. The line in question now matches on "transformer" instead of "visual", so it takes effect on both transformer encoders of CLIP. Please let us know if you have further questions about the code!
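
For readers following along, here is a minimal sketch of why switching the match string from "visual" to "transformer" would reach both encoders. It assumes the selection is a plain substring test on parameter names. The names below follow the OpenAI CLIP naming scheme and are listed by hand for illustration, and `select_trainable` and `encoders_touched` are hypothetical helpers, not functions from the CAT-Seg repository. The actual code may apply further conditions on top of this filter.

```python
# Illustrative parameter names in the style of OpenAI CLIP
# (an assumption for this sketch; not copied from the CAT-Seg repository).
CLIP_PARAM_NAMES = [
    "positional_embedding",
    "text_projection",
    "logit_scale",
    "token_embedding.weight",
    "transformer.resblocks.0.attn.in_proj_weight",         # text encoder
    "transformer.resblocks.0.mlp.c_fc.weight",             # text encoder
    "ln_final.weight",
    "visual.conv1.weight",
    "visual.class_embedding",
    "visual.transformer.resblocks.0.attn.in_proj_weight",  # image encoder
    "visual.transformer.resblocks.0.mlp.c_fc.weight",      # image encoder
    "visual.proj",
]


def select_trainable(names, keyword):
    """Return the names that a substring test `keyword in name` would match."""
    return [n for n in names if keyword in n]


def encoders_touched(selected):
    """Report which CLIP tower(s) the selected parameters belong to."""
    towers = set()
    for n in selected:
        if n.startswith("visual."):
            towers.add("image")
        elif n.startswith("transformer."):
            towers.add("text")
    return sorted(towers)


old = select_trainable(CLIP_PARAM_NAMES, "visual")       # filter before the update
new = select_trainable(CLIP_PARAM_NAMES, "transformer")  # filter after the update

print(encoders_touched(old))
print(encoders_touched(new))
```

Under these assumed names, the "visual" filter only ever selects image-encoder parameters, while the "transformer" filter selects the transformer blocks of both the image and the text encoder. With a real model, the same test would typically be applied while iterating over `named_parameters()` and setting `requires_grad` on the matches.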

lchen1019 commented 5 months ago

That's great! Thank you for your excellent work.