Hi Thanks for sharing, the code is neat and easy to follow. I have one question regarding fine-tuning a pre-trained CLIP.
I notice that in your train_finetune.py, instead of directly loaded a pre-trainend CLIP model you encode two separately defined image encoder and text encoder. I wonder if I want to fine-tune a specific pre-trained CLIP model such as "ViT/32B", how I can properly load the image encoder and text encoder? Thank you for your answer.
Hi Thanks for sharing, the code is neat and easy to follow. I have one question regarding fine-tuning a pre-trained CLIP. I notice that in your train_finetune.py, instead of directly loaded a pre-trainend CLIP model you encode two separately defined image encoder and text encoder. I wonder if I want to fine-tune a specific pre-trained CLIP model such as "ViT/32B", how I can properly load the image encoder and text encoder? Thank you for your answer.