LukasHaas / PIGEON

Code for the CVPR 2024 highlight paper and demo "PIGEON: Predicting Image Geolocations".
https://lukashaas.github.io/PIGEON-CVPR24/

Pretraining question #3

Closed · mikelee-dev closed this issue 1 month ago

mikelee-dev commented 3 months ago

Hi! Very nice repository and paper. I have a question about the continued pretraining: were all 428M parameters of OpenAI's CLIP ViT L/14 336 trainable during your continued pretraining, or were some of them frozen?

Based on the paper, it seems like a batch size of 32 image/text pairs fits into 80GB of GPU memory, so I wanted to double-check how many parameters of the original backbone were frozen versus trainable. Thanks!

mikelee-dev commented 3 months ago

Specifically, I was curious whether you trained this layer of the vision encoder during the continued pretraining: vision.vision_model.embeddings.class_embedding

Training this layer would redefine CLIP's classifications in embedding space.
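
For reference, here is a minimal sketch of how that parameter could be inspected, assuming the Hugging Face transformers CLIP implementation (the `vision.` prefix in the path above depends on whatever wrapper module is used):

```python
# Hypothetical snippet using Hugging Face transformers, not the PIGEON codebase.
from transformers import CLIPVisionModel

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")

# The learned CLS token of the vision transformer; trainable unless explicitly frozen.
cls_emb = vision.vision_model.embeddings.class_embedding
print(cls_emb.shape, cls_emb.requires_grad)  # torch.Size([1024]) True
```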

LukasHaas commented 1 month ago

Thanks for your question. Yes, all layers of CLIP ViT L/14 336 were trained during our continued pretraining.
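
For illustration only (this is a sketch using the Hugging Face transformers checkpoint, not our actual training script), continued pretraining with nothing frozen simply keeps every parameter trainable:

```python
# Illustrative sketch, not the authors' training code.
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14-336")

# No layers are frozen: every parameter of the text and vision towers stays trainable.
for param in model.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable / 1e6:.0f}M")  # ~428M
```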