Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
MIT License
653 stars 78 forks

How many images and captions are required to train my own CLIP? #12

Closed gunwooYong closed 3 years ago

gunwooYong commented 3 years ago

Hello, I am Yong, a computer vision researcher.

I was impressed by your code and wondered how to fine-tune CLIP. I want to classify images with CLIP, but I only have at most 10 images per class, with 4-6 classes in total.

Is fine-tuning possible in this situation?

Thank you.

Zasder3 commented 3 years ago

Hi Yong,

Glad you like the code! For reference, the original model was trained on 400 million image-text pairs. You are most likely to find success by trying zero-shot classification, or by training a linear classifier on top of an existing pretrained model's features.
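To illustrate the zero-shot route: a minimal sketch of CLIP-style zero-shot classification. The random tensors below stand in for the outputs of a pretrained model's `encode_image` and `encode_text` calls; the normalization, cosine-similarity logits, and logit scale of 100 mirror CLIP's published inference recipe, but the numbers here are purely illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 5   # e.g. 4-6 classes, as in the question
embed_dim = 512   # embedding size of CLIP ViT-B/32

# Stand-ins for real CLIP embeddings: in practice these would be
# model.encode_image(images) and model.encode_text(class_prompts).
image_features = torch.randn(1, embed_dim)
text_features = torch.randn(num_classes, embed_dim)

# Zero-shot classification: L2-normalize both sides, take cosine
# similarity scaled by CLIP's logit scale (~100), softmax over classes.
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)
logits = 100.0 * image_features @ text_features.T
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1).item()
```

With a real CLIP checkpoint, the text features would come from prompts like `"a photo of a {class name}"`, and no training data is needed at all.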

The fine-tuning code is useful for training arbitrary pretrained models together. For example, it's very efficient to pair an RN50 pretrained on ImageNet-1k with a pretrained BERT.

You may find success fine-tuning an existing CLIP, but it's likely the model will overfit.
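With only ~10 images per class, a linear probe on frozen features is the lower-risk option. Here is a sketch, again with random arrays standing in for frozen CLIP image embeddings; the logistic-regression-on-features setup follows the linear-probe evaluation style of the CLIP paper, and the dataset sizes are just the ones from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_classes, per_class, dim = 5, 10, 512

# Stand-ins for frozen CLIP image features (real ones would come from
# model.encode_image on your ~10 images per class, model kept frozen).
X = rng.standard_normal((num_classes * per_class, dim)).astype(np.float32)
y = np.repeat(np.arange(num_classes), per_class)

# Linear probe: only dim * num_classes weights are trained, so it is
# far less prone to overfitting than updating the full model.
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)
```

If even the probe overfits, regularizing harder (smaller `C`) or reducing feature dimensionality are the usual next steps.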

All the best, Cade