Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
MIT License
653 stars 78 forks

How many images and captions are required to train my own CLIP? #12

Closed gunwooYong closed 3 years ago

gunwooYong commented 3 years ago

Hello, I am Yong, a computer vision researcher.

I was impressed by your code and wondered how to fine-tune CLIP. I want to classify images with CLIP, but I only have at most 10 images per class, with 4-6 classes in total.

Is fine-tuning possible in this situation?

Thank you.

Zasder3 commented 3 years ago

Hi Yong,

Glad you like the code! For reference, the original model was trained on 400 million image-text pairs. You are most likely to find success by trying zero-shot classification, or by training a linear classifier on top of an existing pretrained model's features.
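To illustrate the zero-shot route: a minimal sketch of CLIP-style zero-shot classification. The random tensors below stand in for the outputs of a pretrained model's `encode_image` and `encode_text` calls; the normalization, cosine-similarity logits, and logit scale of 100 mirror CLIP's published inference recipe, but the numbers here are purely illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 5   # e.g. 4-6 classes, as in the question
embed_dim = 512   # embedding size of CLIP ViT-B/32

# Stand-ins for real CLIP embeddings: in practice these would be
# model.encode_image(images) and model.encode_text(class_prompts).
image_features = torch.randn(1, embed_dim)
text_features = torch.randn(num_classes, embed_dim)

# Zero-shot classification: L2-normalize both sides, take cosine
# similarity scaled by CLIP's logit scale (~100), softmax over classes.
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)
logits = 100.0 * image_features @ text_features.T
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1).item()
```

With a real CLIP checkpoint, the text features would come from prompts like `"a photo of a {class name}"`, and no training data is needed at all.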

The fine-tuning code is useful for training arbitrary pretrained models together. For example, it's very efficient to pair an RN50 pretrained on ImageNet-1k with a pretrained BERT.

You may find success fine-tuning an existing CLIP, but it's likely the model will overfit.
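With only ~10 images per class, a linear probe on frozen features is the lower-risk option. Here is a sketch, again with random arrays standing in for frozen CLIP image embeddings; the logistic-regression-on-features setup follows the linear-probe evaluation style of the CLIP paper, and the dataset sizes are just the ones from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_classes, per_class, dim = 5, 10, 512

# Stand-ins for frozen CLIP image features (real ones would come from
# model.encode_image on your ~10 images per class, model kept frozen).
X = rng.standard_normal((num_classes * per_class, dim)).astype(np.float32)
y = np.repeat(np.arange(num_classes), per_class)

# Linear probe: only dim * num_classes weights are trained, so it is
# far less prone to overfitting than updating the full model.
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)
```

If even the probe overfits, regularizing harder (smaller `C`) or reducing feature dimensionality are the usual next steps.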

All the best, Cade