Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
MIT License

Image encoder #27

Closed: typercast closed this issue 2 years ago

typercast commented 2 years ago

Is it possible to use a pre-trained image model from Hugging Face when fine-tuning? The latest models are usually published there, so it would be great if they were compatible.

Zasder3 commented 2 years ago

It should work, as long as the model's forward pass behaves like a normal nn.Module. Give it a try: change the final embedding layer so its output dimension matches your text encoder's embedding size, and tell me how it goes!

I'll reopen this if you run into any problems. :)
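A minimal sketch of the suggestion above, assuming PyTorch: wrap the pre-trained backbone in a small `nn.Module` and add a linear projection as the new final embedding layer, sized to the text encoder's embedding dimension. The class name `ProjectedImageEncoder`, the `embed_dim=512` value, and the example checkpoint in the comments are illustrative assumptions, not part of train-CLIP. The wrapper uses `pooler_output` when the backbone returns a Hugging Face `ModelOutput`, and otherwise assumes the backbone already returns a `(batch, feature_dim)` tensor. A tiny stand-in backbone is used so the sketch runs offline.

```python
import torch
from torch import nn


class ProjectedImageEncoder(nn.Module):
    """Hypothetical wrapper: pre-trained image backbone + projection to the text embedding size."""

    def __init__(self, backbone: nn.Module, feature_dim: int, embed_dim: int):
        super().__init__()
        self.backbone = backbone
        # New "final embedding layer": maps backbone features to the text encoder's size.
        self.proj = nn.Linear(feature_dim, embed_dim)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        out = self.backbone(pixel_values)
        # Hugging Face vision models return a ModelOutput; many (e.g. ViTModel) expose pooler_output.
        if hasattr(out, "pooler_output"):
            out = out.pooler_output
        return self.proj(out)


# With Hugging Face transformers installed, a backbone could be loaded like this (example checkpoint):
#   from transformers import ViTModel
#   backbone = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
#   encoder = ProjectedImageEncoder(backbone, backbone.config.hidden_size, embed_dim=512)

if __name__ == "__main__":
    # Stand-in backbone so the sketch runs without downloading weights.
    dummy_backbone = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),  # -> (batch, 8)
    )
    encoder = ProjectedImageEncoder(dummy_backbone, feature_dim=8, embed_dim=512)
    images = torch.randn(2, 3, 224, 224)
    print(encoder(images).shape)  # torch.Size([2, 512])
```

The resulting module could then be passed wherever the training code expects an image encoder; the projection is the only newly initialized layer, so the pre-trained features are kept while the output matches the text side.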