Support ViT-B/32 of Vision Model

FreddeFrallan / Multilingual-CLIP

OpenAI CLIP text encoders for multiple languages!

MIT License

765 stars, 72 forks

Support ViT-B/32 of Vision Model #2

Closed: tommy19970714 closed this issue 3 years ago

tommy19970714 commented 3 years ago

Incredible work done here - congratulations on the fantastic results. Do you have plans to support the ViT-B/32 vision model?

FreddeFrallan commented 3 years ago

Thank you, it's funny how many people seem to find this weekend hack useful! I could re-run an experiment for the ViT-B/32. Is there any specific language you had in mind?

tommy19970714 commented 3 years ago

I want to combine Multilingual-CLIP with StyleCLIP, which only supports English right now. With Multilingual-CLIP, we could edit faces using text prompts in several languages. However, StyleCLIP only supports ViT-B/32.
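For context, here is a minimal sketch of why the text encoder has to match the vision model. It assumes text-driven editing scores a candidate image by the cosine distance between its image embedding and the prompt's text embedding, so both vectors must live in the same space. CLIP ViT-B/32 embeddings are 512-dimensional. The function names below are hypothetical and the vectors are placeholders, not outputs of any real encoder.

```python
import math

# Size of the joint image/text embedding space of CLIP ViT-B/32.
EMBED_DIM = 512


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError(
            f"embedding sizes differ ({len(a)} vs {len(b)}): "
            "the text encoder does not match the vision model"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def clip_distance(image_embedding, text_embedding):
    """Hypothetical helper: cosine distance, 0 when the embeddings align."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


if __name__ == "__main__":
    # Placeholder vectors standing in for real encoder outputs.
    image_vec = [1.0] + [0.0] * (EMBED_DIM - 1)
    matching_text_vec = [2.0] + [0.0] * (EMBED_DIM - 1)
    print(clip_distance(image_vec, matching_text_vec))

    # A text encoder trained against a different vision model may emit a
    # different size (640 is only an illustrative value) and cannot be used.
    try:
        clip_distance(image_vec, [1.0] * 640)
    except ValueError as err:
        print("rejected:", err)
```

Matching sizes is necessary but not sufficient: the multilingual encoder also has to be trained to reproduce the ViT-B/32 text embeddings, which is why a separate training run is needed for each vision model.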

FreddeFrallan commented 3 years ago

Got it! I'll start a training run for the ViT-B/32 this weekend.

FreddeFrallan commented 3 years ago

I have now uploaded the multilingual text encoder for ViT-B/32. As with all the other models, I have not tested the quality!

Hope you find it useful :)

tommy19970714 commented 3 years ago

Awesome! Thank you very much! I will try the text encoder for ViT-B/32.