The code loads the visual encoder from a CLIP checkpoint (clip-vit-b16.pth), but I couldn't find any mention of where this file comes from. I tried loading OpenAI's CLIP ViT-B/16 weights from Hugging Face, but they produce mismatched keys when loaded. Is OpenAI's CLIP the required checkpoint, or did you train your own CLIP?
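In case it helps, here is roughly how I compared the two checkpoints (a minimal sketch; I'm assuming clip-vit-b16.pth is a plain state dict or wraps one under a "state_dict" key, and that openai/clip-vit-base-patch16 is the matching Hugging Face model):

```python
import torch
from transformers import CLIPVisionModel

# Load the checkpoint the repo expects (path assumed from the config).
expected = torch.load("clip-vit-b16.pth", map_location="cpu")
if isinstance(expected, dict) and "state_dict" in expected:
    expected = expected["state_dict"]

# Load OpenAI's ViT-B/16 vision tower from Hugging Face for comparison.
hf_model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch16")

repo_keys = set(expected.keys())
hf_keys = set(hf_model.state_dict().keys())

# Print a sample of the mismatches in each direction.
print("only in repo checkpoint:", sorted(repo_keys - hf_keys)[:10])
print("only in HF checkpoint:  ", sorted(hf_keys - repo_keys)[:10])
```

The key names do not line up, which is why I suspect the checkpoint is either a different export format or a custom-trained model.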