OpenGVLab / InternVideo2


How to use the small InternVideo2 CLIP model? #2

Open dwsmart32 opened 2 months ago

dwsmart32 commented 2 months ago

Hello, I really appreciate your great work. In https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md I saw that you wrote: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2clip. It is post-pretrained from InternVideo2s2 by only preserving video and text encoders and contrastive loss." in your paper.

But I found that the [InternVideo2-CLIP-1B-224p-f8] model on Hugging Face is very small, just a few MB. And from the previous issue I learned that the .pth file on Hugging Face contains only the "add-on parameters", not the full set of parameters.
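For reference, this is roughly how I imagined the add-on checkpoint would be combined with the full stage2 weights. A minimal sketch in plain PyTorch; the file names and the nesting key below are my guesses, not something from your docs:

```python
import torch

# Full InternVideo2 stage2 checkpoint (the large file) -- path is hypothetical
base_state = torch.load("internvideo2_stage2_1b.pt", map_location="cpu")

# The small add-on checkpoint from Hugging Face (InternVideo2-CLIP-1B-224p-f8)
addon_state = torch.load("internvideo2_clip_addon.pth", map_location="cpu")

# Some checkpoints nest the weights under a wrapper key -- guessing "module" here
if "module" in base_state:
    base_state = base_state["module"]

# Overlay the CLIP-specific add-on parameters on top of the stage2 weights
merged_state = {**base_state, **addon_state}

# Then load into the repo's model class, e.g.:
# model.load_state_dict(merged_state, strict=False)
```

Is something like this the intended way, or is there a loading utility in the repo I should use instead?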

So, as I understand it, there is only a CLIP model that was post-trained after stage 2, right? I would like to know how to initialize that CLIP model and use it, since I want to compute CLIP scores with it. I would be really grateful if you could tell me the exact way to do that. (It remains confusing no matter how much time I spend with your README and demo.ipynb.) Thank you in advance.
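For clarity, this is the CLIP score I am after; a minimal sketch assuming I can get per-video and per-text embeddings out of the model (the feature tensors here are placeholders, not the repo's actual API):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def clip_score(video_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between L2-normalized video and text embeddings."""
    v = F.normalize(video_feat, dim=-1)
    t = F.normalize(text_feat, dim=-1)
    return (v * t).sum(dim=-1)  # one score per (video, text) pair
```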