dwsmart32 opened this issue 6 months ago
For your question: we only finetuned the AttentionPool in the vision encoder for the CLIP model, and the main parameters are not updated. Please check the zero-shot evaluation code for CLIP to see how to load the model. Here are the scripts.
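Not the official script, but a minimal sketch of the idea: the small CLIP checkpoint is overlaid on top of the unchanged InternVideo2-s2 backbone weights. The file names and the model builder below are placeholders, not the actual names used in the repo.

```python
import torch

# Main (unchanged) InternVideo2-s2 weights and the small CLIP add-on
# checkpoint from Hugging Face; both paths are placeholders.
backbone_sd = torch.load("internvideo2_s2_1b.pth", map_location="cpu")
addon_sd = torch.load("InternVideo2-CLIP-1B-224p-f8.pth", map_location="cpu")
# (If either file wraps its weights in a "module"/"model" key, unwrap it first.)

# The add-on file only contains the few finetuned tensors (e.g. the
# AttentionPool), so overlay it on the full backbone state dict;
# keys present in both dicts are taken from the add-on.
merged_sd = {**backbone_sd, **addon_sd}

# model = build_internvideo2_clip(cfg)  # hypothetical builder; use the repo's config
# missing, unexpected = model.load_state_dict(merged_sd, strict=False)
# print("missing:", missing, "unexpected:", unexpected)
```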
Thanks for your reply. So you mean I can use the CLIP model once two components are ready: the InternVideo2-s2 weights (the main parameters, which have not been updated) and the InternVideo2-clip weights (the small add-on parameters), right?
I would be really grateful if you could let me know approximately when you are going to update the main parameters.
I am looking forward to using your model in my work.
Thanks again for your great work. @Andy1621
Yes! Currently, we do not plan to update the main parameters. I have tried updating more parameters, but it led to poorer performance, which may be caused by the limited post-training data.
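For reference, "only finetuning the AttentionPool" amounts to freezing everything else before post-training. A minimal sketch; the name matching below is a guess, so check the real module names with `model.named_parameters()`:

```python
import torch.nn as nn

def freeze_all_but_attn_pool(model: nn.Module):
    """Freeze every parameter except those that appear to belong to the
    vision encoder's AttentionPool (name pattern is an assumption)."""
    for name, param in model.named_parameters():
        param.requires_grad = ("attn_pool" in name) or ("attnpool" in name)
    # Only the small trainable subset is passed to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]
```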
Hello, I really appreciate your great work. https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md I saw that you wrote in your paper: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2clip. It is post-pretrained from InternVideo2s2 by only preserving video and text encoders and contrastive loss."
But I found that this model [InternVideo2-CLIP-1B-224p-f8] on Hugging Face is very small, just a few MB. And from the previous issue, I noticed that the .pth file on Hugging Face contains only the "add-on parameters", not the full parameters.
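For reference, the contents of the small .pth can be checked like this to see which parameters it actually holds (the file name is just the downloaded Hugging Face file; no particular keys are assumed):

```python
import torch

sd = torch.load("InternVideo2-CLIP-1B-224p-f8.pth", map_location="cpu")
total = 0
for name, tensor in sd.items():
    if torch.is_tensor(tensor):
        total += tensor.numel()
        print(name, tuple(tensor.shape))
print(f"tensors: {len(sd)}, total values: {total:,}")
```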
Thank you in advance.