OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

Zero-shot retrieval reproduction issue #112

Open jqsun98 opened 4 months ago

jqsun98 commented 4 months ago

According to the README at https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Downstream/Video-Text-Retrieval, the zero-shot retrieval results are obtained by running ./zeroshot_scripts/eval_msrvtt.sh, which executes main_task_retrieval.py. However, the model built in main_task_retrieval.py is CLIP4Clip, not ViCLIP. How can I run zero-shot video-text retrieval experiments with the pretrained ViCLIP?

leexinhao commented 4 months ago

Maybe you need to use the code in InternVideo2/multi_modality and add a model definition for ViCLIP.
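Once a ViCLIP model definition is in place, zero-shot retrieval evaluation reduces to encoding every caption and every video, taking cosine similarities, and ranking the candidates. The sketch below is hypothetical and model-agnostic. It does not use any InternVideo API. It assumes you already have text and video embeddings as arrays of shape (N, D), with caption i matching video i (one caption per video). The function names `l2_normalize`, `retrieval_metrics`, and `zero_shot_eval` are illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against zero-norm rows


def retrieval_metrics(sim: np.ndarray) -> dict:
    """Compute R@1/R@5/R@10 (in %) and median rank for a square similarity matrix.

    Assumes sim[i, j] scores query i against candidate j and that the
    ground-truth candidate for query i is candidate i.
    """
    # Candidates sorted from most to least similar for each query.
    order = np.argsort(-sim, axis=1)
    gt = np.arange(sim.shape[0])[:, None]
    # Column index where the ground-truth candidate appears (0 = ranked first).
    ranks = np.where(order == gt)[1]
    return {
        "R@1": 100.0 * np.mean(ranks < 1),
        "R@5": 100.0 * np.mean(ranks < 5),
        "R@10": 100.0 * np.mean(ranks < 10),
        "MdR": float(np.median(ranks) + 1),  # median rank counted from 1
    }


def zero_shot_eval(text_emb, video_emb) -> dict:
    """Hypothetical helper: text-to-video and video-to-text metrics from raw embeddings."""
    t = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    v = l2_normalize(np.asarray(video_emb, dtype=np.float64))
    sim = t @ v.T  # (num_texts, num_videos) cosine similarity matrix
    return {"t2v": retrieval_metrics(sim), "v2t": retrieval_metrics(sim.T)}
```

Tie-breaking and metric conventions in the repository's own evaluation code may differ from this sketch. Compare the two on the same embeddings before reporting numbers.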