OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

zeroshot video-retrieval #88

Closed: 1240446371 closed this issue 5 days ago

1240446371 commented 6 months ago

Thank you for your work! I have a question about the zero-shot video-retrieval task on the ActivityNet dataset: which pretrained model should I use to reproduce the reported performance? Is it CLIP ViT-L-14.pt? Thank you for your response!

shepnerd commented 5 days ago

Apologies for the delayed response. In InternVideo1 we use CLIP-ViT for pretraining, whereas in InternVideo2 we train the vision model from scratch.
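
For context, below is a minimal, generic sketch of how zero-shot video-text retrieval with a CLIP ViT-L/14 backbone is typically evaluated: frames are sampled from each video, encoded independently, mean-pooled into a single video embedding, and ranked against text embeddings by cosine similarity. This is not the InternVideo evaluation pipeline; the Hugging Face checkpoint name (`openai/clip-vit-large-patch14`) and the frame-sampling/mean-pooling choices are assumptions for illustration only.

```python
# Hedged sketch of zero-shot video-text retrieval with a CLIP ViT-L/14 backbone.
# NOT the InternVideo evaluation code; checkpoint name and pooling are assumptions.
import torch
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def encode_video(frames):
    """frames: list of PIL images sampled uniformly from one video."""
    inputs = processor(images=frames, return_tensors="pt").to(device)
    feats = model.get_image_features(**inputs)        # (num_frames, dim)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize per frame
    return feats.mean(dim=0)                          # mean-pool over frames

@torch.no_grad()
def encode_texts(captions):
    """captions: list of strings; returns L2-normalized text embeddings."""
    inputs = processor(text=captions, padding=True, return_tensors="pt").to(device)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Retrieval: given video_embs (num_videos, dim) and text_embs (num_captions, dim),
# sims = video_embs @ text_embs.T ranks captions per video; R@1 counts rows whose
# argmax matches the paired caption index.
```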