OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

What are the differences between ViCLIP, InternVideo, and InternVideo2? #170

Closed harukaza closed 1 week ago

harukaza commented 1 week ago

I am confused by the three model names and their release dates. InternVideo is from [2212.03191] while ViCLIP is from [2307.06942], but when I try to download ViCLIP, the link leads to the InternVideo page. Additionally, is InternVideo2 also a foundation model, and is it more powerful than the other two models?

yinanhe commented 1 week ago

Hi, thank you for your interest in our InternVideo series. InternVideo was first open-sourced in late 2022 as a video foundation model to explore generative and discriminative learning for video representation. In 2023, we collected a large-scale video-text dataset, InternVid, and trained a video-text contrastive learning model, ViCLIP, on InternVid. In early 2024, we released InternVideo2, which focuses more on multimodal video representation learning; by using more data and more modalities, it further improves performance.