harukaza closed 2 months ago
Hi, thank you for your interest in our InternVideo series. InternVideo was first open-sourced in late 2022 as a video foundation model that explores generative and discriminative learning for video representation. In 2023, we collected a large-scale video-text dataset, InternVid, and trained a video-text contrastive learning model, ViCLIP, on InternVid. In early 2024, we released InternVideo2, which focuses more on multimodal video representation learning. It uses more data and more modalities, and its performance has improved further.
I am confused by the three model names and their release dates. I see that InternVideo is from [2212.03191] while ViCLIP is from [2307.06942], but when I try to download ViCLIP, the link leads to the InternVideo page. Additionally, is InternVideo2 also a foundation model, and is it more powerful than the other two models?