The code and models for 'InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding' are scheduled to be released soon at this link. Please note that this repository will no longer receive updates or maintenance.
92.1% Top-1 accuracy on Kinetics-400.
SOTA performance on over 60 video/audio-related tasks (including action recognition, temporal localization, retrieval, etc.) when released.
Mar 22, 2024: The technical report of InternVideo2 is released.