OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0
1.11k stars 73 forks source link
action-recognition benchmark contrastive-learning foundation-models instruction-tuning masked-autoencoder multimodal open-set-recognition self-supervised spatio-temporal-action-localization temporal-action-localization video-clip video-data video-dataset video-question-answering video-retrieval video-understanding vision-transformer zero-shot-classification zero-shot-retrieval

InternVideo: Video Foundation Models for Multimodal Understanding


internvideo2_performance.

This repo contains InternVideo series and related works in video foundation models.

Updates

Contact