OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0
1.44k stars, 88 forks

Can stage-3 training further improve the performance of InternVideo2 on basic video tasks #176

Open fushh opened 2 months ago

fushh commented 2 months ago

Thanks for the great work! In stage 3, the video encoder is updated to improve its support for video-centric dialogue. Will stage 3 training affect the performance on basic video tasks? A comparison like Table 4 would be appreciated. [image]