OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/
MIT License
3k stars, 247 forks

VideoChat2: release any other video/image encoders? #145

Closed. dragen1860 closed this issue 6 months ago

dragen1860 commented 6 months ago

Hi, dear author: The video encoder used in VideoChat2 is UMT, which is fixed at 8 frames. Would you kindly release other video/image encoders that support dynamic frame numbers, such as 4, 24, or 32? A flexible image encoder pretrained with VideoChat2 would be very useful for a wide range of downstream tasks. Thank you very much.

Andy1621 commented 6 months ago

Hi! Actually, our models already support different frame numbers by interpolating the position embedding. Please check https://github.com/OpenGVLab/Ask-Anything/blob/078540aaebfbe1ad9a109020a73b0ce173b355ef/video_chat2/conversation.py#L182
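
For readers unfamiliar with the idea, here is a minimal sketch of interpolating a temporal position embedding so that an encoder trained on 8 frames can accept another frame count. It is not the code at the linked line. The function name `interpolate_temporal_pos_embed`, the `(T, N, C)` layout, the example shapes, and the choice of linear interpolation with aligned endpoints are all assumptions made for illustration; the repository may use a different layout or interpolation mode.

```python
import numpy as np


def interpolate_temporal_pos_embed(pos_embed, new_frames):
    """Linearly resample a (T_old, N, C) embedding to (new_frames, N, C).

    Hypothetical helper: T is the frame axis, N the tokens per frame,
    C the channel width. Endpoints of the old and new axes are aligned.
    """
    old_frames = pos_embed.shape[0]
    if new_frames == old_frames:
        return pos_embed.copy()

    # Fractional position on the old frame axis for every new frame.
    src = np.linspace(0.0, old_frames - 1, new_frames)
    lo = np.floor(src).astype(int)            # left neighbour index
    hi = np.minimum(lo + 1, old_frames - 1)   # right neighbour, clamped
    w = (src - lo)[:, None, None]             # weight of the right neighbour

    return (1.0 - w) * pos_embed[lo] + w * pos_embed[hi]


# Assumed shapes: 8 pretrained frames, 196 tokens per frame, 768 channels.
pretrained = np.random.default_rng(0).normal(size=(8, 196, 768))
resized = interpolate_temporal_pos_embed(pretrained, 16)
print(resized.shape)  # (16, 196, 768)
```

The first and last frames keep their original embeddings, and intermediate frames receive a blend of their two nearest pretrained neighbours, so the same weights can be reused for 4, 16, 24, or 32 frames without retraining the embedding table.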