Closed: dragen1860 closed this issue 6 months ago
Hi! Actually, our models support different numbers of frames by interpolating the position embeddings. Please check https://github.com/OpenGVLab/Ask-Anything/blob/078540aaebfbe1ad9a109020a73b0ce173b355ef/video_chat2/conversation.py#L182
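For readers who want the general idea without opening the link, here is a minimal sketch of resampling per-frame position embeddings to a different frame count. It is not the repository's code: the function name `interpolate_temporal_pos_embed`, the plain-list layout (one vector per frame), and the choice of linear interpolation with aligned endpoints are all assumptions made for illustration. The linked `conversation.py` remains the reference for what the model actually does.

```python
def interpolate_temporal_pos_embed(pos_embed, new_frames):
    """Linearly resample per-frame position embeddings (hypothetical helper).

    pos_embed:  list of T vectors (lists of floats), one per pretrained frame.
    new_frames: number of frames wanted at inference time, e.g. 4, 24 or 32.
    """
    old_frames = len(pos_embed)
    if old_frames == 0 or new_frames <= 0:
        raise ValueError("need at least one source frame and one target frame")

    out = []
    for i in range(new_frames):
        # Map target index i onto the source time axis; the first and last
        # frames line up exactly (an assumption, similar to align_corners=True).
        if new_frames > 1:
            src = i * (old_frames - 1) / (new_frames - 1)
        else:
            src = 0.0
        lo = int(src)                     # nearest source frame at or before src
        hi = min(lo + 1, old_frames - 1)  # next source frame, clamped at the end
        w = src - lo                      # blend weight toward the later frame
        out.append([(1 - w) * a + w * b
                    for a, b in zip(pos_embed[lo], pos_embed[hi])])
    return out


# Toy example: 8 pretrained frames with 1-dimensional embeddings 0.0 .. 7.0
pretrained = [[float(t)] for t in range(8)]
resampled = interpolate_temporal_pos_embed(pretrained, 15)
```

With a real encoder the same resampling would be applied along the temporal axis of the embedding tensor, leaving the spatial and channel dimensions untouched.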
Hi, dear author: The video encoder used in VideoChat2 is UMT, which is fixed at 8 frames. Would you kindly release other video/image encoders that support dynamic frame numbers, such as 4, 24, or 32? A flexible image encoder pretrained on VideoChat2 would be very useful for a wide range of downstream tasks. Thank you very much.