I notice the paper (VideoChat2) points out that GPT-4V uses 16 frames (which I assume are sampled uniformly). However, how are these frames input into the model, given that it accepts only single images at a time? Is there a sample prompt for this?
For fairness, we feed every video model the same 16 uniformly sampled frames. You can refer to the official cookbook for examples: OpenAI Cookbook.
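The Chat Completions API does accept multiple images in a single user message, so the 16 frames can be passed together as base64-encoded image parts followed by the question text. Below is a minimal sketch of that pattern; the center-of-segment sampling rule and the helper names (`uniform_frame_indices`, `build_gpt4v_messages`) are my own illustration, not code from the paper or the cookbook.

```python
import base64


def uniform_frame_indices(num_frames: int, num_samples: int = 16):
    # Split the video into num_samples equal segments and take the
    # middle frame of each segment (one common "uniform sampling" rule).
    seg = num_frames / num_samples
    return [min(int((i + 0.5) * seg), num_frames - 1) for i in range(num_samples)]


def build_gpt4v_messages(frames_b64, question: str):
    # Pack all frames into ONE user message as image parts, then append
    # the question as a text part, matching the multi-image chat format.
    content = [
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
        for b64 in frames_b64
    ]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


def encode_frame(jpeg_bytes: bytes) -> str:
    # Frames must be base64-encoded before being embedded in the data URL.
    return base64.b64encode(jpeg_bytes).decode("utf-8")
```

The resulting `messages` list would then be passed to `client.chat.completions.create(...)` with a vision-capable model; frame extraction itself (e.g. with OpenCV or decord) is omitted here.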