InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

Two problems with evaluation #385

Open phac123 opened 3 months ago

phac123 commented 3 months ago

Hi, I have encountered the same problem. I have two questions about the evaluation:

(1) How many video frames are sampled when evaluating the 5 benchmarks (MVBench, MLVU, MMEVideo, MMBVideo, and TempCompass)?
(2) What setting of the dynamic image partition strategy (i.e., how many partitions) is used when evaluating these 5 benchmarks?

I hope to get your response soon.

Best regards.

qyx1121 commented 3 months ago

I have the same issue. I set the number of sampled frames to 16 and hd_num=24. The result on MVBench is 65.9 (vs. 69.1 reported), and the result on MLVU is 58.3 (vs. 58.8 reported).
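For anyone comparing setups, here is a minimal sketch of one common way to pick 16 frames from a video: uniform sampling at segment midpoints. This is an assumption for illustration only. The thread does not say which sampling scheme the official evaluation uses, and `uniform_frame_indices` is a hypothetical helper, not a function from the InternLM-XComposer repository.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_samples: int = 16) -> List[int]:
    """Return frame indices spread evenly over a video.

    Hypothetical helper: splits the video into `num_samples` equal segments
    and takes the middle frame of each segment. The official evaluation may
    sample frames differently.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Short videos: use every available frame rather than repeating frames.
    if total_frames <= num_samples:
        return list(range(total_frames))
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # Example: indices for a 160-frame video with the 16-frame setting above.
    print(uniform_frame_indices(160, 16))
```

A different sampling scheme (for example, fixed frames per second) selects different frames from the same video, so it is one setting worth checking when reproduced scores differ from the reported ones.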

phac123 commented 3 months ago

Thank you for sharing!