BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Aligning Question Count with eval_your_results.py #23

Closed JinhuiYE closed 2 months ago

JinhuiYE commented 2 months ago

Hi, thanks for the amazing benchmark.

I have some questions regarding the https://github.com/thanku-all/parse_answer/blob/main/eval_your_results.py script.

In the script, you check whether each duration has 300 questions:

assert len(your_results_video_type) == 300, f"Number of files in {video_type} is not 300. Check if there are missing files."

However, the dataset at https://huggingface.co/datasets/lmms-lab/Video-MME contains 2,700 questions: 900 for each duration (i.e., short, medium, long).

How can I ensure consistency between my results and the reported results?

JinhuiYE commented 2 months ago

Oh, I got it. This is because the index structures of the two files are different: the evaluation script expects results grouped per video (300 videos per duration, three questions each), while the Hugging Face dataset is flattened to one row per question (900 per duration, 2,700 in total).
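
For anyone hitting the same mismatch, below is a minimal sketch of how the flat per-question records could be regrouped into per-video entries before running the script. The field names (video_id, duration, question_id, question, options, answer, response) and the file paths are assumptions based on the Hugging Face dataset columns, not the exact schema used by eval_your_results.py, so adjust them to your actual files.

import json

def regroup_by_video(flat_results):
    # Collect per-question records (2,700 in total) under their parent video,
    # yielding 300 entries per duration with three questions each.
    videos = {}
    for rec in flat_results:
        vid = rec["video_id"]
        if vid not in videos:
            videos[vid] = {
                "video_id": vid,
                "duration": rec["duration"],  # "short" / "medium" / "long"
                "questions": [],
            }
        videos[vid]["questions"].append({
            "question_id": rec["question_id"],
            "question": rec["question"],
            "options": rec["options"],
            "answer": rec["answer"],
            "response": rec["response"],  # the model's prediction
        })
    return list(videos.values())

with open("my_flat_results.json") as f:   # hypothetical input file
    flat = json.load(f)
grouped = regroup_by_video(flat)
for duration in ("short", "medium", "long"):
    count = sum(1 for v in grouped if v["duration"] == duration)
    print(duration, count)  # each should print 300, matching the script's assert
with open("my_grouped_results.json", "w") as f:
    json.dump(grouped, f, indent=2)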