Aligning Question Count with eval_your_results.py

Hi, thanks for the amazing benchmark.

In the script, you check whether each duration has 300 questions:

```python
assert len(your_results_video_type) == 300, f"Number of files in {video_type} is not 300. Check if there are missing files."
```

However, the dataset at https://huggingface.co/datasets/lmms-lab/Video-MME contains 2700 questions, 900 for each duration (short, medium, long).

How can I ensure consistency between my results and the reported results?
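
One possible reading, which is my assumption and not something I have confirmed: the assertion message mentions files, so the 300 may count videos instead of questions, and 900 questions over 300 videos would mean 3 questions per video. Below is a minimal sketch of how I would check that on a flat list of question records. The field names `video_id` and `duration`, the helper `count_per_duration`, and the synthetic records are all hypothetical and only for illustration.

```python
from collections import defaultdict


def count_per_duration(records):
    """Return {duration: (unique video count, question count)}."""
    videos = defaultdict(set)
    questions = defaultdict(int)
    for record in records:
        videos[record["duration"]].add(record["video_id"])
        questions[record["duration"]] += 1
    return {d: (len(videos[d]), questions[d]) for d in videos}


# Synthetic stand-in for the dataset (assumed layout: 3 questions per video).
records = [
    {"video_id": f"{d}-{v:03d}", "duration": d, "question_id": f"{d}-{v:03d}-{q}"}
    for d in ("short", "medium", "long")
    for v in range(300)
    for q in range(3)
]

counts = count_per_duration(records)
for duration, (n_videos, n_questions) in counts.items():
    print(duration, n_videos, n_questions)
```

If the real dataset shows the same shape when grouped this way, the 300 in the assertion would line up with videos per duration and the 900 with questions per duration.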

BradyFU / Video-MME