YinAoXiong opened 6 days ago
Thanks for your questions and comments.
When using VBench to calculate Temporal Flickering for short videos, we sample 25 videos and select 5 static ones from them for scoring. For long videos evaluated with VBench-Long, each long video is split into 2-second clips, and only the first 5 static clips are used for scoring. These clips come exclusively from the first 5 long videos, so sampling those 5 long videos is sufficient. The earlier requirement of 25 videos was redundant, but it does not affect the reproducibility of the evaluation results.
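The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not VBench's implementation. The `Clip` record, its `is_static` flag, and the helper names are hypothetical. The sketch assumes clips are scanned by source video first and then by position within the video. It shows why only a prefix of the long videos matters: scanning stops as soon as 5 static clips have been found.

```python
from dataclasses import dataclass


# Hypothetical record for one 2-second clip cut from a long video.
@dataclass
class Clip:
    video_index: int  # index of the source long video, in sampling order
    clip_index: int   # position of the clip within that long video
    is_static: bool   # assumed output of some static-content filter


def select_static_clips(clips, k=5):
    """Return the first k static clips, scanning in (video, clip) order."""
    ordered = sorted(clips, key=lambda c: (c.video_index, c.clip_index))
    selected = []
    for clip in ordered:
        if clip.is_static:
            selected.append(clip)
            if len(selected) == k:
                break  # later clips (and later videos) are never examined
    return selected


def source_videos(selected):
    """Set of long-video indices that the selected clips come from."""
    return {c.video_index for c in selected}
```

Suppose each of 10 long videos yields 4 clips and only the first clip of each is static. The selected clips then come from videos 0 to 4, and videos 5 to 9 contribute nothing.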
Regarding the prompts "A fantasy landscape" and "a fantasy landscape", we sample videos for each prompt separately and score them independently. However, a case-sensitivity issue during upload caused the file names for the two prompts to conflict, so some videos went missing. We are re-uploading the missing videos. Thank you for your patience!
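A check like the following could catch this kind of collision before upload. It is a sketch that assumes the upload target treats names differing only in letter case as the same file. `find_case_collisions` is a hypothetical helper, and the example file names simply follow the prompts above.

```python
from collections import defaultdict


def find_case_collisions(filenames):
    """Group file names that differ only by letter case.

    If the destination is case-insensitive (an assumption), each group
    would collapse into a single file and silently drop the others.
    """
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    # Keep only keys shared by more than one distinct original name.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

For example, `find_case_collisions(["A fantasy landscape-0.mp4", "a fantasy landscape-0.mp4", "a cat-0.mp4"])` returns a single group containing the two landscape files.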
When I downloaded the videos published on Google Drive for testing, I found that the number of files differs between models; MiniMax, CogVideoX-2B, and OpenSoraPlanv1-1 are examples. Using MiniMax as the reference, I listed the file names that differ between MiniMax and CogVideoX-2B, and between MiniMax and OpenSoraPlanv1-1. The results are attached.
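The comparison described above reduces to two set differences over file names. The sketch below is minimal. It assumes each model's videos are stored as `.mp4` files in a single directory. The helper names and any directory paths are hypothetical.

```python
from pathlib import Path


def list_videos(directory, suffix=".mp4"):
    """Names of the video files directly inside `directory` (assumed layout)."""
    return {p.name for p in Path(directory).iterdir() if p.is_file() and p.suffix == suffix}


def diff_file_names(reference, other):
    """Return (missing_from_other, extra_in_other) relative to `reference`."""
    return sorted(reference - other), sorted(other - reference)
```

Usage would look like `missing, extra = diff_file_names(list_videos("MiniMax"), list_videos("CogVideoX-2B"))`. The two paths stand in for wherever the downloads were saved.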
The main differences are as follows:
I am not sure whether these differences affect the fairness of the comparison. Perhaps we could define a standard for the number of files and their naming, and add a pre-processing step that checks whether submitted videos meet those requirements.
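One possible shape for such a pre-processing check is sketched below. It assumes a naming convention of `<prompt>-<index>.mp4` with a fixed number of samples per prompt (5 here). Treat both the convention and the helper names as illustrative assumptions, not an official VBench specification. Names are compared exactly, so "A fantasy landscape" and "a fantasy landscape" are expected as separate files.

```python
def expected_names(prompts, samples_per_prompt=5, suffix=".mp4"):
    """Hypothetical naming convention: '<prompt>-<index><suffix>'."""
    return {f"{p}-{i}{suffix}" for p in prompts for i in range(samples_per_prompt)}


def check_submission(prompts, submitted, samples_per_prompt=5):
    """Compare submitted file names against the expected set.

    Returns a dict with "missing" and/or "unexpected" lists;
    an empty dict means the submission matches exactly.
    """
    expected = expected_names(prompts, samples_per_prompt)
    submitted = set(submitted)
    report = {}
    missing = sorted(expected - submitted)
    unexpected = sorted(submitted - expected)
    if missing:
        report["missing"] = missing
    if unexpected:
        report["unexpected"] = unexpected
    return report
```

A submission script could run this before upload and reject any model whose report is non-empty. Every model would then be evaluated on the same set of files.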
Attachments: MiniMax-OpenSoraPlanv1-1.txt, MiniMax-CogVideoX-2B.txt