Vchitect / VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation
https://vchitect.github.io/VBench-project/
Apache License 2.0
602 stars 29 forks

The number of videos in the published sampled videos is inconsistent #84

Open YinAoXiong opened 6 days ago

YinAoXiong commented 6 days ago

When I downloaded the sampled videos published on Google Drive for testing, I found that the number of files differs between models. Taking MiniMax, CogVideoX-2B and OpenSoraPlanv1-1 as examples, I used MiniMax as the baseline and listed the file names that differ in CogVideoX-2B and in OpenSoraPlanv1-1 relative to it. The results are in the attachments.

The main differences are as follows:

  1. The CogVideoX-series uploads contain 25 videos per prompt in the temporal_flickering category, rather than 5 like the others.
  2. The prompt list contains two prompts that differ only in capitalization, "A fantasy landscape" and "a fantasy landscape". Some uploads include only one of them, such as MiniMax, while others include both, such as OpenSoraPlanv1-1. Some appear to mix the two without a clear pattern.
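
A comparison like the one described above can be reproduced with a short script. This is only a minimal sketch: it assumes each model's videos have been downloaded into a local directory and that matching is done on the `.mp4` file name alone. The function names and directory paths are hypothetical and are not part of VBench.

```python
from pathlib import Path


def video_names(root):
    """Return the set of .mp4 file names found anywhere under root."""
    return {p.name for p in Path(root).rglob("*.mp4")}


def diff_against_baseline(baseline_names, other_names):
    """Compare two sets of file names.

    Returns (missing, extra): names present only in the baseline,
    and names present only in the other set, each sorted.
    """
    missing = sorted(baseline_names - other_names)
    extra = sorted(other_names - baseline_names)
    return missing, extra


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the downloads live.
    baseline = video_names("videos/MiniMax")
    other = video_names("videos/CogVideoX-2B")
    missing, extra = diff_against_baseline(baseline, other)
    print(f"{len(missing)} missing, {len(extra)} extra relative to the baseline")
```

Note that on a case-insensitive file system the two "fantasy landscape" variants cannot coexist in one directory, so the counts may depend on where the files were unpacked.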

I don't know whether this difference affects the fairness of the comparison. Perhaps we could define a standard for the number of files and their naming, and add a pre-processing step that checks whether submitted videos meet it.
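
One possible shape for such a pre-processing check is sketched below. It assumes files are named `<prompt>-<index>.mp4` with indices starting at 0 and that every prompt should have the same number of samples. Both points are assumptions to be confirmed against the VBench documentation, and `check_submission` is a hypothetical helper, not an existing VBench function.

```python
import re
from collections import defaultdict

# Assumed naming convention: "<prompt>-<index>.mp4" (verify against the VBench docs).
NAME_RE = re.compile(r"^(?P<prompt>.+)-(?P<index>\d+)\.mp4$")


def check_submission(file_names, prompts, expected_per_prompt=5):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    indices = defaultdict(set)

    # Group the sample indices by prompt, flagging names that do not fit the pattern.
    for name in sorted(file_names):
        match = NAME_RE.match(name)
        if match is None:
            problems.append(f"unexpected file name: {name}")
            continue
        indices[match.group("prompt")].add(int(match.group("index")))

    # Every expected prompt must have exactly the indices 0..expected_per_prompt-1.
    expected = set(range(expected_per_prompt))
    for prompt in prompts:
        found = indices.pop(prompt, set())
        if found != expected:
            problems.append(
                f"{prompt!r}: expected indices {sorted(expected)}, found {sorted(found)}"
            )

    # Anything left over belongs to a prompt that is not in the list.
    for prompt in sorted(indices):
        problems.append(f"unknown prompt: {prompt!r}")
    return problems
```

The prompt comparison is deliberately case-sensitive, so "A fantasy landscape" and "a fantasy landscape" are treated as two separate prompts that each need their own full set of samples.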

MiniMax-OpenSoraPlanv1-1.txt MiniMax-CogVideoX-2B.txt

ziqihuangg commented 5 days ago

Thanks for your questions and comments.

When VBench calculates Temporal Flickering for short videos, we sample 25 videos and filter them for 5 static ones. For long videos evaluated with VBench-Long, each long video is first sliced into shorter 2-second clips. In this case, only the first 5 static clips are used for scoring, and these clips all come from the first 5 long videos, so sampling the first 5 long videos is sufficient. The previous requirement of 25 videos was redundant, but it does not affect the reproducibility of the evaluation results.

Regarding the prompts "A fantasy landscape" and "a fantasy landscape," we sample videos for each prompt separately and calculate scores independently. However, a case sensitivity issue during the upload process led to conflicting file names, resulting in missing videos. We are re-uploading the missing videos. Thanks for your patience!
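
For anyone preparing their own uploads, name clashes of this kind can be detected before transfer. The sketch below groups file names that differ only in letter case. It assumes the conflict arises wherever names are compared case-insensitively, which the reply above does not spell out, and `case_collisions` is a hypothetical helper.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that are identical once letter case is ignored."""
    groups = defaultdict(set)
    for name in names:
        # casefold() is a more aggressive lower() intended for caseless matching.
        groups[name.casefold()].add(name)
    return [sorted(group) for _, group in sorted(groups.items()) if len(group) > 1]


if __name__ == "__main__":
    names = [
        "A fantasy landscape-0.mp4",
        "a fantasy landscape-0.mp4",
        "a cat-0.mp4",
    ]
    for group in case_collisions(names):
        print("collides when case is ignored:", group)
```

Placing the two prompt variants in separate directories, or adding a distinguishing suffix, would avoid the clash, though any renaming has to stay compatible with what the evaluation code expects.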