Vchitect / VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation
https://vchitect.github.io/VBench-project/
Apache License 2.0

Cogvideo2B score #61

Open CacacaLalala opened 3 weeks ago

CacacaLalala commented 3 weeks ago

Hi, I see that the total score of CogVideoX-2B on the leaderboard is 80.94%, but after I run inference with all_dimension_long.txt, the total score I measure is only 78.68%. The videos I produced with CogVideoX-2B are 8 fps, 6 s long, and 480x720 resolution. May I ask why my test result is so much lower than the one on the leaderboard? Looking forward to your reply, thanks a lot.

ziqihuangg commented 3 weeks ago

Could you provide the details of the model checkpoint and sampling setting?

CacacaLalala commented 3 weeks ago

> Could you provide the details of the model checkpoint and sampling setting?

Model weights were downloaded from https://huggingface.co/THUDM/CogVideoX-2b/tree/main. The inference code is inference/cli_demo.py from the CogVideo repo, and the sampling settings are the defaults; I only changed some directory paths.

DZY-irene commented 3 weeks ago

> Could you provide the details of the model checkpoint and sampling setting?
>
> Model weights were downloaded from https://huggingface.co/THUDM/CogVideoX-2b/tree/main. The inference code is inference/cli_demo.py from the CogVideo repo, and the sampling settings are the defaults; I only changed some directory paths.

Hello, here are our settings for sampling CogVideoX-2B (last line): https://github.com/Vchitect/VBench/tree/master/sampled_videos#what-are-the-details-of-the-video-generation-models. We use the SAT weights to sample videos for evaluation.

DZY-irene commented 3 weeks ago

> Could you provide the details of the model checkpoint and sampling setting?
>
> Model weights were downloaded from https://huggingface.co/THUDM/CogVideoX-2b/tree/main. The inference code is inference/cli_demo.py from the CogVideo repo, and the sampling settings are the defaults; I only changed some directory paths.

And for evaluation, we use the VBench-long code to evaluate sampled videos.

CacacaLalala commented 2 weeks ago

Hi, I also tried the SAT weights to sample videos and got a new result of 79.75%, which is still much lower than the reported result. For evaluation, I am still using the old evaluation code; could that cause the problem? The only difference between using the longer txt and the original one is the length of the prompts.
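For intuition on why a different setup can shift the total by a point or two: a VBench-style total is an aggregate over many per-dimension scores, so a few underperforming dimensions drag the whole number down. The sketch below is only an illustration of that aggregation idea; the dimension names, weights, and numbers are placeholders, not VBench's actual normalization or CogVideoX's real results.

```python
def total_score(dim_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores.

    Hypothetical aggregation for illustration only; VBench's actual
    normalization and weighting are defined in its official code.
    """
    numerator = sum(weights[d] * s for d, s in dim_scores.items())
    denominator = sum(weights[d] for d in dim_scores)
    return numerator / denominator

# Illustrative numbers only (not real CogVideoX-2B results):
scores = {
    "subject_consistency": 0.96,
    "motion_smoothness": 0.97,
    "overall_consistency": 0.26,
}
uniform = {d: 1.0 for d in scores}  # equal weights -> plain mean
print(round(total_score(scores, uniform), 4))  # 0.73
```

With equal weights this is just the mean, so one low dimension (here 0.26) pulls the total well below the strong dimensions, which is the kind of gap a mismatched sampling or evaluation setup can produce.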

ziqihuangg commented 2 weeks ago

What prompt list did you use?

CacacaLalala commented 2 weeks ago

This one: https://github.com/Vchitect/VBench/blob/master/prompts/gpt_enhanced_prompts/all_dimension_longer.txt
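Since the only stated difference between runs is the prompt list, a quick sanity check is to confirm which prompts were actually loaded and how many there are. This is a minimal stdlib sketch assuming one prompt per line, as in the VBench prompt files; the demo file name below is a stand-in, not the real all_dimension_longer.txt:

```python
from pathlib import Path

def load_prompts(path: str) -> list[str]:
    # One prompt per line; skip blank lines and trim surrounding whitespace.
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]

# Throwaway file standing in for the actual prompt list:
demo = Path("demo_prompts.txt")
demo.write_text("a cat playing piano\n\na dog surfing\n", encoding="utf-8")
prompts = load_prompts(str(demo))
print(len(prompts))  # 2
```

Comparing the loaded count and a few sample prompts against the file on GitHub rules out a silently truncated or stale prompt list before re-running the evaluation.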