Vchitect / VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation
https://vchitect.github.io/VBench-project/
Apache License 2.0

score range for each dimension? #28

Open rebuttalpapers opened 6 months ago

rebuttalpapers commented 6 months ago

Currently VBench can evaluate the following dimensions: ['subject_consistency', 'background_consistency', 'temporal_flickering', 'motion_smoothness', 'dynamic_degree', 'aesthetic_quality', 'imaging_quality', 'object_class', 'multiple_objects', 'human_action', 'color', 'spatial_relationship', 'scene', 'temporal_style', 'appearance_style', 'overall_consistency']

I ran some of them on my own videos. However, the score range differs from one dimension to another.

For one of my videos, I get five scores:
subject consistency: 10.982122957706451
motion_smoothness: 0.9960492387493192
dynamic degree: false
aesthetic_quality: 0.6582092642784119
imaging_quality: 72.89873886108398

The overall scores, however, are as follows:
subject consistency: 0.9861730885776606
motion_smoothness: 0.9909714810295909
dynamic degree: 0.16666666666666666
aesthetic_quality: 0.6556713245809078
imaging_quality: 0.7093512528141342

Could you explain the score range for each video and what the scores mean? It would also help to have example videos that serve as score anchors for each dimension. For example, if the range of aesthetic_quality is 0-1, I would like to know what videos scoring 0.1, 0.5, and 0.9 look like. Thanks!

ziqihuangg commented 6 months ago

Hi, thanks for your question! The score range for all dimensions is 0 to 1.

For samples of different scores, at different dimensions, you can refer to section G of supplementary materials: https://arxiv.org/pdf/2311.17982, where for each dimension we provided some samples at varying scores.

rebuttalpapers commented 6 months ago

Thanks, Ziqi, for your helpful answer. May I ask why dynamic degree is reported as false, while subject consistency and imaging_quality are much larger than 1?

ziqihuangg commented 5 months ago

Hi, I assume you are asking about the scores for individual videos in the generated eval_results.json file.

For dynamic_degree, each video undergoes binary classification, with true referring to dynamic, while false referring to static. The final score for the dynamic_degree dimension is defined as the percentage of videos classified as dynamic.
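To make the aggregation concrete, here is a minimal sketch of how a list of per-video true/false results turns into a single score between 0 and 1. The function name and the sample flags are hypothetical and are not taken from the VBench code. They only illustrate the definition above.

```python
def dynamic_degree_score(per_video_flags):
    """Return the fraction of videos classified as dynamic (True)."""
    if not per_video_flags:
        raise ValueError("need at least one per-video result")
    # Each True counts as 1 and each False as 0.
    return sum(bool(flag) for flag in per_video_flags) / len(per_video_flags)


# Hypothetical results: one dynamic video out of six.
flags = [False, True, False, False, False, False]
score = dynamic_degree_score(flags)
print(score)
```

Under this definition, an aggregated value such as 0.1666... is what you would see if, for instance, one video out of six were classified as dynamic.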

For other dimensions that have values larger than 1, it could be due to one of two reasons: (1) The individual video's raw score is in the range of 0-100. (2) The individual video's raw score hasn't been divided by the frame count yet.

We retain these raw scores for individual videos in case users need them for debugging. However, you should refer to the final aggregated score for each dimension to assess the model's performance in that particular dimension.
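As a rough illustration of the two cases above, the sketch below rescales a raw value to the 0-1 range. The helper names are hypothetical, and which case applies to which dimension (and what frame count to use) is an assumption you would need to check against the VBench source. The frame-count example uses made-up numbers.

```python
def rescale_percent(raw_score):
    """Case (1): map a raw score on a 0-100 scale to 0-1."""
    return raw_score / 100.0


def per_frame_average(raw_sum, frame_count):
    """Case (2): divide a raw score summed over frames by the frame count."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return raw_sum / frame_count


# The raw imaging_quality value from the question, assuming case (1) applies.
imaging = rescale_percent(72.89873886108398)

# Made-up example for case (2): a sum of 12.0 over 16 frames.
averaged = per_frame_average(12.0, 16)
print(imaging, averaged)
```

Either way, the aggregated per-dimension score remains the number to report.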

Jason-xin commented 5 months ago

How can I get the original classification probability for dynamic_degree?

ziqihuangg commented 4 months ago

Hi, it's not a probability-based classification; it's based on a threshold.
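For readers unfamiliar with the distinction, here is a generic sketch of threshold-based classification. It is not the VBench implementation: `motion_value`, `threshold`, and the numbers used are hypothetical placeholders for whatever motion measurement and cutoff the evaluator uses internally.

```python
def classify_dynamic(motion_value, threshold):
    """Return True (dynamic) if the measured motion exceeds the threshold."""
    return motion_value > threshold


# Hypothetical measurements compared against a hypothetical cutoff of 6.0.
is_dynamic_a = classify_dynamic(7.5, 6.0)
is_dynamic_b = classify_dynamic(2.0, 6.0)
print(is_dynamic_a, is_dynamic_b)
```

In a scheme like this there is no probability to report. The only numeric quantity is the measurement that is compared against the threshold before the boolean is produced.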

Nikitosina commented 3 months ago

Hello! Sorry, but I still don't understand how to get a numeric value instead of a boolean. A boolean doesn't seem useful for my evaluations.