Vchitect / VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation
https://vchitect.github.io/VBench-project/
Apache License 2.0
374 stars 14 forks

Request for detailed instructions for submitting evaluation results to the leaderboard #30

Closed Ji4chenLi closed 1 month ago

Ji4chenLi commented 1 month ago

Hi,

Thank you so much for your efforts in putting together the comprehensive benchmarks!

Could you provide detailed instructions for submitting the evaluation results? I obtained 16 *eval_results.json files after evaluating all the dimensions, but it seems that I cannot submit these individual JSON files to the leaderboard.

Thanks, Jiachen

yinanhe commented 1 month ago

Hi, thank you for your interest in our work. Simply package the JSON files generated by VBench into a zip file and upload it directly.

Ji4chenLi commented 1 month ago

Hi Yinan,

Thank you for your response. I tried your suggestion, but I'm still unable to upload the .zip file. Specifically, I zipped all the *eval_results.json files into a single .zip file before uploading it.

Can you look into the issue?

yinanhe commented 1 month ago

I apologize. We have just inspected the server and found an issue with our processing logic. The problem has now been fixed; thank you for your feedback! Please note that you need to upload a zip file, and the first-level directory inside the zip should contain all the *result.json files. Do not modify the JSON content output by VBench. Don't worry about extra files in the zip; they will not be counted in the Leaderboard.
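For anyone packaging results by hand, the layout described above can be sketched in Python as follows. This is only an illustration, not the leaderboard's own tooling: the function name `package_results`, the output name `submission.zip`, and the glob pattern `*eval_results.json` (taken from the file names mentioned earlier in this thread) are assumptions. The sketch writes each JSON file at the top level of the zip and leaves its content untouched.

```python
import zipfile
from pathlib import Path


def package_results(results_dir, zip_path, pattern="*eval_results.json"):
    """Zip every matching JSON file so it sits at the top level of the archive.

    `pattern` is an assumption based on the file names mentioned in this
    thread; adjust it to whatever your VBench run actually produced.
    """
    files = sorted(Path(results_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {results_dir}")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # arcname=path.name drops the parent folders, so the archive has
            # no nested directory and the JSON bytes are copied unmodified.
            archive.write(path, arcname=path.name)

    return [path.name for path in files]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    packed = package_results("evaluation_results", "submission.zip")
    print(f"packed {len(packed)} files")
```

Listing the archive afterwards (for example with `zipfile.ZipFile("submission.zip").namelist()`) is a quick way to confirm that no entry contains a directory separator before uploading.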

Ji4chenLi commented 1 month ago

I just had another try, but I still failed to upload any files to the leaderboard. Could you take another look?

yinanhe commented 1 month ago

@Ji4chenLi I noticed that T2V-Turbo (VC2) has been successfully submitted to the leaderboard, but only "scene" and "color" have been submitted. Is that correct?

Ji4chenLi commented 1 month ago

That submission was actually unintended. I submitted one or two JSON files for debugging purposes but still failed to submit the entire zip file. If you have time, could you jump on a quick chat to fix the bug? Alternatively, I can send you my zip file directly.

yinanhe commented 1 month ago

@ziqihuangg has shared your zip file with me by email. I have made the necessary code adjustments to accommodate this situation. Could you please try again?

Ji4chenLi commented 1 month ago

Thank you, Yinan! My submission seems successful, but the Total Score, Quality Score, and Semantic Score are different from my calculation. I will look into it.

Ji4chenLi commented 1 month ago

I might have found the bug. The leaderboard somehow swaps the aesthetic quality and dynamic degree scores of my model. Because dynamic degree uses a different weight (0.5) in the Quality Score calculation, my results end up lower than expected.

Could you please help me fix that? My model T2V-Turbo (VC2) should have aesthetic quality = 63.04 and dynamic degree = 49.17.
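To illustrate why the swap lowers the score, here is a minimal Python sketch restricted to these two dimensions. It is not the leaderboard's actual calculation: only the dynamic degree weight (0.5) comes from this thread, while the aesthetic quality weight of 1.0, the helper name `weighted_score`, and the omission of per-dimension normalisation and of all other dimensions are simplifying assumptions.

```python
def weighted_score(scores, weights):
    """Weighted average of the given dimension scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Only the 0.5 weight is stated in this thread; 1.0 is an assumed placeholder.
WEIGHTS = {"aesthetic quality": 1.0, "dynamic degree": 0.5}

correct = {"aesthetic quality": 63.04, "dynamic degree": 49.17}
swapped = {"aesthetic quality": 49.17, "dynamic degree": 63.04}

correct_score = weighted_score(correct, WEIGHTS)
swapped_score = weighted_score(swapped, WEIGHTS)

print(f"correct: {correct_score:.2f}")  # correct: 58.42
print(f"swapped: {swapped_score:.2f}")  # swapped: 53.79
```

Under these assumptions, the swap moves the larger value onto the half-weighted dimension, so the weighted average drops.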

ziqihuangg commented 1 month ago

Hi @Ji4chenLi , our calculation of total scores applies per-dimension normalisation and weighting. These constant parameters can be found here: https://huggingface.co/spaces/Vchitect/VBench_Leaderboard/blob/main/constants.py

Ji4chenLi commented 1 month ago

Hi Ziqi,

I understand. I have carefully followed your code to do the calculation. The problem is that the aesthetic quality and dynamic degree values of my model are currently swapped on the leaderboard, which leads to a worse Quality Score and Total Score for my model T2V-Turbo (VC2).

ziqihuangg commented 1 month ago

We've swapped the values of these two dimensions. Could you check again whether the leaderboard is now consistent with your results? Thanks!

Ji4chenLi commented 1 month ago

It resolves my issue! Thank you!