Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

Evaluate the FVD? #65

Closed: huangjch526 closed this issue 2 months ago

huangjch526 commented 3 months ago

Hi, how can I reproduce the FVD of 59.82 reported in your paper on the SKY dataset? Could you release the code and parameters you used to evaluate FVD?

maxin-cn commented 3 months ago

Please refer to https://github.com/universome/stylegan-v.
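For context, FVD fits a Gaussian to video-level features of real and generated clips (I3D embeddings in the original FVD definition) and reports the Fréchet distance between the two Gaussians. The sketch below covers only that distance step, assuming the features are already extracted as one row per video; it is not the stylegan-v code, and the function name `frechet_distance` is illustrative.

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: arrays of shape (N, D), one embedding per video
    (assumed to be precomputed, e.g. with an I3D network).
    FD = ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^{1/2})
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)  # (D, D) covariance of real features
    sigma_f = np.cov(feats_fake, rowvar=False)  # (D, D) covariance of fake features
    covmean = linalg.sqrtm(sigma_r @ sigma_f)
    covmean = np.real(covmean)  # drop tiny imaginary parts caused by numerical error
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))
```

Because the means and covariances are estimated from a finite sample, the score also depends on how many videos are used and on how the real clips are drawn, so runs with different sampling setups are not directly comparable.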

huangjch526 commented 2 months ago

I used this code to evaluate videos generated by your pre-trained model, but I only obtained an FVD of 96.5 and could not reproduce the value in your paper. Could anyone explain how to obtain the FVD of 59.82 reported in the paper on the SKY dataset?

maxin-cn commented 2 months ago

Could you provide some more details about your evaluation of FVD?

huangjch526 commented 2 months ago

I generated 2048 16-frame videos with sample.sh using the provided pre-trained Latte checkpoint for sky, then converted the generated videos to frame-by-frame images. Next, I ran stylegan-v/src/scripts/calc_metrics_for_dataset.py from stylegan-v, setting the real video dir to the frame images of the sky dataset and the fake video dir to the frame images of my generated videos. I used the default parameters recommended in its README and measured the fvd2048_16f metric.
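The video-to-frames conversion step could look roughly like the sketch below. It assumes the samples are already decoded into uint8 arrays of shape (T, H, W, 3) and that the metric script reads a dataset as one sub-directory of frames per video; the helper name `dump_video_frames` and that directory layout are assumptions, so check the stylegan-v README for the exact format it expects.

```python
import os

import numpy as np
from PIL import Image


def dump_video_frames(videos, out_root: str, name_width: int = 6) -> int:
    """Write each video as a folder of PNG frames; returns the number of videos.

    videos: iterable of uint8 arrays shaped (T, H, W, 3).
    Assumed layout: out_root/<video_id>/<frame_idx>.png
    """
    os.makedirs(out_root, exist_ok=True)
    count = 0
    for vid_idx, video in enumerate(videos):
        video = np.asarray(video)
        if video.dtype != np.uint8 or video.ndim != 4 or video.shape[-1] != 3:
            raise ValueError(
                f"video {vid_idx}: expected uint8 (T, H, W, 3), got {video.dtype} {video.shape}"
            )
        vid_dir = os.path.join(out_root, f"{vid_idx:0{name_width}d}")
        os.makedirs(vid_dir, exist_ok=True)
        for t, frame in enumerate(video):
            # Zero-padded names keep lexicographic order equal to temporal order.
            Image.fromarray(frame).save(os.path.join(vid_dir, f"{t:0{name_width}d}.png"))
        count += 1
    return count
```

PNG is lossless, so the saved frames match the generated pixels exactly; lossy formats such as JPEG would add compression artifacts to the fake set only.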

maxin-cn commented 2 months ago

Maybe you can adjust this parameter so that it matches the frame sampling interval used during training.
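If "this parameter" governs how frames are picked from each real sky video, the idea can be sketched as taking 16 frames spaced by the same stride used to build training clips, instead of 16 consecutive frames. `sample_clip_indices` and `frame_interval` are hypothetical names for illustration, not stylegan-v or Latte options, and the correct stride has to come from the training config.

```python
import random
from typing import List, Optional


def sample_clip_indices(
    num_frames: int,
    clip_len: int = 16,
    frame_interval: int = 1,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick clip_len frame indices spaced frame_interval apart (hypothetical helper).

    frame_interval is meant to mirror the stride used when sampling training clips.
    """
    span = (clip_len - 1) * frame_interval + 1  # frames covered by one clip
    if num_frames < span:
        raise ValueError(f"need at least {span} frames, video has {num_frames}")
    if rng is None:
        rng = random.Random()
    start = rng.randint(0, num_frames - span)  # randint's upper bound is inclusive
    return [start + i * frame_interval for i in range(clip_len)]
```

With `frame_interval=1` this reduces to a random run of consecutive frames, so comparing scores for both settings is a quick way to see how much the real-clip stride moves the FVD.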