Boese0601 / MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
https://boese0601.github.io/magicdance/

More details about computing fid-vid #19

Closed nehcoah closed 4 months ago

nehcoah commented 4 months ago

Thanks for your great work. When I ran bash scripts/inference_tiktok_dataset.sh with the checkpoint you provided and computed the metrics using exactly the same code as DisCo, I noticed that fid-vid does not match the result in the paper, while the other metrics are basically the same as those reported. I got a fid-vid of 57.55, whereas the paper reports 46.3.

Did you evaluate the metrics between the generated images and the corresponding real images for each video, and then average them over all videos? Or did you put the generated images from all 10 videos together for evaluation? If neither, what other factors could have led to a different fid-vid?

Boese0601 commented 4 months ago

Hi, thanks for raising this issue. As far as I remember, for the image-wise metrics I put all generated images from the 10 videos together for evaluation, and for the video-wise metric I converted the generated images and gt images from each video to mp4 format and averaged the results across all videos.
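To make the difference between the two aggregation schemes concrete, here is a minimal sketch. It is only an illustration, not the evaluation code from this repo or from DisCo: `toy_distance` is a hypothetical stand-in for the real metric (it is not FID or fid-vid), the function names and the `videos` layout are assumptions, and "per-video average" is one reading of the description above.

```python
from statistics import mean


def toy_distance(generated, real):
    """Hypothetical stand-in metric (NOT FID): gap between the two sample means."""
    return abs(mean(generated) - mean(real))


def pooled_score(videos, metric):
    """Pool the samples from every video, then compute the metric once."""
    gen_all = [x for v in videos for x in v["generated"]]
    real_all = [x for v in videos for x in v["real"]]
    return metric(gen_all, real_all)


def per_video_average(videos, metric):
    """Compute the metric separately for each video, then average the scores."""
    return mean(metric(v["generated"], v["real"]) for v in videos)


# Toy data: each "video" is just a short list of scalar features.
videos = [
    {"generated": [1.0, 2.0, 3.0], "real": [2.0, 3.0, 4.0]},
    {"generated": [10.0, 10.0], "real": [8.0, 8.0]},
]

pooled = pooled_score(videos, toy_distance)
averaged = per_video_average(videos, toy_distance)
print(f"pooled: {pooled:.3f}, per-video average: {averaged:.3f}")
```

Even on this toy data the two schemes give different numbers, so a mismatch in how samples are grouped is one plausible source of a gap like 57.55 vs 46.3.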

Please let me know if you have further questions; I'm glad to help.

nehcoah commented 4 months ago

Thanks for your reply!