YanzuoLu / CFLD

[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
MIT License
183 stars, 12 forks

The metrics for VAE reconstructions and the ground truths #17

Closed CHNxindong closed 7 months ago

CHNxindong commented 7 months ago

Hi authors, I would like to know: why can the FID results of CFLD be better than those of the VAE reconstructions and the ground truths (Tab. 2)?

YanzuoLu commented 7 months ago

FID is calculated between the training set and the validation set (generated or ground-truth), so a lower FID indicates that our generations are closer to the images in the training set. As for why our FID is lower than that of the ground-truth validation set, I think this only shows that FID is not a reliable indicator here, since I don't think generated images can exceed the real ones. The results of PIDM show the same behavior. Many diffusion-based publications point out that FID is no longer a robust metric. FID also depends heavily on how the images are stored and on their resolution, which makes it quite sensitive. So it doesn't matter much; human evaluation (a user study) is the gold standard for image generation : )
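For readers unfamiliar with the metric, here is a minimal sketch of the Fréchet distance that FID is built on. It is not the evaluation code of this repository: the function name `frechet_distance` is hypothetical, and the random arrays only stand in for the Inception-v3 features that a real FID computation would extract from the training images and from the generated or ground-truth validation images.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_images, feature_dim).
    In real FID these would be Inception-v3 features; here they are
    arbitrary arrays (an assumption for illustration).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs. Clipping guards against tiny
    # negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in features (hypothetical): a "training" set and two candidates.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 4))
same_feats = train_feats.copy()      # identical statistics
shifted_feats = train_feats + 1.0    # same covariance, shifted mean

dist_same = frechet_distance(train_feats, same_feats)
dist_shifted = frechet_distance(train_feats, shifted_feats)
print(dist_same, dist_shifted)
```

Because the score only compares the means and covariances of the two feature sets, anything that moves those statistics changes it, and a set of generated images can score lower against the training set than real validation images do without being better images.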

CHNxindong commented 7 months ago

I understand your point now. Thanks so much for your kind response.