Experimental settings for computing FID-20K of EG3D

JeffreyXiang / GRAM-HD

PyTorch implementation of the ICCV paper "GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds"

https://jeffreyxiang.github.io/GRAM-HD/

MIT License


Experimental settings for computing FID-20K of EG3D #1

Open hse1032 opened 9 months ago

hse1032 commented 9 months ago

Hello. First of all, thank you for sharing your valuable codebase!

I have a question about the experimental results in Table 1 of your paper.

I would like to know the experimental settings used to compute the numbers in Table 1. The FID-20K of EG3D is reported as 8.72, and I wonder which dataset you used to compute it.

My guess is that you used the official EG3D weights and computed FID-20K against real images obtained with the GRAM preprocessing (https://github.com/microsoft/GRAM).

I hope this question does not bother you too much.

Thanks,

JeffreyXiang commented 9 months ago

Thanks for your interest in our project!

We consistently use the same evaluation module for all our numerical results (see eval.py). For real samples, when we evaluate a provided pretrained checkpoint, we use the official data preprocessing script of each compared method. I am pretty sure that the metrics of EG3D are calculated using its official data preprocessing script, which re-crops the original FFHQ photos. Note that the FID-20K number is calculated between 20k real and 20k fake samples.
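To illustrate the "20k real vs. 20k fake" point, here is a minimal sketch of a Fréchet distance between two equally sized feature sets. It is not the repository's eval.py. The function name `frechet_distance` and the arrays `feats_real` and `feats_fake` are hypothetical; in an actual FID computation they would hold Inception-v3 features of the real and generated images, whereas the usage below substitutes small random arrays.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are (num_samples, feature_dim) arrays. For FID-20K as
    described above, each would hold 20k rows (hypothetical setup).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # tr(sqrt(cov_r @ cov_f)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semi-definite covariances (clipped here for numerical safety).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(((mu_r - mu_f) ** 2).sum())
    return mean_term + float(np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


# Stand-in features: random data instead of real Inception activations.
rng = np.random.default_rng(0)
feats_real = rng.normal(size=(2000, 8))
feats_fake = rng.normal(size=(2000, 8))
score = frechet_distance(feats_real, feats_fake)
```

Because the estimated mean and covariance depend on how many samples are used on each side, a score computed against 20k real images is not directly comparable to one computed against the full real dataset.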

hse1032 commented 9 months ago

Thank you for your prompt reply!

Sorry for the confusion. I had assumed that FID-20K is obtained by comparing 20K fake images with the full set of real images, as is done in the FID implementation of EG3D.

I will try to reproduce the number reported in your paper. Thanks again,

hse1032 commented 9 months ago

Hi, sorry for the inconvenience.

I have a few more questions about the evaluation protocol.

EG3D seems to sample camera poses from the dataset distribution (e.g., the camera poses of the FFHQ images). In contrast, GRAM-HD randomly samples the camera pose from a predefined distribution. Did you evaluate EG3D with its original protocol, or did you randomly sample the poses from the predefined distribution?
In eval.py, the default number of images and image size are 10K and 128. FID-20K uses 20K images, so I assume the number of images should be 20K, but which image size should I use (e.g., 128 or 256)?

Thanks,

JeffreyXiang commented 8 months ago

We sample camera poses from the prior distribution, with (0.3, 0.15) for the yaw and pitch angles, respectively.
The number of images for evaluation is 20k, and the image resolution is set to each method's original resolution (e.g., EG3D uses 512).
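A minimal sketch of drawing 20k poses for such an evaluation follows. The reply does not state the form of the prior, so this assumes a zero-mean Gaussian whose standard deviations are 0.3 (yaw) and 0.15 (pitch), in radians, as offsets from a frontal view. The names `sample_poses`, `YAW_STD`, and `PITCH_STD` are hypothetical and do not come from eval.py.

```python
import random

# Assumption: (0.3, 0.15) are Gaussian standard deviations in radians.
YAW_STD = 0.3
PITCH_STD = 0.15


def sample_poses(num_poses, seed=0):
    """Return a list of (yaw, pitch) offsets from a frontal view."""
    rng = random.Random(seed)  # seeded so the evaluation set is repeatable
    return [
        (rng.gauss(0.0, YAW_STD), rng.gauss(0.0, PITCH_STD))
        for _ in range(num_poses)
    ]


# One pose per generated image for a 20k-sample evaluation.
poses = sample_poses(20000)
```

Fixing the seed keeps the set of sampled poses identical across the compared methods, which removes one source of variance between runs.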