Open mmuckley opened 1 year ago
This might not apply to your case, but one discrepancy we found is that the authors normalize each image to the (0, 1) range using that image's own min and max before saving, instead of clipping to (-1, 1) and then mapping to (0, 1) by adding 1 and dividing by 2.
This can be a problem if the reconstructions contain outliers that are never clipped, since they skew the range. In fact, after the label images are read, the same normalization is applied to them before saving, so the loaded and saved labels are not necessarily equal.
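To make the difference concrete, here is a minimal sketch of the two conventions (NumPy, with illustrative function names); a single outlier pixel shifts every value under per-image min/max scaling, but is simply saturated under clip-and-rescale:

```python
import numpy as np

def normalize_per_image(x):
    # Per-image min/max normalization (what the saved images appear to use):
    # the output depends on the image's own extrema, so one outlier pixel
    # rescales everything else.
    return (x - x.min()) / (x.max() - x.min())

def normalize_clip(x):
    # Clip to [-1, 1], then map to [0, 1]: outliers are saturated instead of
    # skewing the rest of the image.
    return (np.clip(x, -1.0, 1.0) + 1.0) / 2.0

# One outlier at 3.0 compresses all other values under per-image scaling.
recon = np.array([-1.0, 0.0, 1.0, 3.0])
print(normalize_per_image(recon))  # [0.   0.25 0.5  1.  ]
print(normalize_clip(recon))       # [0.   0.5  1.   1.  ]
```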
Hey @mmuckley, I'm trying to reproduce the Table 1 FID scores, and I'm unable to match the FFHQ random inpainting results. I'm wondering if I'm missing some preprocessing steps. I'm using the FFHQ-256 set from https://www.kaggle.com/datasets/xhlulu/flickrfaceshq-dataset-nvidia-resized-256px.
FID might also differ based on whether the reconstructions are compared only to the validation set or to the training and validation sets combined. FID is typically much worse when computed against fewer samples.
Hello, thanks for publishing this paper and repo.
I am curious about reproducing the results in the paper. I applied the Gaussian blur model to the first 1,000 images of FFHQ-256 as per Issue #4, but when using torch-fidelity I don't reproduce the FID numbers. If I include torch-fidelity's image resizing, I get 29.3. If I don't include image resizing, I get 37.0. Both of these are pretty far from the paper value of 44.05.

Could you provide some more details on how to reproduce the numbers in Table 1?
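For reference, a minimal sketch of the kind of torch-fidelity call I mean (the directory paths and batch size are placeholders, and any extra resizing of the saved images would have to happen before this call; torch-fidelity's Inception extractor resizes to 299x299 internally):

```python
import torch_fidelity

# Compare the saved reconstructions against the reference FFHQ-256 images.
metrics = torch_fidelity.calculate_metrics(
    input1="recon_first_1000/",   # reconstructed images (placeholder path)
    input2="ffhq256_reference/",  # reference images (placeholder path)
    cuda=True,
    fid=True,
    batch_size=64,
)
print(metrics["frechet_inception_distance"])
```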