DPS2022 / diffusion-posterior-sampling

Official pytorch repository for "Diffusion Posterior Sampling for General Noisy Inverse Problems"
https://dps2022.github.io/diffusion-posterior-sampling-page/

Calculating FID #5

Open mmuckley opened 1 year ago

mmuckley commented 1 year ago

Hello, thanks for publishing this paper and repo.

I am curious about reproducing the results in the paper. I applied the Gaussian blur model to the first 1,000 images of FFHQ-256 as described in Issue #4, but when using torch-fidelity I cannot reproduce the FID numbers: with torch-fidelity's image resizing I get 29.3; without it I get 37.0. Both are far from the paper's value of 44.05.
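For reference, a minimal sketch of an FID computation with torch-fidelity's Python API (directory names are placeholders, not paths from this repo; torch-fidelity's Inception feature extractor resizes inputs to 299×299 internally, which is presumably the resizing referred to above):

```python
# Minimal sketch: FID between reconstructions and reference images
# using torch-fidelity. Paths are placeholders.
from torch_fidelity import calculate_metrics

metrics = calculate_metrics(
    input1='ffhq256_recon',  # directory of reconstructed images (placeholder)
    input2='ffhq256_ref',    # directory of the first 1,000 FFHQ-256 images (placeholder)
    cuda=True,
    fid=True,
)
print(metrics['frechet_inception_distance'])
```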

Could you provide some more details on how to reproduce the numbers in Table 1?

z-fabian commented 1 year ago

This might not apply to your case, but one discrepancy we found is that the authors normalize images to the (0, 1) range using each image's own min and max before saving, instead of clipping to (-1, 1) and then mapping to (0, 1) by adding 1 and dividing by 2.

This can be a problem if the reconstructions have outliers that are not clipped, so the range is skewed. In fact, the same normalization is applied to the label (ground-truth) images before saving, so the saved labels are not necessarily equal to the loaded ones.
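For illustration, a minimal sketch of the two conventions being contrasted (function names are mine, not the repo's); a single outlier outside [-1, 1] shifts every pixel under per-image min/max normalization:

```python
import torch

def normalize_minmax(x: torch.Tensor) -> torch.Tensor:
    # Rescale using this image's own min and max (the behavior described above).
    return (x - x.min()) / (x.max() - x.min())

def normalize_clip(x: torch.Tensor) -> torch.Tensor:
    # Clip to [-1, 1], then map to [0, 1].
    return (x.clamp(-1.0, 1.0) + 1.0) / 2.0

x = torch.tensor([-1.5, -1.0, 0.0, 1.0])  # one outlier below -1
print(normalize_minmax(x))  # tensor([0.0000, 0.2000, 0.6000, 1.0000])
print(normalize_clip(x))    # tensor([0.0000, 0.0000, 0.5000, 1.0000])
```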

neginraoof commented 1 year ago

Hey @mmuckley, I'm trying to reproduce the Table 1 FID scores and I'm unable to match the FFHQ random-inpainting results. I'm wondering if I'm missing some preprocessing steps. I'm using the FFHQ-256 set from https://www.kaggle.com/datasets/xhlulu/flickrfaceshq-dataset-nvidia-resized-256px.

z-fabian commented 1 year ago

FID might also differ based on whether the reconstructions are compared only to the validation set, or to the training and validation sets combined. FID computed from fewer samples is typically much worse, because the mean and covariance of the Inception features are estimated less accurately.
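To illustrate the small-sample bias: even when two sets are drawn from the same distribution, the Fréchet distance estimated from fewer samples comes out larger. A toy sketch (feature dimension reduced from Inception's 2048 to 64 for speed):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, cov1, mu2, cov2):
    # FID formula: ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2*(cov1 cov2)^{1/2})
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return float(((mu1 - mu2) ** 2).sum() + np.trace(cov1 + cov2 - 2 * covmean))

rng = np.random.default_rng(0)
dim = 64  # stand-in for the 2048-dim Inception features
pop = rng.standard_normal((100_000, dim))  # large "reference" set
for n in (100, 1_000, 10_000):
    sample = rng.standard_normal((n, dim))  # same distribution, fewer samples
    d = frechet_distance(sample.mean(0), np.cov(sample, rowvar=False),
                         pop.mean(0), np.cov(pop, rowvar=False))
    print(n, round(d, 3))  # distance shrinks as n grows
```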