FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525
MIT License

FID Evaluation not matching paper results for VQ-16 checkpoint #34

Closed · vkramanuj closed this issue 4 months ago

vkramanuj commented 5 months ago

Hi! Thanks for the great repo. I've been trying to reproduce some of your numbers on the ImageNet validation set (256x256), and the rFID in particular doesn't match, both for your released checkpoint and for a tokenizer I trained with your settings.

With your VQ-16 checkpoint I get:

PSNR: 20.793026, SSIM: 0.675290 (this matches your paper exactly)

Inception Score: 172.3292, FID: 4.2847, sFID: 5.1447, Precision: 0.73054, Recall: 0.6533

(the model-based metrics are all systematically worse than the results in your paper)

After re-running your training script and evaluating that tokenizer, I get:

PSNR: 20.625670, SSIM: 0.664614

Inception Score: 174.9648, FID: 4.2436, sFID: 5.4256, Precision: 0.72864, Recall: 0.6552

(very similar to the results from your checkpoint above)

Given that the PSNR/SSIM match exactly, I believe I'm producing the reconstructions and npz files correctly. For running the evaluator, my command looks like:

python evaluator.py ~/assets/VIRTUAL_imagenet256_labeled.npz VQ-16-flatdataset-size-256-size-256-codebook-size-16384-dim-8-seed-0.npz 
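For context, this is roughly how I pack the reconstructions into that .npz (a minimal sketch with my own helper name; the OpenAI evaluator just needs the images as a single uint8 array, which np.savez stores under the default key arr_0):

```python
import numpy as np

def save_eval_npz(reconstructions, out_path):
    """Pack reconstructions into the .npz layout evaluator.py reads.

    `reconstructions` is assumed to be a list of 256x256x3 uint8 arrays,
    one per ImageNet validation image, with values in [0, 255].
    """
    arr = np.stack(reconstructions).astype(np.uint8)  # [N, 256, 256, 3]
    assert arr.ndim == 4 and arr.shape[-1] == 3
    np.savez(out_path, arr)  # positional arrays are saved as arr_0

# save_eval_npz(recons, "VQ-16-...-codebook-size-16384-dim-8-seed-0.npz")
```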

Could you advise where I've gone wrong? I'm just using the OpenAI evaluation code provided in this repository. Thanks!

vkramanuj commented 5 months ago

An update: using the clean-fid repo (which I trust more than the OpenAI evals), I get an rFID of 2.11, which is actually lower than the number reported in the paper. I'm curious: which exact evaluation commands/scripts did you use to produce the numbers in the paper?
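For reference, the clean-fid call was essentially this (the folder paths are placeholders for my dumps of the validation images and the reconstructions):

```python
from cleanfid import fid

# Compare two folders of images; clean-fid handles resizing and
# Inception feature extraction internally.
score = fid.compute_fid("imagenet_val_256/", "vq16_reconstructions/")
print(f"rFID: {score:.2f}")
```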

Thanks :-)

PeizeSun commented 5 months ago

Hi~ Please first generate val.npz by running val.sh; it packs the 50k ImageNet validation images into a single file. Then replace VIRTUAL_imagenet256_labeled.npz with val.npz in your evaluator command.
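In case the script is unclear, this is roughly what that preprocessing amounts to (a minimal sketch, not the exact val.sh; the center-crop/resize details are assumptions matching the 256x256 eval setting):

```python
import os
import numpy as np
from PIL import Image

def build_val_npz(val_dir, out_path="val.npz", size=256):
    """Pack the 50k ImageNet validation images into one .npz."""
    arrs = []
    for name in sorted(os.listdir(val_dir)):
        img = Image.open(os.path.join(val_dir, name)).convert("RGB")
        w, h = img.size  # center-crop to square, then resize
        s = min(w, h)
        img = img.crop(((w - s) // 2, (h - s) // 2,
                        (w + s) // 2, (h + s) // 2))
        img = img.resize((size, size), Image.BICUBIC)
        arrs.append(np.asarray(img, dtype=np.uint8))
    np.savez(out_path, np.stack(arrs))  # stored under arr_0
```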

vkramanuj commented 4 months ago

This worked for me, thanks! I'm also having trouble replicating the LlamaGen-B results, but I'll open a separate issue for that.