An update: when using the clean-fid repo (which I trust more than the OpenAI evals), I get an rFID score of 2.11, which is actually lower than the number reported in the paper. I'm curious what exact evaluation commands/scripts you used to produce the numbers in your paper?
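For reference, the clean-fid call was along these lines (both directory names are placeholders for the 50k validation images and my reconstructions):

```bash
# clean-fid (https://github.com/GaParmar/clean-fid); directories are placeholders
python -c "from cleanfid import fid; print(fid.compute_fid('imagenet_val_256/', 'reconstructions_256/'))"
```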
Thanks :-)
Hi~ Please first generate val.npz by running val.sh; it packs the 50k ImageNet validation images into a single file. Then replace VIRTUAL_imagenet256_labeled.npz with val.npz when running the evaluator.
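A sketch of that workflow as I understand it (script locations are assumptions; adjust them to this repo's layout):

```bash
bash val.sh                                       # writes val.npz (50k ImageNet validation images)
python evaluator.py val.npz reconstructions.npz   # val.npz replaces VIRTUAL_imagenet256_labeled.npz as the reference batch
```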
This worked for me, thanks! I have separate problems replicating LlamaGen-B results, but I'll make that a separate post.
Hi! Thanks for the great repo. I've tried reproducing some of your numbers on ImageNet val (256x256), but the rFID specifically isn't matching, either with your released checkpoint or with a tokenizer I trained using your settings.
With your VQ-16 I get:
PSNR: 20.793026, SSIM: 0.675290 (this matches your paper exactly)
Inception Score: 172.3292, FID: 4.2847, sFID: 5.1447, Precision: 0.73054, Recall: 0.6533
(the model-based evals are systematically worse than the results in your paper)
After re-running your training script and performing eval, I get:
PSNR: 20.625670, SSIM: 0.664614
Inception Score: 174.9648, FID: 4.2436, sFID: 5.4256, Precision: 0.72864, Recall: 0.6552
(very similar to the results above with your released checkpoint)
Given that the PSNR/SSIM match exactly, I believe I'm producing the reconstructions and npz files correctly. I'm just using the OpenAI evaluation code provided in this repository. Could you advise where I've gone wrong? Thanks!

For running the evaluator, my command looks roughly like this (filenames are from my setup, and the script path may differ in your layout):
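```bash
# OpenAI ADM evaluator bundled with the repo (script path is my assumption);
# first argument is the reference batch, second is the sample/reconstruction batch
python evaluations/c2i/evaluator.py VIRTUAL_imagenet256_labeled.npz reconstructions.npz
```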