CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

How to evaluate on ImageNet? #155

Open JohnDreamer opened 2 years ago

JohnDreamer commented 2 years ago

Hi, this is great work, and thanks for releasing the code! I have a question: how should I evaluate on ImageNet? In other words, should I compute the FID scores on the whole ImageNet validation set (50K images in total) (VQGAN: reconstructed images; Transformer: sampled images)? Should I split the dataset?

rromb commented 2 years ago

Thanks :) Reconstruction metrics should be evaluated on all 50k examples from the validation split. For transformer evaluation (i.e. sample quality), we follow standard practice and first generate 50k new samples and then evaluate against the full training set.
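
For reference, a minimal sketch of how such an FID evaluation could be wired up with the third-party torch-fidelity package, assuming the 50k reconstructions/samples and the reference images have already been saved as individual files in two directories (the paths and the package choice are illustrative, not the exact evaluation code used for the paper):

```python
# Minimal FID sketch using the torch-fidelity package (pip install torch-fidelity).
# Assumes two directories of image files: one with generated (or reconstructed)
# images and one with the reference images (validation split for reconstructions,
# training set for transformer samples). Paths are placeholders.
import torch_fidelity

metrics = torch_fidelity.calculate_metrics(
    input1="path/to/generated_images",   # 50k samples or reconstructions
    input2="path/to/reference_images",   # ImageNet val (recon.) or train (samples)
    cuda=True,                           # run the Inception network on GPU
    fid=True,                            # compute the Frechet Inception Distance
    verbose=False,
)
print(metrics["frechet_inception_distance"])
```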

JohnDreamer commented 2 years ago

> Thanks :) Reconstruction metrics should be evaluated on all 50k examples from the validation split. For transformer evaluation (i.e. sample quality), we follow standard practice and first generate 50k new samples and then evaluate against the full training set.

Thanks for the reply! I still have two questions: (1) How should the ground-truth images be processed for evaluation? (a) resize each image directly to 256x256, or (b) resize the short edge of the image to 256 and then center-crop it to 256x256? Which one do you use? (2) How do you sample the test images? Do you sample the same number of images (50) for each class?
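
For concreteness, the two preprocessing options in question (1) could be written as follows with torchvision; this is only a sketch of the two alternatives being asked about, not the repository's own data pipeline:

```python
from torchvision import transforms

# Option (a): resize each image directly to 256x256 (distorts the aspect ratio).
resize_only = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

# Option (b): resize the short edge to 256, then center-crop to 256x256
# (preserves the aspect ratio, discards the borders along the long edge).
resize_and_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])
```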