canqin001 opened 2 years ago
Have you computed FID on COCO? I tried evaluating the released model on COCO and got an FID score of 134, which is clearly not correct.
I tried this implementation (https://github.com/mseitzer/pytorch-fid) to compute FID on COCO. The FID score is around 19, which is still higher than the reported result.
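For context, pytorch-fid fits a Gaussian (mean mu, covariance S) to the Inception-v3 pool features of each image set and reports the Fréchet distance between the two Gaussians: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). Below is a minimal sketch of only that distance, assuming the features are already extracted as (N, D) arrays. The function name `frechet_distance` is hypothetical, and it takes the matrix square root via an eigendecomposition instead of `scipy.linalg.sqrtm`, which pytorch-fid uses.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Hypothetical helper: FID-style distance between Gaussians fitted to two (N, D) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # rowvar=False: rows are samples, columns are feature dimensions
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)

    # Symmetric square root of sigma_a; clip tiny negative eigenvalues from round-off
    w, v = np.linalg.eigh(sigma_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    # Tr((sigma_a @ sigma_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of the symmetric PSD matrix sqrt_a @ sigma_b @ sqrt_a (the two matrices are similar)
    m = sqrt_a @ sigma_b @ sqrt_a
    eig_m = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_covmean = np.sqrt(np.clip(eig_m, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * tr_covmean)
```

Because the reference statistics enter the formula directly, changing the reference image set changes mu_b and S_b, so FID values computed against different reference sets are not directly comparable.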
I also used this repo. Did you evaluate on the COCO validation set or the training set? How many samples did you use? I was not able to get anything close to a reasonable score.
Original message from canqin001, Wed, Jun 22, 2022 1:18 PM, to CompVis/latent-diffusion (cc: zengxianyu), subject: Re: [CompVis/latent-diffusion] Evaluation Codes on COCO dataset (Issue #88)
That is a good question. I had overlooked the gap between the train and val sets. Using images generated from val captions, the FID is 19.11 against the val-set ground-truth images and 12.79 against the train-set ground-truth images. The train-set number seems to match the score reported in the paper, but I am still surprised by such a large gap.
Did you evaluate on the full validation and training sets? Inference is slow, so I only ran the model on a small subset.
Yes, I evaluated on the full sets. It takes several hours to run.
@canqin001 When evaluating, how do you preprocess the GT images? Do you simply resize each image to 256x256, or resize the short edge to 256 and then center-crop to 256x256?
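The two preprocessing options in the question can be sketched as follows; which one was used for the numbers in this thread is not stated. This is a minimal sketch under assumptions: the function names are hypothetical, and `preprocess` expects a PIL.Image-like object exposing `.size` as (width, height), `.resize((w, h))` and `.crop((left, top, right, bottom))`.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), the order PIL's Image.crop uses


def short_edge_resize_then_center_crop(
    width: int, height: int, target: int = 256
) -> Tuple[Tuple[int, int], Box]:
    """Hypothetical helper for option 2: scale so the short edge equals `target`,
    then center-crop a target x target square.

    Returns the intermediate (width, height) after resizing and the crop box.
    """
    if width <= height:
        new_w, new_h = target, round(height * target / width)
    else:
        new_w, new_h = round(width * target / height), target
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


def preprocess(img, target: int = 256, center_crop: bool = True):
    """Hypothetical helper applying either option to a PIL.Image-like object."""
    if not center_crop:
        # Option 1: resize straight to target x target; distorts non-square images
        return img.resize((target, target))
    # Option 2: keep the aspect ratio, then crop away the excess on the long edge
    (new_w, new_h), box = short_edge_resize_then_center_crop(*img.size, target)
    return img.resize((new_w, new_h)).crop(box)
```

For a 640x480 image, option 2 resizes to 341x256 and crops the box (42, 0, 298, 256). With torchvision, option 2 is roughly `transforms.Resize(256)` followed by `transforms.CenterCrop(256)`, and option 1 is `transforms.Resize((256, 256))`; their rounding of the long edge and crop offset may differ from this sketch by a pixel.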
Dear authors,
COCO is an essential benchmark for evaluating text-to-image generation. Could you share the evaluation code used to compute IS and FID on the COCO dataset?
Thank you so much!
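IS is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) are the Inception-v3 class probabilities for a generated image and p(y) is their average over all generated images. A minimal pure-Python sketch of that formula follows, assuming the per-image probabilities are already computed. The name `inception_score` is hypothetical, and the common practice of averaging the score over 10 splits is omitted.

```python
import math
from typing import Sequence


def inception_score(probs: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """Hypothetical helper: IS from per-image class probabilities p(y|x)."""
    n = len(probs)
    k = len(probs[0])
    # Marginal p(y): average of p(y|x) over all generated images
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); terms with p_j == 0 contribute 0 (0 * log 0 = 0)
        total_kl += sum(
            pj * (math.log(pj + eps) - math.log(mj + eps))
            for pj, mj in zip(p, marginal)
            if pj > 0
        )
    return math.exp(total_kl / n)
```

Confident, diverse predictions raise the score: four images each assigned with certainty to a different one of four classes give an IS of about 4, while identical uniform predictions give 1.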
Do you have an LDM pretrained on COCO? I use the same evaluation code and get an FID above 100 on the 256x256 validation set; I think the reason is that my LDM was not trained on COCO. What about yours?
Regarding the question to @canqin001 about preprocessing the GT images (plain resize to 256x256 vs. resizing the short edge to 256 and center-cropping to 256x256): I have the same question. Has anyone solved it?