CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Evaluation Codes on COCO dataset #88

Open canqin001 opened 2 years ago

canqin001 commented 2 years ago

Dear authors,

I noticed that COCO is an essential benchmark for evaluating text-to-image generation. May I ask for the COCO dataset's evaluation code for computing IS and FID?

Thank you so much!

zengxianyu commented 2 years ago

Have you computed FID on COCO? I tried evaluating the released model on COCO and got an FID score of 134, which is clearly not correct.

canqin001 commented 2 years ago

I have tried this one (https://github.com/mseitzer/pytorch-fid) to compute FID on COCO. The FID score is around 19, which is still higher than the reported results.

zengxianyu commented 2 years ago

I also used this repo. Did you evaluate on the COCO validation set or the training set? How many samples did you use? I was not able to get a score anywhere close to reasonable.

canqin001 commented 2 years ago

That is a good question. I had overlooked the gap between the train and val sets. Against the val set, the FID is 19.11; against the train set, it is 12.79 (computed between images generated from val-set captions and train-set ground-truth images). The latter seems to match the score reported in the paper, but I am still surprised by such a large gap.
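
For context, the FID that tools such as pytorch-fid report is the Fréchet distance between two Gaussians fitted to Inception features, one for the reference images and one for the generated images. The subscripts r (reference) and g (generated) are notation chosen here for illustration, not taken from the thread:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Because $\mu_r$ and $\Sigma_r$ are estimated from whichever reference set is used, swapping the train set for the val set changes the score even when the generated images stay the same. Statistics estimated from fewer reference images are also noisier, which may contribute to a gap like the one above.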

zengxianyu commented 2 years ago

Did you evaluate on the full validation set and training set? Inference is slow, so I only ran the model on a small subset.

canqin001 commented 2 years ago

Yes, I evaluated on the full sets. It takes several hours to run.

CrossLee1 commented 2 years ago

@canqin001 When evaluating, how did you preprocess the GT images? Did you simply resize each image to 256x256, or resize the short edge to 256 and then center crop to 256x256?
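
The two preprocessing options in this question can be sketched as follows. This is a minimal illustration, not the code anyone in the thread used: the function names, the mode strings "squash" and "center_crop", and the choice of bicubic resampling are all assumptions, and the `preprocess` helper assumes Pillow is installed.

```python
from typing import Tuple


def short_edge_resize_dims(width: int, height: int, target: int = 256) -> Tuple[int, int]:
    """Size after scaling so the shorter edge equals `target`, keeping aspect ratio."""
    scale = target / min(width, height)
    # max() guards against rounding the short edge just below `target`.
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width: int, height: int, target: int = 256) -> Tuple[int, int, int, int]:
    """(left, upper, right, lower) box of a centered target x target crop."""
    left = (width - target) // 2
    top = (height - target) // 2
    return left, top, left + target, top + target


def preprocess(img, mode: str, target: int = 256):
    """Apply one of the two options to a PIL image (hypothetical helper).

    mode="squash":      resize straight to target x target (distorts aspect ratio).
    mode="center_crop": resize the short edge to target, then center crop.
    """
    from PIL import Image  # imported lazily so the geometry helpers need no dependencies

    if mode == "squash":
        return img.resize((target, target), Image.BICUBIC)
    if mode == "center_crop":
        w, h = short_edge_resize_dims(*img.size, target)
        return img.resize((w, h), Image.BICUBIC).crop(center_crop_box(w, h, target))
    raise ValueError(f"unknown mode: {mode!r}")
```

For a 640x480 image, the second option resizes to 341x256 and keeps columns 42 to 298, while the first squashes the full frame into a square. The two choices produce different reference statistics, so the same generated images can get different FID values depending on which one is applied to the ground truth.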

yumadara commented 2 years ago

Do you have an LDM model pretrained on COCO? I used the same evaluation code and got an FID larger than 100 on the 256x256 validation set. I suspect the reason is that my LDM was not trained on COCO. What about yours?

XingtongGe commented 5 months ago

@canqin001 When evaluating, how did you preprocess the GT images? Did you simply resize each image to 256x256, or resize the short edge to 256 and then center crop to 256x256?

I have the same question. Have you solved it?