CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

FID on COCO for text-to-image generation #90

Open zengxianyu opened 2 years ago

zengxianyu commented 2 years ago

I am trying to evaluate the text-to-image generation results on the COCO dataset. I computed the FID on 259 samples randomly drawn from the COCO validation set and got a score of 134, which is far higher than the 12.63 reported in the paper. I suspect I am doing something wrong in the evaluation and hope someone here can point it out. I ran the pretrained model with the default configuration and computed the FID using this repo: https://github.com/mseitzer/pytorch-fid

dillon101001 commented 2 years ago

In my experience, the FID only becomes accurate with a large number of images. Something like 3-10k should be enough.
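To illustrate why sample count matters, here is a minimal sketch of the Fréchet distance between two Gaussians fitted to feature sets. It is not the pytorch-fid implementation: it assumes synthetic 64-dimensional Gaussian features instead of Inception features, and it computes the trace of the matrix square root from eigenvalues. The helper names (`frechet_distance`, `stats`, `fid_same_distribution`) are made up for this example. Both feature sets come from the same distribution, so any nonzero score is purely estimation error.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # tr(sqrt(sigma1 @ sigma2)) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    # positive semi-definite inputs (clip guards against round-off).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def stats(features):
    """Mean vector and covariance matrix of an (n_samples, dim) array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def fid_same_distribution(n, dim=64, seed=0):
    """Score between two independent draws of n samples from one Gaussian."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, dim))
    b = rng.standard_normal((n, dim))
    return frechet_distance(*stats(a), *stats(b))


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(n, fid_same_distribution(n))
```

In this toy setting the score shrinks as `n` grows even though the two distributions are identical. The same upward bias would be expected, and would likely be stronger, with 2048-dimensional Inception features and only a few hundred images.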

ttliu-kiwi commented 1 year ago

Have you reproduced the FID results from MSCOCO? I used the 30000 samples mentioned in the paper to evaluate, and the FID is around 40.

lhy101 commented 1 year ago

Same here. I sampled 30000 captions from the COCO 2014 val dataset, and the FID is 33.
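For reference, a minimal sketch of how such a caption sample could be drawn reproducibly. The thread does not say how the captions were chosen, so picking one caption per image is an assumption. The function names and the file path in the usage comment are hypothetical. The only thing taken from the COCO caption files is their layout: an `annotations` list whose entries carry `image_id` and `caption`.

```python
import json
import random


def load_annotations(path):
    """Read the 'annotations' list from a COCO captions JSON file."""
    with open(path) as f:
        return json.load(f)["annotations"]


def sample_captions(annotations, n, seed=0):
    """Pick n distinct images at random and one caption for each.

    Returns a list of (image_id, caption) pairs. One caption per image is
    an assumption, not something confirmed in this thread.
    """
    by_image = {}
    for ann in annotations:
        by_image.setdefault(ann["image_id"], []).append(ann["caption"].strip())
    rng = random.Random(seed)
    # Sort the ids so the result depends only on the seed, not on dict order.
    image_ids = rng.sample(sorted(by_image), n)
    return [(image_id, rng.choice(by_image[image_id])) for image_id in image_ids]


# Usage (hypothetical path):
# annotations = load_annotations("annotations/captions_val2014.json")
# prompts = sample_captions(annotations, 30000)
```

Fixing the seed keeps the prompt set, and therefore the matching real images, identical across runs, which makes FID numbers from different people easier to compare.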

Wujie-nju commented 1 year ago

I also want to evaluate the text-to-image generation results. However, I could not find a config file for it like the ones for LSUN, as shown in the following figure. Could you please share your file? Thank you very much. [image]

jiayisunx commented 1 year ago

@ttliu-kiwi, @lhy101, can you please share your evaluation test script? Thank you very much!

stdKonjac commented 1 year ago

Hi, have you solved this issue? I also got similar results :(

lhy101 commented 1 year ago

Not yet :(

sir-zengqi commented 1 month ago

@dillon101001 @zengxianyu @ttliu-kiwi @stdKonjac @jiayisunx Have you all solved this problem? What prompt should be used as input for the text to image model? Which real dataset should be used for comparison?