KaiChen1998 / GeoDiffusion

Official PyTorch implementation of GeoDiffusion in ICLR 2024 (https://arxiv.org/abs/2306.04607)
https://kaichen1998.github.io/projects/geodiffusion/
MIT License

Reproducing the COCO-Stuff 256x256 pretrained model results #13

Closed: why529913 closed this issue 4 months ago

why529913 commented 5 months ago

I used the COCO-Stuff 256x256 pretrained model to generate 3097 x 5 = 15485 test images, then computed the FID against the 3097 val-set images. The result is 24.11, which does not match the value reported in the paper. I resized all images to 256x256 before computing the FID. Is this the correct procedure?
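For reference, the standard definition of FID compares Gaussian statistics of Inception features from the real and generated image sets. The symbols below are introduced only for illustration: $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features, and $\mu_g, \Sigma_g$ those of the generated images. This is the general formula, not necessarily the exact evaluation code used for the paper.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Because both statistics are estimated from finite samples, the value depends on how many images are in each set and on how images are resized before feature extraction, so two pipelines can report different numbers for the same model.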

why529913 commented 4 months ago

The settings I saw for the pretrained model were cfg_scale=4 and num_inference_steps=50, but test_geodiffusion.py sets cfg_scale=5 and num_inference_steps=100. Could this inconsistency explain the FID discrepancy?

KaiChen1998 commented 4 months ago

@why529913 Indeed, the FID value is an evaluation metric quite sensitive to the implementation. I think there might be two reasons for the differences:

That said, I believe there is no noticeable difference in generation quality between models whose FID values fall in the 20-25 range.

KaiChen1998 commented 4 months ago

To reproduce the reported results, you should primarily use the hyper-parameters saved in the generation_config.json file under each pre-trained checkpoint directory.
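A minimal sketch of that advice, assuming generation_config.json is a flat JSON object whose keys match the option names quoted in this thread (cfg_scale, num_inference_steps). The helper name `load_generation_settings` and the key layout are hypothetical and not taken from the repository, so check the actual file before relying on it.

```python
import json
from pathlib import Path

# Script-side values quoted earlier in this thread for test_geodiffusion.py.
SCRIPT_DEFAULTS = {"cfg_scale": 5, "num_inference_steps": 100}


def load_generation_settings(ckpt_dir, defaults=SCRIPT_DEFAULTS):
    """Return `defaults`, overridden by <ckpt_dir>/generation_config.json if present.

    Hypothetical helper: the JSON key names are assumed to match the
    option names used in this thread.
    """
    settings = dict(defaults)
    config_path = Path(ckpt_dir) / "generation_config.json"
    if config_path.is_file():
        with config_path.open() as f:
            saved = json.load(f)
        # Override only the keys we already know about; ignore anything else.
        settings.update({k: v for k, v in saved.items() if k in settings})
    return settings
```

Calling `load_generation_settings("path/to/checkpoint")` and passing the returned values to the sampling script would keep the evaluation run aligned with the checkpoint's saved settings instead of the script defaults.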