coco-stuff 256x256 pretrained model reproduction

why529913 commented 5 months ago

I used the coco-stuff 256x256 pretrained model to generate 3097 x 5 = 15485 images for testing, then computed the FID against the 3097 val-set images. The result is 24.11, which does not match the value reported in the paper. I resized all images to 256x256 before computing the FID. Is this the correct procedure?

why529913 commented 4 months ago

The configuration shipped with the pretrained model sets cfg_scale=4 and num_inference_steps=50, but test_geodiffusion.py uses cfg_scale=5 and num_inference_steps=100. Could this inconsistency explain the FID mismatch?

KaiChen1998 commented 4 months ago

@why529913 Indeed, FID is a metric that is quite sensitive to implementation details. I think there are two likely reasons for the difference:

1. FID code: we use the evaluation code and data preprocessing code from LAMA to run the experiments. Please follow every single step when reproducing. Empirically, even saving the images as PNG versus JPG makes a difference :).
2. Inference hyper-parameters: as you noted, we empirically find that cfg_scale=4 is a good choice for the 256x256 model, while cfg_scale=5 is a more common default that achieves decent performance across all settings.

That said, I believe there is no noticeable difference in generation quality between models whose FID values fall in the 20-25 range.
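To make the sensitivity concrete, here is a minimal sketch of the Fréchet distance that underlies FID. It is not the LAMA evaluation code; it assumes the image features (for example, Inception activations) have already been extracted into two arrays, and the function name `frechet_distance` is purely illustrative. Anything that changes the features, such as resizing or the saved image format, changes the means and covariances below and therefore the score.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Because the statistics are estimated from samples, the number of generated and reference images also affects the result, so scores are only comparable when the whole pipeline matches.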

KaiChen1998 commented 4 months ago

To reproduce the reported results, you should primarily use the hyper-parameters saved in the generation_config.json file under each pre-trained checkpoint directory.
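A minimal sketch of reading those saved hyper-parameters before sampling is shown below. The helper name `load_generation_config` and the assumption that the JSON file stores the values under the keys `cfg_scale` and `num_inference_steps` are hypothetical; check the actual file in the checkpoint directory. The fallback values are the test_geodiffusion.py settings quoted earlier in this thread.

```python
import json
from pathlib import Path

# Fallbacks taken from the test_geodiffusion.py settings mentioned in this thread.
DEFAULTS = {"cfg_scale": 5.0, "num_inference_steps": 100}


def load_generation_config(ckpt_dir, defaults=DEFAULTS):
    """Return sampling hyper-parameters, preferring generation_config.json if it exists."""
    params = dict(defaults)
    config_path = Path(ckpt_dir) / "generation_config.json"
    if config_path.is_file():
        with config_path.open() as f:
            saved = json.load(f)
        # Override only the known keys; the key names are assumed, not verified.
        params.update({k: v for k, v in saved.items() if k in params})
    return params
```

The returned dictionary can then be passed to whatever sampling call the test script makes, so that the checkpoint's own settings take precedence over the script defaults.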

KaiChen1998 / GeoDiffusion
