Closed why529913 closed 4 months ago
The setup I saw for the pretrained model was cfg_scale=4, num_inference_steps=50, but test_geodiffusion.py is set to cfg_scale=5, num_inference_steps=100. Does this inconsistency lead to FID inconsistencies?
@why529913 Indeed, the FID value is an evaluation metric quite sensitive to the implementation. I think there might be two reasons for the differences:
- Whether the images are saved as `png` or `jpg` will make a difference :).
- `cfg_scale=4` is a good choice for the 256x256 model. However, `cfg_scale=5` might be a more common default value for all settings to achieve decent performance. After all, I do believe that there are no noticeable differences in generation quality among models with FID values in the 20-25 range.
To reproduce the reported results, you should primarily use the hyper-parameters saved in the `generation_config.json` file under each pre-trained checkpoint directory.
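As a rough illustration of preferring the saved hyper-parameters over script defaults, here is a minimal sketch. It assumes the JSON file stores keys named `cfg_scale` and `num_inference_steps` (the thread does not show the file's actual schema), and `load_generation_config` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path

# Fallback values mirroring the script settings mentioned in the question.
SCRIPT_DEFAULTS = {"cfg_scale": 5.0, "num_inference_steps": 100}


def load_generation_config(ckpt_dir, defaults=SCRIPT_DEFAULTS):
    """Return sampling hyper-parameters, preferring values saved with the checkpoint.

    The key names are an assumption; adjust them to whatever the real
    generation_config.json contains.
    """
    params = dict(defaults)
    config_path = Path(ckpt_dir) / "generation_config.json"
    if config_path.is_file():
        with config_path.open() as f:
            saved = json.load(f)
        # Override only the keys we know about; ignore everything else in the file.
        params.update({k: saved[k] for k in params if k in saved})
    return params
```

If the file is missing, the helper falls back to the script defaults, so it is worth printing the returned values before a long generation run to confirm which settings are actually in effect.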
With the coco-stuff 256x256 pretrained model I generate 3097x5 = 15485 images for testing and then calculate the FID against the 3097 val set images. The result is only 24.11, which does not reach the number reported in the paper. I resize all the images to 256x256 before calculating the FID. Is this the correct procedure?
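Before computing FID on a setup like the one described above, it can help to sanity-check the two image folders. The sketch below only counts files and compares extensions; it does not compute FID or inspect image sizes. The folder layout, the `check_fid_inputs` name, and the `samples_per_image=5` default are illustrative assumptions taken from the numbers in this thread.

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def count_images_by_ext(folder):
    """Count image files under `folder` (recursively), grouped by extension."""
    counts = Counter()
    for p in Path(folder).rglob("*"):
        ext = p.suffix.lower()
        if p.is_file() and ext in IMAGE_EXTS:
            # Treat .jpeg and .jpg as the same format.
            counts[".jpg" if ext == ".jpeg" else ext] += 1
    return counts


def check_fid_inputs(gen_dir, real_dir, samples_per_image=5):
    """Report image counts and whether both folders use one shared file format."""
    gen = count_images_by_ext(gen_dir)
    real = count_images_by_ext(real_dir)
    n_gen, n_real = sum(gen.values()), sum(real.values())
    return {
        "n_generated": n_gen,
        "n_real": n_real,
        "count_matches": n_gen == samples_per_image * n_real,
        "same_format": set(gen) == set(real) and len(gen) <= 1,
    }
```

A `same_format` value of `False` would point at the png/jpg difference mentioned earlier in the thread as one possible source of the FID gap.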