ZGCTroy / LayoutDiffusion


latent space #24

Open yanxi-design opened 1 month ago

yanxi-design commented 1 month ago

Thank you for your previous response. I am also curious: what is the best FID score you achieved when training the COCO 256x256 model in the latent space? If you could answer, that would be great! Thanks!

ZGCTroy commented 1 month ago

The model trained in latent space achieved a similar FID to the one trained in image space. Therefore, I believe that training from scratch on the COCO dataset has essentially reached its performance upper bound. The focus should not be on algorithm design but rather on the selection and quality of the training data, the use of pretrained models, or model scaling.
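For reference, a minimal sketch of how the latent-space and image-space runs could be scored with the same FID pipeline, using `torchmetrics` (an assumption; the data loaders and the `sample()` call are placeholders, not this repo's actual evaluation script):

```python
# Hedged sketch of a FID comparison, not LayoutDiffusion's evaluation code.
# Assumes torchmetrics is installed; real_loader, layout_loader, and sample()
# are hypothetical placeholders.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048, normalize=True)  # expects float images in [0, 1]

for real_batch in real_loader:        # COCO 256x256 ground-truth images
    fid.update(real_batch, real=True)

for layout_batch in layout_loader:    # layouts used to condition sampling
    fake_batch = sample(layout_batch)  # decoded samples in image space, [0, 1]
    fid.update(fake_batch, real=False)

print("FID:", fid.compute().item())
```

With the same reference statistics and sampler settings, the image-space and latent-space models can then be compared directly.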

ZGCTroy commented 1 month ago

Training an LDM or ADM from scratch on COCO or VG alone is simply insufficient and results in poor generalization. This explains why the generation quality is so poor when a never-before-seen layout is given as input at test time. I believe this is primarily due to the inadequate quantity and poor quality of the data and annotations.

yanxi-design commented 1 month ago

Thank you for your response. It was very helpful to me.

yanxi-design commented 1 month ago

I modified gaussian_diffusion.py and then trained the model in the latent space. The best FID I achieved was 22.17733931330713. That is why I asked you about the FID for COCO 256 in the latent space: I would like to know whether my modifications were effective. Thank you for the help you have given me.
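For context, "training in the latent space" here typically means encoding each image with a frozen pretrained autoencoder and running the diffusion loss on the latents instead of the pixels. Below is a minimal sketch of that wrapper, assuming diffusers' `AutoencoderKL` and a guided-diffusion-style `training_losses` interface like the one in gaussian_diffusion.py; the exact names and wiring in this issue's modified code may differ.

```python
# Sketch only, not the repo's actual modification: a frozen pretrained VAE
# (diffusers' AutoencoderKL, an assumption) encodes images to latents, and the
# usual gaussian_diffusion training loss is computed on those latents.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval().cuda()
for p in vae.parameters():
    p.requires_grad_(False)  # the autoencoder stays frozen during diffusion training

def latent_training_losses(diffusion, model, images, t, model_kwargs=None):
    # images: (B, 3, 256, 256) in [-1, 1]; latents become (B, 4, 32, 32)
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample() * 0.18215
    # Same training_losses signature as guided-diffusion-style gaussian_diffusion.py
    return diffusion.training_losses(model, latents, t, model_kwargs=model_kwargs)
```

At sampling time, the generated latents would be divided by the same scaling factor and passed through `vae.decode` before computing FID in image space.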