LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
MIT License

Inquiry about the selection of pixel generator #21

Closed jiachenlei closed 8 months ago

jiachenlei commented 8 months ago

Hi, your work is truly impressive! Thank you for sharing your code! After reading your work, I have a question regarding the selection of the pixel generator, and I would appreciate some guidance.

In the paper, MAGE is chosen as the final pixel generator. Table 6.b shows the ablation study of ADM and LDM conditioned on SSL representation, with ADM and LDM trained for 100 and 40 epochs, respectively, while MAGE+RDM is trained for 800 epochs. According to Table 7, it takes 1.2 days to train an LDM-8 for 150 epochs and 3.9 days to train the MAGE+RDM for 800 epochs. Therefore, training an LDM+RDM for 40 epochs is expected to take ~0.3 days.
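The ~0.3-day estimate above can be reproduced with a quick back-of-the-envelope calculation, assuming training time scales linearly with epoch count (the numbers are from Table 7 of the paper):

```python
# Estimate LDM+RDM training time for 40 epochs from Table 7 numbers.
# Assumption: wall-clock training time scales linearly with epochs.
ldm_days_150_epochs = 1.2  # Table 7: LDM-8 takes 1.2 days for 150 epochs

days_per_epoch = ldm_days_150_epochs / 150
est_days_40_epochs = days_per_epoch * 40

print(f"{est_days_40_epochs:.2f} days")  # ≈ 0.32 days
```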

a. For a fair comparison, should the LDM+RDM be trained for more epochs? b. If so, what is the performance of LDM+RDM? Does it perform worse, leading you to choose MAGE as the pixel generator?

Please correct me if I have made any mistakes. I appreciate any help in advance!

LTH14 commented 8 months ago

Thanks for your interest! a. We found that with our current implementation (directly adapted from the official LDM repo), training LDM for more epochs leads to an overfitting problem -- the FID starts to go up after 40 epochs. b. We chose MAGE as the final pixel generator because it shows much better unconditional generation performance on its own.

jiachenlei commented 8 months ago

Got it, thank you for your help!