LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
MIT License

Degenerate issue when conditioned by an image #25

Open BoltenWang-Meta opened 7 months ago

BoltenWang-Meta commented 7 months ago

Thanks for the impressive work and code. After checking the inference code for the case conditioned on an image, it seems the whole generation degenerates into MAGE conditioned on an SSL representation. Is my understanding correct? If so, does that mean the RDM is not used during inference?

LTH14 commented 7 months ago

Thanks for your interest! For Figures 6 and 7 in the paper (which correspond to the case conditioned on ground-truth images), the RDM is NOT needed because the SSL representation is provided by the ground-truth image. However, we note that this case has a strong limitation in practice, as we typically don't have ground-truth images when we want to generate an image. Therefore, we need the RDM to generate the SSL representations in common generation scenarios where no ground-truth image is available.

BoltenWang-Meta commented 7 months ago

Got it! I agree that generating images conditioned on GT images seems to be of little practical use. I was just wondering where the RDM comes in when reading your code. Also, thanks for the quick reply, what a dedicated author you are. hhhh

LTH14 commented 7 months ago

We integrate the RDM sampling process into the pixel generator. For example, you can check it here in the MAGE generator: https://github.com/LTH14/rcg/blob/main/pixel_generator/mage/models_mage.py#L485-L508. I just happened to see this issue pop up. I hope my response answers your questions.
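The control flow discussed in this thread can be illustrated with a minimal sketch. This is not the repository's code: `ssl_encode`, `rdm_sample`, `pixel_generator`, and `generate` are hypothetical stand-ins written in plain Python, and the toy arithmetic inside them only marks where a real SSL encoder, RDM, and MAGE-style generator would be called. It shows the one point made above: when a ground-truth image is given, the representation comes from the encoder and the RDM is skipped; otherwise the RDM supplies the representation.

```python
import random
from typing import List, Optional, Tuple

Rep = List[float]  # toy stand-in for an SSL representation vector


def ssl_encode(image: List[float], dim: int = 4) -> Rep:
    """Hypothetical stand-in for a pretrained SSL encoder (assumption)."""
    mean = sum(image) / len(image)
    return [mean] * dim


def rdm_sample(dim: int = 4, seed: Optional[int] = None) -> Rep:
    """Hypothetical stand-in for RDM sampling: draws a representation from noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def pixel_generator(rep: Rep) -> str:
    """Hypothetical stand-in for a representation-conditioned pixel generator."""
    return f"image conditioned on a {len(rep)}-dim representation"


def generate(gt_image: Optional[List[float]] = None,
             seed: Optional[int] = None) -> Tuple[str, str]:
    """Return (generated image placeholder, source of the representation)."""
    if gt_image is not None:
        # Ground-truth image available: encode it, no RDM involved.
        rep, source = ssl_encode(gt_image), "ssl_encoder"
    else:
        # No ground-truth image: the RDM has to produce the representation.
        rep, source = rdm_sample(seed=seed), "rdm"
    return pixel_generator(rep), source


if __name__ == "__main__":
    print(generate(gt_image=[0.1, 0.2, 0.3])[1])
    print(generate(seed=0)[1])
```

Calling `generate` with a `gt_image` corresponds to the image-conditioned case the issue asks about, while calling it without one corresponds to the common generation scenario in which the RDM is required.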