LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
MIT License

Inquiry about the Fig-6 #4

Open yuhaoliu7456 opened 8 months ago

yuhaoliu7456 commented 8 months ago

Does anyone know how to generate the visual results in Figure 6? I see that they extract SSL representations from image samples, but the authors don't seem to describe how these features are combined with the randomly generated noise in the RDM.

LTH14 commented 8 months ago

Thanks for your interest. For Figure 6, we don't add noise to the extracted representation -- the SSL representation extracted from the pre-trained encoder is fed directly into the pixel generator to generate the images. The "GT Representation Reconstruction" section of this Jupyter notebook provides code for this. If you are interested in how random noise is added during training and unconditional generation, you can check the DDPM and DDIM code here.
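At a high level, the Figure 6 pipeline described above can be sketched as follows. This is a minimal illustration only, not the repository's actual code: `encode` and `generate` below are toy stand-ins for the pre-trained Moco v3 encoder and the MAGE pixel generator.

```python
import numpy as np

def encode(image):
    # Toy stand-in for the pre-trained SSL encoder (Moco v3):
    # maps an image to a fixed-length representation.
    return image.mean(axis=(0, 1))  # shape: (channels,)

def generate(representation, seed=0):
    # Toy stand-in for the pixel generator (MAGE): produces an
    # image conditioned on the representation. Only the generator's
    # own sampling is stochastic; no noise touches the representation.
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=0.1, size=(32, 32, representation.size))
    return representation + noise

# GT representation reconstruction: the representation extracted
# from the ground-truth image conditions the generator directly.
gt_image = np.random.default_rng(42).random((32, 32, 3))
rep = encode(gt_image)
recon_a = generate(rep, seed=0)
recon_b = generate(rep, seed=1)  # same condition, different sample
```

Varying the generator's seed while holding the representation fixed yields different samples that share the same semantic content, which is the effect shown in Figure 6.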

yuhaoliu7456 commented 8 months ago


Thanks for your reply.

mapengsen commented 8 months ago

I'm sorry, I don't quite understand. Did you feed the GT image into the encoder, use SSL(GT image) as the condition for MAGE, and then produce the variants by changing the random seed? Thanks a lot. @LTH14

mapengsen commented 8 months ago

Can you tell me a little about how Figure 7 is done? Since RCG has only one condition input, how can it interpolate between two images? Thank you very much for your reply!


LTH14 commented 8 months ago

@mapengsen Thanks for your interest. For Figure 6, we extract the representation from the GT image and generate image pixels conditioned on it. You can refer to the provided visualization notebook for more implementation details. For Figure 7, please refer to issue #20.
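For the interpolation question, one common approach (check the linked issue for the authors' exact method, which may differ) is to interpolate between the two images' SSL representations and condition the pixel generator on each intermediate representation. A minimal sketch, assuming simple linear interpolation:

```python
import numpy as np

def interpolate_representations(rep_a, rep_b, num_steps=5):
    # Linearly interpolate between two SSL representations.
    # Spherical interpolation (slerp) is another common choice
    # for high-dimensional latents.
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1 - a) * rep_a + a * rep_b for a in alphas]

# Each interpolated representation would then be fed to the pixel
# generator as its single condition input, producing a smooth
# transition between the two source images.
rep_a = np.zeros(4)
rep_b = np.ones(4)
steps = interpolate_representations(rep_a, rep_b, num_steps=5)
```

This is consistent with RCG having only one condition input: interpolation happens in representation space before generation, not inside the generator.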

mapengsen commented 8 months ago

Thank you very much! I understand now.

whisper-11 commented 8 months ago

Thank you for your exceptional work! Could you please clarify if the Representation Reconstruction function depicted in Figure 6 also applies to images that are not part of the ImageNet dataset? Thank you very much! @LTH14

LTH14 commented 8 months ago

Thanks for your interest! The provided Moco v3 and MAGE checkpoints are both trained on ImageNet, so they should give reasonable results on natural images not contained in ImageNet. However, if an image is too far from the ImageNet distribution, reconstruction quality can degrade.

whisper-11 commented 8 months ago


Thanks for your reply!