LTH14 / rcg

PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
MIT License
785 stars, 36 forks

Some questions about the pixel generator #1

Open miganchuanbo opened 8 months ago

miganchuanbo commented 8 months ago

Thanks for the excellent work. I am a bit confused about Fig. 3(b). In the figure, both the original image and the representation are sent to the pixel generator. I am wondering whether it is OK to exclude the original image and feed in only the representation.

LTH14 commented 8 months ago

Thanks for your interest. Please note that Fig. 3(b) illustrates the pixel generator's training phase. Most current generative frameworks, such as MAGE and LDM, either partially mask the original image or add noise to it, and ask the model to reconstruct the original image during training. In Fig. 3(b), we take MAGE as an example: it first tokenizes the image into image tokens and then masks some of those tokens. Therefore, the original image is needed as an input during training. However, we do not need the original image during generation. Generation starts from a 100% masked image (MAGE) or from Gaussian noise (LDM/ADM), conditioned only on the representation.
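To make the training/generation distinction concrete, here is a minimal sketch of what a MAGE-style pixel generator would receive in each phase. It is not code from the RCG repository: the `MASK` id, the helper names `make_training_input` and `make_generation_input`, the 75% mask ratio, and the random stand-ins for the tokenizer output and the representation are all assumptions made for illustration.

```python
import random

MASK = -1  # hypothetical id for a masked token (assumption, not the repo's value)


def make_training_input(image_tokens, mask_ratio, rng):
    """Training: mask a random subset of the ORIGINAL image's tokens.

    Returns (model_input, target); the target is the unmasked token sequence.
    """
    n = len(image_tokens)
    n_mask = max(1, round(mask_ratio * n))
    masked_positions = set(rng.sample(range(n), n_mask))
    model_input = [MASK if i in masked_positions else t
                   for i, t in enumerate(image_tokens)]
    return model_input, list(image_tokens)


def make_generation_input(num_tokens):
    """Generation: no original image is available; every token starts masked."""
    return [MASK] * num_tokens


rng = random.Random(0)
# Stand-in for the tokenizer output of one image (16 tokens, codebook size 1024).
image_tokens = [rng.randrange(1024) for _ in range(16)]
# Stand-in for the conditioning representation; it is passed in BOTH phases.
representation = [rng.gauss(0.0, 1.0) for _ in range(8)]

# Training phase: the original image is needed to build the input and the target.
train_input, train_target = make_training_input(image_tokens, 0.75, rng)

# Generation phase: only the representation and a fully masked canvas are used.
gen_input = make_generation_input(len(image_tokens))
```

In both phases the model would be called with the token sequence and `representation` as conditioning. The only difference is where the token sequence comes from: a partially masked copy of the original image during training, and an entirely masked sequence during generation.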