baaivision / Emu

Emu Series: Generative Multimodal Models from BAAI
https://baaivision.github.io/emu2/
Apache License 2.0
1.66k stars · 86 forks

question about the visual autoencoder #55

Open Junction4Nako opened 11 months ago

Junction4Nako commented 11 months ago

Thanks for the great work! I have some questions about the checkpoints:

  1. It seems that BAAI/Emu2 does not include the weights of the visual decoder (the diffusion UNet), but according to Section 2.2.3 of the paper, Emu2 should include the autoencoder-trained decoder, right?
  2. Emu2-Gen does provide the visual decoder weights. Can the visual encoder and decoder in BAAI/Emu2-Gen work together as an autoencoder? Looking forward to your reply~
ryanzhangfan commented 11 months ago

Thanks for your interest in our work!

  1. The visual decoder of Emu2. As stated in the paper, the visual encoder is frozen during the training of both Emu2-Gen and the visual decoder. Hence, Emu2 and Emu2-Gen share exactly the same visual decoder, and the visual decoder weights shipped with Emu2-Gen can be used directly with Emu2.
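Since only the decoder differs between the two checkpoints, transplanting it amounts to copying the decoder's parameters from one state dict into the other. The sketch below illustrates this with toy stand-in modules; the module names (`visual_encoder`, `visual_decoder`) and shapes are hypothetical, not the actual Emu2 layout:

```python
import torch
import torch.nn as nn

# Conceptual sketch, NOT the real Emu2 API: because the visual encoder is
# frozen during decoder training, the decoder from the Emu2-Gen checkpoint
# can be dropped into Emu2 by copying only the decoder parameters.

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual_encoder = nn.Linear(8, 4)  # stand-in for the ViT encoder
        self.visual_decoder = nn.Linear(4, 8)  # stand-in for the diffusion UNet

emu2 = Model()
emu2_gen = Model()

# Keep only the visual-decoder entries from the Emu2-Gen state dict ...
decoder_state = {
    k: v
    for k, v in emu2_gen.state_dict().items()
    if k.startswith("visual_decoder.")
}

# ... and load them into Emu2; strict=False leaves all other weights untouched.
missing, unexpected = emu2.load_state_dict(decoder_state, strict=False)

# The two models now share identical decoder weights.
assert torch.equal(emu2.visual_decoder.weight, emu2_gen.visual_decoder.weight)
```

With `strict=False`, the encoder keys simply show up in the `missing` list and nothing else is modified, which is the usual idiom for partial checkpoint loading in PyTorch.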

  2. The autoencoder paradigm. Yes, the visual encoder and the visual decoder can work together as an autoencoder. Our pipeline currently supports generating output in an autoencoding manner; you can find instructions in the HF version of the model or the native PyTorch version (at the bottom of the example code).
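The autoencoding path described above is just encode-then-decode with no intermediate text conditioning. Here is a minimal conceptual sketch with dummy modules standing in for the real encoder and diffusion decoder; none of these layers or shapes come from the actual Emu2 code:

```python
import torch
import torch.nn as nn

# Conceptual sketch of the autoencoding round trip (hypothetical modules,
# not the real Emu2 pipeline): an image is encoded to visual embeddings,
# then decoded straight back to pixel space.

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 32))
decoder = nn.Sequential(nn.Linear(32, 3 * 16 * 16), nn.Unflatten(1, (3, 16, 16)))

image = torch.randn(1, 3, 16, 16)          # dummy input image (N, C, H, W)
latent = encoder(image)                    # visual embeddings from the encoder
reconstruction = decoder(latent)           # decoded back to image space

# A well-trained pair would make reconstruction approximate image;
# here we only check that the round trip preserves the tensor shape.
assert reconstruction.shape == image.shape
```

In the real model the decoder is a diffusion UNet rather than a single linear layer, but the interface is the same: whatever the encoder emits is the only input the decoder needs to reconstruct the image.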