FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License
4.03k stars 303 forks source link

the size of latent space #65

Open xinding64 opened 4 months ago

xinding64 commented 4 months ago

In fact, in inference, a 256256 image is generated within a 1616 latent space, and then a 256*256 real image is generated through the decoder, right?"

keyu-tian commented 4 months ago

Yes