FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License
3.78k stars, 285 forks

The patch_nums for a 256x256 image #62

Closed. xinding64 closed this issue 1 month ago

xinding64 commented 1 month ago

Hi, when training VAR on 256x256 images, what should patch_nums be? And why does it look the same as the setting for 16x16, namely (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)?
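One possible reading of the question: each entry of patch_nums is the side length of a token map at one scale, not a pixel size. The sketch below is minimal and rests on an assumption. It supposes the image tokenizer downsamples each image side by a factor of 16. That value is stored in the hypothetical constant `ASSUMED_DOWNSAMPLE` and does not come from this thread. Under that assumption, a final 16x16 token map corresponds to a 256x256 image. The sketch also counts how many tokens each scale contributes.

```python
# Hypothetical: assumed spatial downsampling factor of the image tokenizer (VQVAE).
# Not confirmed in this thread; check the repository's own arguments before relying on it.
ASSUMED_DOWNSAMPLE = 16

# The patch_nums quoted in the question: side length of the token map at each scale.
patch_nums = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def implied_resolution(patch_nums, downsample=ASSUMED_DOWNSAMPLE):
    """Image side length in pixels implied by the largest token map."""
    return max(patch_nums) * downsample


def tokens_per_scale(patch_nums):
    """Scale k is a pn x pn token map, so it holds pn * pn tokens."""
    return [pn * pn for pn in patch_nums]


if __name__ == "__main__":
    counts = tokens_per_scale(patch_nums)
    print("final token map:", f"{patch_nums[-1]}x{patch_nums[-1]}")
    print("implied image size:", implied_resolution(patch_nums))  # 16 * 16 = 256
    print("tokens per scale:", counts)
    print("total tokens:", sum(counts))  # 680
```

If the downsampling factor really is 16, then "16x16" and "256x256" describe the same setting seen from two sides: the token grid and the pixel grid.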