FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MIT License
3.78k stars, 285 forks

What is your position embedding? Is 2D RoPE a good choice? #36

Closed renjingneng closed 2 months ago

renjingneng commented 2 months ago

Bro, have you tried whether using 2D RoPE would further improve model performance? I don't have many GPUs, so I haven't tried it yet.

keyu-tian commented 2 months ago

Yes, we tried it. We used a randomly initialized absolute position embedding in this class-conditional VAR. We found that 2D RoPE works better and used it in our text-conditional VAR (which is still in training).
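
For readers unfamiliar with the term, here is a minimal sketch of one common "axial" form of 2D RoPE. It is not the VAR code, which the thread does not show. It assumes the feature vector is split in half, with the first half rotated by the token's row index and the second half by its column index. The names `rope_1d`, `rope_2d`, and `dot`, and the `base` value of 10000, are illustrative choices. The property it demonstrates is that the query-key score depends only on the relative (row, col) offset, which an absolute position embedding does not guarantee.

```python
import math

def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by the angle pos * base**(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # per-pair frequency, as in 1D RoPE
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def rope_2d(vec, row, col, base=10000.0):
    """Axial 2D RoPE (assumed variant): first half encodes row, second half col."""
    d = len(vec)
    assert d % 4 == 0, "dimension must be divisible by 4"
    half = d // 2
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary illustrative query/key vectors (not taken from any model).
q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 0.9]
k = [1.0, 0.3, -0.6, 0.8, -1.2, 0.4, 0.7, -0.2]

# Two query/key position pairs that share the same relative offset (1, 2).
score_a = dot(rope_2d(q, 2, 3), rope_2d(k, 1, 1))
score_b = dot(rope_2d(q, 7, 9), rope_2d(k, 6, 7))

# Same position for both tokens: the rotations cancel out.
score_same = dot(rope_2d(q, 2, 3), rope_2d(k, 2, 3))

print(abs(score_a - score_b) < 1e-9)        # scores match for equal offsets
print(abs(score_same - dot(q, k)) < 1e-9)   # zero offset recovers the plain dot product
```

In an attention layer, the same rotation would be applied to every query and key per head before the dot product. Values are left unrotated.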