FoundationVision / VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Resource consumption #10

Closed xiaosu-zhu closed 2 months ago

xiaosu-zhu commented 2 months ago

Great work! I am wondering how much GPU time this work requires for training the different models. I couldn't find this described in the paper.

keyu-tian commented 2 months ago

Thank you @xiaosu-zhu. Training VAR-d16 for 200 epochs on ImageNet 256x256 takes 2.5 days on 16 A100s. Training VAR-d30 for 350 epochs on ImageNet 512x512 with progressive training requires 256 A100s for around 4 days.

We'll add these details in the next version of the paper.
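For anyone budgeting compute, the figures above can be converted to total GPU time with a quick back-of-envelope calculation (a sketch only; it assumes round-the-clock utilization and takes "around 4 days" at face value):

```python
# Back-of-envelope total GPU time for the two runs quoted above.
# Figures are (number of A100s, wall-clock days) from the maintainer's reply.
runs = {
    "VAR-d16, ImageNet 256x256, 200 epochs": (16, 2.5),
    "VAR-d30, ImageNet 512x512, 350 epochs": (256, 4.0),  # "around 4 days" -- approximate
}

for name, (gpus, days) in runs.items():
    gpu_days = gpus * days
    print(f"{name}: {gpu_days:.0f} A100-days (~{gpu_days * 24:.0f} A100-hours)")
# VAR-d16: 40 A100-days (~960 A100-hours)
# VAR-d30: 1024 A100-days (~24576 A100-hours)
```

So the d30 run at 512x512 is roughly 25x the compute of the d16 run at 256x256.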

xiaosu-zhu commented 2 months ago

Thanks for your reply. 👍