FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
https://arxiv.org/abs/2406.06525

Training cost #8


liming-ai commented 5 months ago

Thanks for the amazing work! Could you share the training cost for each model, such as the training GPU time and the minimum number of GPUs needed?

PeizeSun commented 5 months ago
Hi~ All our experiments use 80GB A100 GPUs:

| model | params | total batch size | lr | epochs | GPUs | training time |
|---|---|---|---|---|---|---|
| tokenizer | 72M | 128 | 1e-4 | 40 | 8 | ~2 days |
| LlamaGen-B | 111M | 256 | 1e-4 | 300 | 8 | ~1 day |
| LlamaGen-L | 343M | 256 | 1e-4 | 300 | 8 | ~2 days |
| LlamaGen-XL | 775M | 256 | 2e-4 | 300 | 8 × 2 | ~3 days |
| LlamaGen-XXL | 1.4B | 512 | 2e-4 | 300 | 8 × 4 | ~4 days |
| LlamaGen-3B | 3.1B | 512 | 2e-4 | 300 | 8 × 4 | ~5 days |
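
For a rough sense of total compute, multiplying GPU count by wall-clock days gives approximate A100 GPU-days per model. This is only a back-of-the-envelope reading of the table above, since the reported times are approximate:

```python
# Approximate A100 GPU-days implied by the table (GPU count x wall-clock days).
# Times are "~N days" estimates, so these numbers are rough.
configs = {
    "tokenizer":    (8,  2),
    "LlamaGen-B":   (8,  1),
    "LlamaGen-L":   (8,  2),
    "LlamaGen-XL":  (16, 3),   # 8 x 2
    "LlamaGen-XXL": (32, 4),   # 8 x 4
    "LlamaGen-3B":  (32, 5),   # 8 x 4
}
for name, (gpus, days) in configs.items():
    print(f"{name}: ~{gpus * days} GPU-days")
# e.g. LlamaGen-3B -> ~160 GPU-days
```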
isidentical commented 5 months ago

Do you have numbers for conditional generation?

GooThinker commented 2 months ago

Why does it take only one day to train LlamaGen-B on 8 A100s? Is there a special technique? With the same settings, it takes me 2.5 days to run 300 epochs.
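
For context, here is a quick sanity check on the throughput that "~1 day" implies, assuming class-conditional training on ImageNet-1k (~1.28M images per epoch; an assumption about the training set, not stated in this thread):

```python
# Throughput implied by training LlamaGen-B for 300 epochs in ~1 day on 8 GPUs.
# Assumes ImageNet-1k-scale data (~1.28M images/epoch) -- an assumption here.
images_per_epoch = 1_281_167   # ImageNet-1k train set size (assumption)
epochs = 300
gpus = 8
seconds = 24 * 3600            # "~1 day" of wall-clock time

total_images = images_per_epoch * epochs
throughput = total_images / seconds          # images/s across all GPUs
print(f"cluster: {throughput:,.0f} img/s, per GPU: {throughput / gpus:,.0f} img/s")
# -> roughly 4,400 img/s total, i.e. ~550 img/s per A100
```

Comparing your own per-GPU images/s against this figure may help locate the gap (data loading, precision settings, or per-GPU batch size are common culprits in standard PyTorch setups; none of these are confirmed as the authors' technique).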