[Open] liming-ai opened this issue 5 months ago
Hi~ All our experiments use 80G A100 GPUs.

model | params | total batch size | lr | epochs | GPUs | training time
---|---|---|---|---|---|---
tokenizer | 72M | 128 | 1e-4 | 40 | 8 | ~2 days
LlamaGen-B | 111M | 256 | 1e-4 | 300 | 8 | ~1 day
LlamaGen-L | 343M | 256 | 1e-4 | 300 | 8 | ~2 days
LlamaGen-XL | 775M | 256 | 2e-4 | 300 | 8 x 2 | ~3 days
LlamaGen-XXL | 1.4B | 512 | 2e-4 | 300 | 8 x 4 | ~4 days
LlamaGen-3B | 3.1B | 512 | 2e-4 | 300 | 8 x 4 | ~5 days
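For anyone tallying the overall budget, here is a quick back-of-envelope conversion of the table into A100-days. This is my own arithmetic from the numbers above (GPUs × approximate training days), not an official figure:

```python
# Rough A100-day estimates implied by the table above (my own arithmetic,
# not official numbers): gpu_days = num_gpus * approx_training_days.
configs = {
    # name: (num_gpus, approx_training_days)
    "tokenizer":    (8,  2),
    "LlamaGen-B":   (8,  1),
    "LlamaGen-L":   (8,  2),
    "LlamaGen-XL":  (16, 3),   # 8 x 2
    "LlamaGen-XXL": (32, 4),   # 8 x 4
    "LlamaGen-3B":  (32, 5),   # 8 x 4
}

for name, (gpus, days) in configs.items():
    print(f"{name}: ~{gpus * days} A100 (80G) days")
```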
Do you have numbers for conditional generation?
Why does it take only one day to train LlamaGen-B on 8 A100s? Is there a special technique involved? With the same settings, it takes me 2.5 days to run 300 epochs.
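As a sanity check on what those two timings imply, here is a quick throughput calculation (my own arithmetic, assuming ImageNet-1k with roughly 1.28M training images and the 8-GPU setup from the table):

```python
# Back-of-envelope throughput implied by "~1 day" vs. the observed 2.5 days
# for 300 epochs of LlamaGen-B (assumes ImageNet-1k, ~1.28M training images).
imagenet_images = 1_281_167
epochs = 300
gpus = 8
seconds_per_day = 86_400

for days in (1.0, 2.5):  # reported vs. observed wall-clock time
    imgs_per_sec = imagenet_images * epochs / (days * seconds_per_day)
    print(f"{days} days -> {imgs_per_sec:.0f} img/s total, "
          f"{imgs_per_sec / gpus:.0f} img/s per GPU")
```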
Thanks for the amazing work! Could you share the training cost for each model, such as the total GPU hours and the minimum number of GPUs needed?