[Open] liming-ai opened this issue 5 months ago
Hi~ All our experiments use 80G A100 GPUs.

model | params | total batch size | lr | epochs | GPUs | training time
---|---|---|---|---|---|---
tokenizer | 72M | 128 | 1e-4 | 40 | 8 | ~2 days
LlamaGen-B | 111M | 256 | 1e-4 | 300 | 8 | ~1 day
LlamaGen-L | 343M | 256 | 1e-4 | 300 | 8 | ~2 days
LlamaGen-XL | 775M | 256 | 2e-4 | 300 | 8 x 2 | ~3 days
LlamaGen-XXL | 1.4B | 512 | 2e-4 | 300 | 8 x 4 | ~4 days
LlamaGen-3B | 3.1B | 512 | 2e-4 | 300 | 8 x 4 | ~5 days
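For anyone tallying the overall budget, here is a quick back-of-envelope conversion of the table into A100-days. This is my own arithmetic from the numbers above (GPUs × approximate training days), not an official figure:

```python
# Rough A100-day estimates implied by the table above (my own arithmetic,
# not official numbers): gpu_days = num_gpus * approx_training_days.
configs = {
    # name: (num_gpus, approx_training_days)
    "tokenizer":    (8,  2),
    "LlamaGen-B":   (8,  1),
    "LlamaGen-L":   (8,  2),
    "LlamaGen-XL":  (16, 3),   # 8 x 2
    "LlamaGen-XXL": (32, 4),   # 8 x 4
    "LlamaGen-3B":  (32, 5),   # 8 x 4
}

for name, (gpus, days) in configs.items():
    print(f"{name}: ~{gpus * days} A100 (80G) days")
```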
Do you have numbers for conditional generation?
Why does it take only one day to train LlamaGen-B on 8 A100s? Is there a special technique involved? With the same settings, it takes me 2.5 days to run 300 epochs.
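As a sanity check on what those two timings imply, here is a quick throughput calculation (my own arithmetic, assuming ImageNet-1k with roughly 1.28M training images and the 8-GPU setup from the table):

```python
# Back-of-envelope throughput implied by "~1 day" vs. the observed 2.5 days
# for 300 epochs of LlamaGen-B (assumes ImageNet-1k, ~1.28M training images).
imagenet_images = 1_281_167
epochs = 300
gpus = 8
seconds_per_day = 86_400

for days in (1.0, 2.5):  # reported vs. observed wall-clock time
    imgs_per_sec = imagenet_images * epochs / (days * seconds_per_day)
    print(f"{days} days -> {imgs_per_sec:.0f} img/s total, "
          f"{imgs_per_sec / gpus:.0f} img/s per GPU")
```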
Thanks for the amazing work! Could you share the training cost for each model, such as the total GPU hours and the minimum number of GPUs needed?