IBM / Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.
GNU General Public License v3.0

Time taken to run Dromedary training on LLaMA-2 (7B) / LLaMA-2 (70B) #12

Closed: ghost closed this issue 11 months ago

ghost commented 11 months ago

Could you please tell me how long it would take to train LLaMA-2 with Dromedary on 2 A100 GPUs?

Edward-Sun commented 11 months ago

Hi,

For the 70B model, training finishes in one day on 8 GPUs, so I would expect it to take about 4 days on 2 GPUs, provided you increase the GRAD_ACCUMULATION hyper-parameter here from 4 to 16 to maintain the same global batch size.
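The arithmetic behind that GRAD_ACCUMULATION suggestion can be sketched as follows. Note this is an illustrative sketch, not the Dromedary training code: the per-GPU batch size of 4 is a placeholder, and the function and variable names are assumptions; only the 8→2 GPU counts and the 4→16 accumulation values come from the comment above.

```python
def global_batch_size(per_gpu_batch: int, num_gpus: int, grad_accumulation: int) -> int:
    """Effective batch size seen by the optimizer per update step:
    samples per GPU per micro-batch x number of GPUs x accumulation steps."""
    return per_gpu_batch * num_gpus * grad_accumulation

# Reference setup from the comment: 8 GPUs with GRAD_ACCUMULATION = 4.
# (per_gpu_batch = 4 is a placeholder value, not from the repo.)
reference = global_batch_size(per_gpu_batch=4, num_gpus=8, grad_accumulation=4)

# 2-GPU setup: raising GRAD_ACCUMULATION from 4 to 16 compensates for
# having 4x fewer GPUs, so the optimizer sees the same global batch.
scaled = global_batch_size(per_gpu_batch=4, num_gpus=2, grad_accumulation=16)

print(reference, scaled)  # → 128 128
```

Since the global batch size is unchanged, the learning-rate schedule and number of optimizer steps stay the same; only wall-clock time grows by roughly the GPU-count ratio.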

For the 7B model, it would be much faster, since you can significantly increase the per-GPU batch size.