OpenLMLab / MOSS-RLHF

Training on 8 Nvidia RTX A6000 #19

Open Top34051 opened 1 year ago

Top34051 commented 1 year ago

Hi Authors, thank you so much for your huge contribution! I'm pretty new to the optimization workarounds for training large models, and I'm struggling to get Llama-7B training started on my setup (8 Nvidia RTX A6000s, each with 48 GB of GPU memory). What would you recommend changing in the optimization config to get training working in this case? Thank you so much!

Ablustrund commented 1 year ago

Thank you very much for your interest in this project, and I apologize for the delayed reply.

We use ZeRO stage 3 and offload the parameters to the CPU, with the batch size set to 2; this costs around 54 GB of GPU memory.
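
For anyone trying to reproduce this, a minimal DeepSpeed configuration along these lines should express that setup. This is a sketch, not the exact config from the MOSS-RLHF training scripts: the keys follow the standard DeepSpeed config schema, and the `offload_optimizer` and `bf16` entries are assumptions beyond what the reply states.

```python
# Sketch of a DeepSpeed config matching the reply above: ZeRO stage 3,
# parameters offloaded to CPU, micro batch size 2 per GPU.
# Not copied from the MOSS-RLHF repo; keys follow the standard
# DeepSpeed schema.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {
        "stage": 3,                    # ZeRO-3: shard params, grads, optimizer states
        "offload_param": {             # move parameters to CPU RAM
            "device": "cpu",
            "pin_memory": True,
        },
        "offload_optimizer": {         # optimizer states to CPU as well (assumption)
            "device": "cpu",
            "pin_memory": True,
        },
    },
    "bf16": {"enabled": True},         # assumption; A6000 (Ampere) supports bf16
}
```

A config like this can be passed as the `config` argument to `deepspeed.initialize(...)`, or saved as JSON and referenced via the `--deepspeed` launcher flag. Note that the reported ~54 GB footprint is above the A6000's 48 GB, so on this hardware a smaller batch size, a shorter sequence length, or more aggressive offloading may still be needed.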