bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Does bigscience's Megatron-DeepSpeed support ZeRO stage 2 + CPU offload? #361

Closed drxmy closed 1 year ago

drxmy commented 1 year ago

I see that microsoft's Megatron-DeepSpeed has this feature (https://github.com/microsoft/Megatron-DeepSpeed/pull/56). It is a relatively new PR, and I am not sure whether bigscience has merged it.
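For context, ZeRO stage 2 with optimizer CPU offload is normally enabled through the DeepSpeed JSON config rather than Megatron-specific code. A minimal sketch of the relevant `zero_optimization` section, assuming a standard DeepSpeed setup (field names follow the DeepSpeed config schema; whether this works end-to-end depends on the Megatron-DeepSpeed fork supporting the PR above):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

The config would then be passed to the training launcher via `--deepspeed_config`, as with any other DeepSpeed run.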