NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] About Optimizer & Params Offload #946

Closed. shh2000 closed this issue 2 months ago.

shh2000 commented 2 months ago

Hello, I wonder if there is any plan to support offloading optimizer states, or even parameters, to host memory, similar to ZeRO-Offload. DeepSpeed provides offloading, but without Megatron-Core's parallelism or Transformer Engine (TE). I hope it is possible to combine offloading with TP/PP/CP and TE while still achieving high performance, especially when h2d/d2h bandwidth is high, as discussed in https://openreview.net/pdf?id=rqn2v1Ltgn0.

Looking forward to your reply and thanks!
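For context, here is a minimal sketch of what ZeRO-Offload-style optimizer-state (and parameter master-copy) offloading does, written in plain PyTorch; the `CPUOffloadOptimizer` wrapper is hypothetical and is not an existing Megatron-Core or DeepSpeed API. FP32 master weights and Adam states live in pinned host memory, gradients are copied device-to-host after backward, the optimizer step runs on the CPU, and updated weights are copied back host-to-device, which is why step time is dominated by h2d/d2h bandwidth:

```python
# Illustrative sketch only; not a Megatron-Core feature (see the maintainer's
# reply below). Shows the data movement behind ZeRO-Offload-style offloading.
import torch


class CPUOffloadOptimizer:
    def __init__(self, gpu_params, lr=1e-4):
        self.gpu_params = list(gpu_params)
        # FP32 master copies and Adam states are kept in pinned host memory.
        self.cpu_params = [
            p.detach().float().cpu().pin_memory() for p in self.gpu_params
        ]
        for p in self.cpu_params:
            p.requires_grad_(True)
        self.opt = torch.optim.Adam(self.cpu_params, lr=lr)

    @torch.no_grad()
    def step(self):
        # d2h: copy gradients computed on the GPU to the CPU master params.
        for cpu_p, gpu_p in zip(self.cpu_params, self.gpu_params):
            cpu_p.grad = gpu_p.grad.detach().float().cpu()
        # The optimizer update runs entirely on the CPU.
        self.opt.step()
        self.opt.zero_grad(set_to_none=True)
        # h2d: copy updated weights back to the GPU in the model's dtype.
        for cpu_p, gpu_p in zip(self.cpu_params, self.gpu_params):
            gpu_p.copy_(cpu_p.to(gpu_p.device, dtype=gpu_p.dtype, non_blocking=True))
```

In a real implementation the d2h/h2d copies would be overlapped with backward/forward compute on separate CUDA streams, and under TP/PP/CP each rank would offload only its own shard of the optimizer state.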

deepakn94 commented 2 months ago

CPU offloading is not on our roadmap right now.