I responded to this on https://github.com/NVIDIA/Megatron-LM/issues/589.
Please convert this issue to a feature request for ZeRO 2/3. Thank you.
I think this article, https://www.deepspeed.ai/tutorials/megatron/, is useful. DeepSpeed ZeRO 1/2 works with the latest Megatron-LM code.
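For reference, a minimal sketch of what enabling ZeRO stage 2 through DeepSpeed looks like, assuming a plain torch model and a standalone training script rather than Megatron-LM's own training loop (the config keys follow the DeepSpeed ZeRO documentation; the model and hyperparameters here are placeholders):

```python
# Minimal sketch: wrapping a torch model with DeepSpeed ZeRO stage 2.
# The model and hyperparameters are placeholders for illustration only.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # 1 = optimizer states, 2 = + gradients, 3 = + parameters
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,
    },
}

model = torch.nn.Linear(1024, 1024)   # placeholder network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# deepspeed.initialize returns an engine that handles the ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, optimizer=optimizer, config=ds_config
)
```

The script would be launched with the `deepspeed` launcher so that the data-parallel process group is set up before initialization.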
@carolove Thanks for the input. I am familiar with the DeepSpeed framework and with enabling all ZeRO stages there; my question here is about enabling ZeRO natively in this repo. Could you please share the commits that added ZeRO 2 support to the latest code of this repo? Thank you.
I am also looking for such an example.
Megatron-LM now has its own ZeRO-1 (it is called the distributed optimizer in this project), but if you are more familiar with DeepSpeed, how about using Megatron-DeepSpeed, @polisettyvarma? To the best of my knowledge, ZeRO-3 is not compatible with the model parallelism (TP or PP) of Megatron-LM. ZeRO-3 reduces VRAM usage and improves throughput by partitioning and broadcasting model parameters, but TP and PP partition the model in their own way and instead communicate activations (all-reducing activations in the forward and backward passes), so TP and PP leave no room for communicating model parameters.
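To make the ZeRO-1 / distributed-optimizer idea concrete, here is a rough conceptual sketch (not Megatron-LM's actual implementation; all names are illustrative): gradients are reduce-scattered so each data-parallel rank only holds the averaged gradients for its own shard, each rank updates its shard with its locally held optimizer states, and the updated shards are all-gathered back into the full parameter buffer.

```python
# Conceptual sketch of ZeRO-1 style optimizer-state sharding over the
# data-parallel group; names are illustrative, not Megatron-LM's real APIs.
import torch
import torch.distributed as dist

def zero1_step(flat_params: torch.Tensor, flat_grads: torch.Tensor,
               shard_param: torch.nn.Parameter,
               shard_optimizer: torch.optim.Optimizer) -> None:
    """One ZeRO-1 update.

    flat_params / flat_grads: full flattened parameters and gradients (replicated on every rank).
    shard_param: this rank's contiguous slice of flat_params, owned by shard_optimizer.
    Assumes flat_params.numel() divides evenly by the world size.
    """
    world_size = dist.get_world_size()
    shard_size = flat_params.numel() // world_size

    # 1) Reduce-scatter: each rank ends up with the summed gradients for its shard only.
    grad_shard = torch.empty(shard_size, device=flat_grads.device, dtype=flat_grads.dtype)
    dist.reduce_scatter(grad_shard, list(flat_grads.chunk(world_size)))
    grad_shard /= world_size  # average over data-parallel ranks

    # 2) Local update: only this rank's optimizer states (e.g. Adam moments) exist here.
    shard_param.grad = grad_shard
    shard_optimizer.step()
    shard_optimizer.zero_grad()

    # 3) All-gather the updated shards so every rank sees the full, updated parameters.
    dist.all_gather(list(flat_params.chunk(world_size)), shard_param.detach())
```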
Thank you @SeunghyunSEO for your inputs. Yes, the Megatron-DeepSpeed repo can be used, but it is not up to date with Megatron-LM. I agree that ZeRO > 1 is not compatible with PP. My request here is for a similar ZeRO-like feature in Megatron-LM itself.
We should have PyTorch FSDP support compatible with TP in the next couple of weeks.
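In the meantime, here is a minimal sketch of how stock PyTorch FSDP maps onto the ZeRO stages (this is standalone FSDP, not the upcoming Megatron-LM integration): `SHARD_GRAD_OP` roughly corresponds to ZeRO-2 and `FULL_SHARD` to ZeRO-3. The toy model and sizes below are placeholders.

```python
# Minimal sketch of PyTorch FSDP as a ZeRO-2/3 analogue; standalone FSDP,
# not the Megatron-LM integration mentioned above. Launch with torchrun.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()

# SHARD_GRAD_OP shards gradients + optimizer states (ZeRO-2-like);
# FULL_SHARD additionally shards parameters (ZeRO-3-like).
fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)

optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
loss = fsdp_model(x).sum()
loss.backward()
optimizer.step()
```

Run with e.g. `torchrun --nproc_per_node=8 fsdp_sketch.py`; the FSDP wrapper handles the parameter all-gathers and gradient reduce-scatters during forward and backward.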
Thank you @deepakn94 for sharing this information.
@polisettyvarma @deepakn94 This was merged into main 2 hours ago: https://github.com/NVIDIA/Megatron-LM/commit/e1993fa6f70763523a84432ab1f5eb42e77ccf2a#diff-a7ca552e38c01a3a0cacbe37cec383c05743aeaf8143e57fd0901f4139d4a1a9R119
How do we enable ZeRO 2/3 stages (similar to #589)?