NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] Why does the _p2p_ops function have conditional branches on get_pipeline_model_parallel_rank()? #865

Open lichenlu opened 2 weeks ago

lichenlu commented 2 weeks ago

Your question

Why does the _p2p_ops function branch on get_pipeline_model_parallel_rank() % 2 == 0 versus get_pipeline_model_parallel_rank() % 2 != 0? Is it only there to give even and odd ranks a different send/recv launch order? But send-next and recv-prev use different streams, so there should be no dependency between them.
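To make the launch-order idea concrete, here is a minimal toy sketch of parity-based ordering. It is not Megatron-LM's code: `op_order` and `simulate` are hypothetical names, the ranks are assumed to form a ring, and each send is assumed to block until the matching receive is posted (rendezvous semantics). Under those assumptions, letting every rank launch its send first stalls, while alternating the order by rank parity completes.

```python
from collections import deque


def op_order(rank, alternate):
    """Launch order for one rank in this toy model (hypothetical helper)."""
    if alternate and rank % 2 == 1:
        # Odd ranks post the receive first, mirroring the even ranks.
        return ["recv_prev", "send_next"]
    return ["send_next", "recv_prev"]


def simulate(world_size, alternate):
    """Return True if all ops complete, False if the ring stalls.

    Assumption: an op can only complete when the peer's *current* op is
    the matching one (blocking rendezvous), and ranks form a ring.
    """
    queues = [deque(op_order(r, alternate)) for r in range(world_size)]
    progress = True
    while progress:
        progress = False
        for r in range(world_size):
            nxt = (r + 1) % world_size
            if (
                queues[r]
                and queues[nxt]
                and queues[r][0] == "send_next"
                and queues[nxt][0] == "recv_prev"
            ):
                # The send on rank r meets the receive on rank r + 1.
                queues[r].popleft()
                queues[nxt].popleft()
                progress = True
    return all(not q for q in queues)


print(simulate(4, alternate=False))  # False: every rank waits in send_next
print(simulate(4, alternate=True))   # True: sends and receives pair up
```

This only models blocking, strictly ordered operations. Whether the same ordering constraint still matters when the sends and receives are asynchronous and run on separate streams is exactly what the question above asks, and the sketch does not settle it.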