NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[ENHANCEMENT] How can I specify the number of layers in each pipeline stage myself? #857

Closed: janelu9 closed this issue 3 weeks ago

janelu9 commented 3 weeks ago

Does Megatron put the same number of transformer decoder layers in each pipeline stage via the following code? https://github.com/NVIDIA/Megatron-LM/blob/c4d12e26b2dc25a2eab7da92e2ac30338c0ed3de/megatron/core/transformer/transformer_block.py#L31
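
For context, a minimal sketch (not Megatron's actual implementation) of what an even split of decoder layers across pipeline stages looks like, assuming the default behavior simply divides `num_layers` by the pipeline-parallel size:

```python
# Hypothetical illustration of an even layer split across pipeline stages:
# each stage gets num_layers // pipeline_parallel_size decoder layers, so
# num_layers must be divisible by the pipeline-parallel size.
def layers_per_pipeline_stage(num_layers: int, pipeline_parallel_size: int) -> int:
    assert num_layers % pipeline_parallel_size == 0, (
        "num_layers must be divisible by the pipeline-parallel size for an even split"
    )
    return num_layers // pipeline_parallel_size

# Example: a 24-layer decoder on 4 pipeline stages -> 6 layers per stage.
print(layers_per_pipeline_stage(24, 4))  # 6
```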