NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[BUG] [MoE] Typo in Token Drop policy's default value #812

Closed: passaglia closed this 4 months ago

passaglia commented 4 months ago

Describe the bug

In the new MoE Token Drop code, the code and documentation expect `drop_policy` to be either `prob` or `position`, but the current default argument is `probs`.

https://github.com/NVIDIA/Megatron-LM/blob/db3a3f79d1cda60ea4b3db0ceffcf20c5760e11d/megatron/core/transformer/moe/moe_utils.py#L272C5-L272C16
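The failure mode can be illustrated with a minimal sketch (hypothetical code, not the actual Megatron-LM implementation): a dispatcher that only accepts `prob` or `position` will always raise when called with the mismatched default `probs`.

```python
def select_tokens(scores, capacity, drop_policy="probs"):
    """Keep at most `capacity` token indices per expert.

    Hypothetical sketch of the mismatch reported in this issue:
    the dispatch below accepts only "prob" or "position", so the
    default value "probs" always raises a ValueError.
    """
    if drop_policy == "prob":
        # Drop the lowest-probability tokens first.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    elif drop_policy == "position":
        # Drop the latest tokens in the sequence first.
        order = list(range(len(scores)))
    else:
        raise ValueError(f"Invalid drop_policy: {drop_policy}")
    return sorted(order[:capacity])


# Explicitly passing a valid policy works...
select_tokens([0.1, 0.9, 0.5], capacity=2, drop_policy="prob")
# ...but relying on the default raises ValueError("Invalid drop_policy: probs").
```

Changing the default from `"probs"` to `"prob"` (or accepting `"probs"` in the dispatch) resolves the mismatch, which is what the proposed fix does.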

@yanring

Proposed fix: https://github.com/NVIDIA/Megatron-LM/pull/811

yanring commented 4 months ago

Thanks for the fix, @passaglia! We've already got an internal MR that's been reviewed to fix this issue, so it should be synced to GitHub soon. Thanks again!

passaglia commented 4 months ago

Great, thank you @yanring! I'll close this issue once the GitHub repo is updated.

passaglia commented 4 months ago

Fixed in https://github.com/NVIDIA/Megatron-LM/commit/7968fd65326594d649f8a10de10f21188d3e294c