Closed by passaglia 4 months ago
Thanks for the fix, @passaglia! We've already got an internal MR that's been reviewed to fix this issue, so it should be synced to GitHub soon. Thanks again!
Great, thank you @yanring! I'll close this issue once the GitHub repo is updated.
Describe the bug: In the new MoE Token Drop code, both the code and the documentation expect drop_policy to be either prob or position, but the current default argument is probs, which is neither of the two.
https://github.com/NVIDIA/Megatron-LM/blob/db3a3f79d1cda60ea4b3db0ceffcf20c5760e11d/megatron/core/transformer/moe/moe_utils.py#L272C5-L272C16
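To make the mismatch concrete, here is a minimal sketch of the pattern, not the actual Megatron-LM code. The function name select_kept_tokens, its parameters, and the choice to raise ValueError on an unknown policy are all assumptions for illustration; only the values prob, position, and the default probs come from this issue.

```python
from typing import List

# The two policy names the code and documentation expect, per this issue.
VALID_DROP_POLICIES = ("prob", "position")


def select_kept_tokens(
    scores: List[float],
    capacity: int,
    drop_policy: str = "probs",  # mismatched default described in this issue
) -> List[int]:
    """Hypothetical stand-in: return indices of tokens kept under `capacity`."""
    if drop_policy == "prob":
        # Keep the `capacity` highest-scoring tokens.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return sorted(ranked[:capacity])
    if drop_policy == "position":
        # Keep the first `capacity` tokens in sequence order.
        return list(range(min(capacity, len(scores))))
    # Assumed behavior: reject anything outside the documented values.
    raise ValueError(
        f"Invalid drop_policy: {drop_policy!r}; expected one of {VALID_DROP_POLICIES}"
    )


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.5]
    print(select_kept_tokens(scores, 2, drop_policy="prob"))
    print(select_kept_tokens(scores, 2, drop_policy="position"))
    try:
        select_kept_tokens(scores, 2)  # relies on the default "probs"
    except ValueError as err:
        print(err)
```

Under these assumptions, any caller that relies on the default hits the error branch, so changing the default to prob (or passing drop_policy explicitly) avoids it.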
@yanring
Proposed fix: https://github.com/NVIDIA/Megatron-LM/pull/811