NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[Bugfix] [MoE] Fix typo in token drop policy's default value #811

Closed. passaglia closed this 4 months ago

passaglia commented 4 months ago

In the new MoE token drop code, both the code and the documentation expect `drop_policy` to be either `prob` or `position`, but the current default argument is `probs`. This PR fixes that typo.
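To illustrate why the mismatch matters, here is a minimal sketch, not Megatron-LM's actual code. The function `keep_indices`, the constant `VALID_DROP_POLICIES`, and the selection logic are all hypothetical; only the accepted values `prob` and `position` and the mistyped value `probs` come from the description above.

```python
# Hypothetical sketch: names and logic are illustrative, not Megatron-LM's API.
VALID_DROP_POLICIES = ("prob", "position")  # values the description says are expected


def keep_indices(scores, capacity, drop_policy):
    """Return the sorted indices of the tokens one expert keeps."""
    if drop_policy == "prob":
        # Keep the `capacity` tokens with the highest routing score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return sorted(ranked[:capacity])
    if drop_policy == "position":
        # Keep the first `capacity` tokens in sequence order.
        return list(range(min(capacity, len(scores))))
    # Any other value, including the mistyped "probs", is rejected.
    raise ValueError(
        f"Invalid drop_policy: {drop_policy!r}; expected one of {VALID_DROP_POLICIES}"
    )
```

Under these assumptions, a default of `probs` matches neither branch, so any caller relying on the default would hit the error path instead of dropping tokens by score.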

@yanring

passaglia commented 4 months ago

Closing because an internal MR resolves this issue: https://github.com/NVIDIA/Megatron-LM/issues/812#issuecomment-2097991119