NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[BUG] Wrong lr multiplier #882

Open artyomtugaryov opened 1 week ago

artyomtugaryov commented 1 week ago

In megatron/core/optimizer/__init__.py, the _get_param_groups function overrides the original lr_mult and sets up the parameter groups incorrectly. I propose a fix that keeps the original value and sets up the parameter groups correctly.
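To illustrate the kind of grouping being described, here is a minimal sketch. It is not the Megatron-LM code or the proposed patch. The function name group_params, the condition callbacks, and the group keys wd_mult and lr_mult are assumptions chosen for illustration. The point it shows is that the caller's lr_mult is never reassigned inside the loop: a separate local name holds the per-parameter multiplier.

```python
from collections import defaultdict


def group_params(named_params, lr_mult=1.0,
                 scale_lr_cond=None, no_weight_decay_cond=None):
    """Group parameters by their (wd_mult, lr_mult) pair.

    Hypothetical helper for illustration only. `lr_mult` is the caller's
    value and is read, never overwritten, inside the loop.
    """
    groups = defaultdict(list)
    for name, param in named_params:
        no_wd = bool(no_weight_decay_cond and no_weight_decay_cond(name, param))
        scale_lr = bool(scale_lr_cond and scale_lr_cond(name, param))

        wd_mult = 0.0 if no_wd else 1.0
        # Use a distinct local name so the original lr_mult survives
        # to the next iteration.
        param_lr_mult = lr_mult if scale_lr else 1.0

        groups[(wd_mult, param_lr_mult)].append(param)

    return [
        {"params": params, "wd_mult": wd, "lr_mult": lr}
        for (wd, lr), params in groups.items()
    ]


# Usage with placeholder strings standing in for tensors.
example = group_params(
    [("head.weight", "w1"), ("head.bias", "b1"), ("body.weight", "w2")],
    lr_mult=0.1,
    scale_lr_cond=lambda name, p: name.startswith("head"),
    no_weight_decay_cond=lambda name, p: name.endswith("bias"),
)
```

If the loop instead reassigned lr_mult itself (for example to 1.0 for an unscaled parameter), every later scaled parameter would silently pick up the wrong multiplier, which is the failure mode this sketch avoids.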