NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

[QUESTION] Question about resume with distributed optimizer #851

Open · WailordHe opened this issue 4 weeks ago

WailordHe commented 4 weeks ago

Your question: Is it possible to load an optimizer state that was saved with the distributed optimizer enabled (`--use-distributed-optimizer`), and then continue training without the distributed optimizer?

deepakn94 commented 3 weeks ago

Not without a checkpoint converter.
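Context on why a converter is needed: with the distributed optimizer, the optimizer state is sharded across data-parallel ranks rather than replicated on every rank, so no single checkpoint file contains the full optimizer state that a non-distributed run expects. A converter would have to gather the per-rank shards and write them back as one replicated state dict. The sketch below is a minimal illustration of that idea only; the file names (`distrib_optim.pt`, `model_optim_rng.pt`), the shard structure, and the dict keys are assumptions for illustration and do not describe Megatron-LM's actual on-disk layout, which varies across versions.

```python
# Hypothetical sketch: merge per-rank distributed-optimizer shards into a
# single replicated optimizer state. File names, shard layout, and key names
# are assumptions, NOT Megatron-LM's actual checkpoint format.
import torch


def merge_optimizer_shards(shard_paths, model_ckpt_path, out_path):
    """Concatenate flat optimizer-state shards (one per data-parallel rank,
    in rank order) and attach the merged state to the model checkpoint."""
    # Assumed layout: each shard maps state names (e.g. "exp_avg",
    # "exp_avg_sq") to a flat 1-D tensor covering that rank's slice
    # of the parameters.
    shards = [torch.load(p, map_location="cpu") for p in sorted(shard_paths)]

    merged = {}
    for key in shards[0]:
        merged[key] = torch.cat([s[key] for s in shards], dim=0)

    ckpt = torch.load(model_ckpt_path, map_location="cpu")
    ckpt["optimizer"] = merged  # key name is an assumption
    torch.save(ckpt, out_path)
```

In practice a robust converter would also have to trim any padding added so the flat buffers divide evenly across data-parallel ranks, and map the flat buffers back to per-parameter state, which requires knowing the exact parameter ordering and bucketing used at save time.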