Open moutasemalakkad opened 4 weeks ago
@moutasemalakkad
Hi, I've just faced the same issue.
Here is my solution: insert cfg.trainer.precision = None
above this line:
https://github.com/NVIDIA/NeMo/blob/ebba8b14263ca513c4453fcde0472785c19f46c1/examples/nlp/language_modeling/megatron_gpt_continue_training.py#L167
This solution is inspired by this PR: https://github.com/NVIDIA/NeMo/pull/8908/
It should solve the conflict.
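For anyone who wants to see the idea in isolation, here is a minimal sketch of it, not the NeMo code itself. It uses a plain dict in place of cfg.trainer, and HalfPrecisionPluginStub and drop_precision_if_plugin are hypothetical names made up for illustration. The assumption is that the precision value only needs clearing when a precision plugin is also being passed.

```python
from copy import deepcopy


class HalfPrecisionPluginStub:
    """Hypothetical stand-in for a precision plugin such as MegatronHalfPrecisionPlugin."""


def drop_precision_if_plugin(trainer_cfg: dict, plugins: list) -> dict:
    """Return a copy of trainer_cfg with `precision` cleared when a precision plugin is supplied."""
    cfg = deepcopy(trainer_cfg)  # leave the caller's config untouched
    if any(isinstance(p, HalfPrecisionPluginStub) for p in plugins):
        # Same effect as `cfg.trainer.precision = None` in the suggestion above.
        cfg["precision"] = None
    return cfg


# Plain dict standing in for cfg.trainer (assumption: the real object is a config node).
trainer_cfg = {"devices": 1, "precision": "bf16-mixed"}
plugins = [HalfPrecisionPluginStub()]

resolved = drop_precision_if_plugin(trainer_cfg, plugins)
print(resolved)
```

In the real script the equivalent step would have to run before the Trainer is constructed, which is why the suggestion says to insert it above the linked line.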
Thanks! That also did not work. The workaround was to set the plugins to an empty list:
trainer = Trainer(plugins=[], strategy=strategy, **cfg.trainer, callbacks=callbacks)
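To illustrate why an empty plugin list sidesteps the error, here is a simplified model of an either/or check like the one the ValueError describes. This is a sketch under assumptions, not the actual Trainer code: PrecisionPluginStub and select_precision_source are hypothetical names, and the real validation may cover more cases.

```python
class PrecisionPluginStub:
    """Hypothetical stand-in for a precision plugin such as MegatronHalfPrecisionPlugin."""


def select_precision_source(precision, plugins):
    """Simplified either/or rule: accept a precision flag or a precision plugin, not both."""
    precision_plugins = [p for p in plugins if isinstance(p, PrecisionPluginStub)]
    if precision is not None and precision_plugins:
        # Models the conflict reported in this issue.
        raise ValueError(
            f"Received both precision={precision} and a precision plugin. Choose one."
        )
    if precision_plugins:
        return "plugin"
    if precision is not None:
        return "flag"
    return "default"


# With plugins=[] only the flag is present, so the check passes.
print(select_precision_source("bf16-mixed", []))

# Passing both reproduces the conflict.
try:
    select_precision_source("bf16-mixed", [PrecisionPluginStub()])
except ValueError as err:
    print(f"ValueError: {err}")
```

One thing to keep in mind: with an empty plugin list the half-precision plugin is no longer passed, so under this model precision handling comes from the precision value alone. Whether that is acceptable for a given run is worth checking.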
Title: Conflict between precision settings and MegatronHalfPrecisionPlugin in MegatronGPT training
Describe the bug
When attempting to continue training the MegatronGPT model, I encountered a conflict between precision=bf16-mixed and the MegatronHalfPrecisionPlugin. This results in a ValueError indicating that both precision=bf16-mixed and the MegatronHalfPrecisionPlugin were received and only one should be chosen.
Steps/Code to reproduce bug
Configuration:
Error Message:
Expected behavior
The training should proceed without conflicts between precision settings and plugins.
Environment overview (please complete the following information)
Environment details
Additional context