Status: Closed (renweizhukov closed this issue 3 weeks ago)
Hi, trying to follow up here: I think we have a few different versions of NeMo checkpoints at the moment. The current script works for more recent NeMo checkpoints, but maybe not for the 2B one.
The error is not caused by the input NeMo checkpoint. It is caused by a change in the output mcore checkpoint: two sets of weights and biases have been renamed according to the module_name_rewrite_list given in https://github.com/NVIDIA/Megatron-LM/blob/e33c8f78a35765d5aa37475a144da60e8a2349d1/megatron/core/inference/gpt/state_dict_hooks.py#L116-L119
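To make this failure mode concrete, here is a minimal, self-contained sketch of how renaming state-dict keys through a rewrite list can make a strict load report unexpected keys. The (old, new) pairs in REWRITE_PAIRS are illustrative assumptions in the spirit of module_name_rewrite_list, not a copy of the list in state_dict_hooks.py, and strict_check only imitates PyTorch's strict load_state_dict check.

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: illustrative (old, new) substring pairs. The real
# module_name_rewrite_list in Megatron-LM may use different names.
REWRITE_PAIRS: List[Tuple[str, str]] = [
    ("input_layernorm.", "self_attention.linear_qkv.layer_norm_"),
    ("pre_mlp_layernorm.", "mlp.linear_fc1.layer_norm_"),
]


def rewrite_keys(state_dict: Dict[str, object],
                 pairs: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Return a copy of state_dict with every old substring in each key replaced by its new one."""
    pairs = list(pairs)
    renamed: Dict[str, object] = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in pairs:
            new_key = new_key.replace(old, new)
        renamed[new_key] = value
    return renamed


def strict_check(state_dict: Dict[str, object], expected: Iterable[str]) -> None:
    """Raise RuntimeError if the key sets differ (imitates a strict state-dict load)."""
    expected = set(expected)
    unexpected = sorted(set(state_dict) - expected)
    missing = sorted(expected - set(state_dict))
    if unexpected or missing:
        raise RuntimeError(
            f"Unexpected key(s): {unexpected}; missing key(s): {missing}"
        )


if __name__ == "__main__":
    # Toy layer whose module still expects standalone layernorm parameters.
    expected_keys = [
        "decoder.layers.0.input_layernorm.weight",
        "decoder.layers.0.input_layernorm.bias",
    ]
    ckpt = {k: 0.0 for k in expected_keys}
    strict_check(ckpt, expected_keys)  # names match, so no error

    renamed = rewrite_keys(ckpt, REWRITE_PAIRS)
    try:
        strict_check(renamed, expected_keys)
    except RuntimeError as err:
        print(err)  # the renamed keys are reported as unexpected
```

In this toy setup the renamed keys show up as unexpected and the original names as missing, which is the same kind of mismatch that arises when the names written to a checkpoint and the names the loader expects diverge.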
@yaoyu-33 Please let me know if you need more info about this issue or the pull request. Thanks!
Hi, sorry for the delay. Yes, it makes sense now. I added a comment on your PR.
Can you sign off your commits when you commit, using git commit -sm "commit msg"?
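For reference, the -s (--signoff) flag of git commit appends a Signed-off-by trailer built from the committer's name and email. The sketch below uses hypothetical helper names (add_signoff, has_signoff) to show what that trailer looks like and how a simple sign-off check might detect it; it is not the check that NeMo's CI actually runs.

```python
import re

# Hypothetical helpers illustrating the trailer that `git commit -s` appends.
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def add_signoff(message: str, name: str, email: str) -> str:
    """Append a Signed-off-by trailer, separated from the message body by a blank line."""
    trailer = f"Signed-off-by: {name} <{email}>"
    if trailer in message.splitlines():
        return message  # already signed off by this identity
    return message.rstrip("\n") + "\n\n" + trailer + "\n"


def has_signoff(message: str) -> bool:
    """Return True if any line of the message is a well-formed Signed-off-by trailer."""
    return SIGNOFF_RE.search(message) is not None
```

Commits that were already pushed without a sign-off can be fixed with git commit --amend -s --no-edit (last commit) or git rebase --signoff (a range of commits), followed by a force-push to the PR branch.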
@yaoyu-33 No problem. Thank you for the response! I have addressed your comment in my PR, signed off my commits, and rebased the PR onto the latest main.
@yaoyu-33 Gentle ping. Please let me know if you have any other comment. Thanks!
Hi @renweizhukov, we should probably change the other options as well, then. We try to keep the core arguments consistent (using underscores, _) across the checkpoint_converters folder.
After this change, I think it's good to go.
@yaoyu-33 Makes sense. I have changed the hyphens to underscores for the two other command-line options.
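A minimal argparse sketch of the underscore convention. The option names are hypothetical (only output_path appears in this issue), and keeping the old hyphenated spelling as a second option string is just one possible way to avoid breaking existing command lines; the thread does not say whether the PR did this. argparse infers dest from the first long option string (converting any internal hyphens to underscores), so both spellings populate the same attribute.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical converter CLI; option names are illustrative only.
    parser = argparse.ArgumentParser(description="Toy checkpoint converter CLI.")
    # Underscore spelling comes first, so dest is inferred as "input_name_or_path";
    # the hyphenated spelling is accepted as a backward-compatible alias.
    parser.add_argument("--input_name_or_path", "--input-name-or-path", required=True,
                        help="Path to the input .nemo checkpoint (hypothetical option).")
    parser.add_argument("--output_path", "--output-path", required=True,
                        help="Where to write the converted checkpoint (hypothetical option).")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Because both spellings share one action, downstream code only ever reads args.input_name_or_path and args.output_path, regardless of which form the user typed.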
@yaoyu-33 I have made the change per your suggestion. Could you please take a look? Thanks!
I will get it merged today.
@yaoyu-33 Just wondering whether the pull request has been merged.
There were some CI issues last week; it is finally merged.
Great. Thank you!
Describe the bug
RuntimeError "Unexpected key" when running the checkpoint_converters script convert_gpt_nemo_to_mcore.py
Steps/Code to reproduce bug
Follow the instructions given in https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/dpo.html to convert a GPT-2B checkpoint to a Megatron-Core checkpoint.
The conversion fails with a RuntimeError.
Expected behavior
The script should write the converted checkpoint under the given output_path.
Environment overview (please complete the following information)
Environment location: Docker
Method of NeMo install: from source
If method of install is [Docker], provide docker pull & docker run commands used
Environment details
If an NVIDIA Docker image is used, you don't need to specify these. Otherwise, please provide:
Additional context
GPU model: NVIDIA A100-SXM4-80GB