mxjmtxrm opened this issue 1 week ago
Hi, I tried to fine-tune the Llama2-7b-chat model using Megatron-LM. I downloaded the HF checkpoint and converted it to a GPT Megatron checkpoint following https://github.com/NVIDIA/Megatron-LM/blob/fe1640a3cc4866e015bfdb6449f0d1943d2243cb/docs/llama_mistral.md?plain=1#L73. The command I used is:
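It was shaped like the example in that doc; the paths, model size, and parallel sizes below are placeholders for my local values, so treat this as a sketch rather than my literal command:

```bash
python tools/checkpoint/convert.py \
    --model-type GPT \
    --loader llama_mistral \
    --saver megatron \
    --checkpoint-type hf \
    --model-size llama2-7B \
    --target-tensor-parallel-size 1 \
    --target-pipeline-parallel-size 1 \
    --load-dir /path/to/llama2-7b-chat-hf \
    --save-dir /path/to/megatron-ckpt \
    --tokenizer-model /path/to/tokenizer.model
```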
Then I tried to train the Llama model:

I met the following error:

How can I solve this problem?
@mxjmtxrm, our instructions could be clearer in these docs regarding the compatibility between the converter's `--saver` arg and the training model format. There are two model formats, legacy (a.k.a. 'megatron') and mcore. In the docs and in your command above, `--saver megatron` saves to the legacy format, but during training the default format is mcore unless otherwise specified. There are two options for your issue:

- Use `--saver mcore` during conversion (see the sketch below).
- Pass `--use-legacy-models` during training to use the legacy format (rather than the default mcore format).

Let me know if you have any questions.
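For concreteness, the two options would look roughly like this; only `--saver mcore` and `--use-legacy-models` are the point here, and the paths, model size, and remaining flags are placeholders, not values from your setup:

```bash
# Option 1: re-run the converter targeting the mcore format, which is what
# training expects by default (paths and model size are placeholders).
python tools/checkpoint/convert.py \
    --model-type GPT \
    --loader llama_mistral \
    --saver mcore \
    --checkpoint-type hf \
    --model-size llama2-7B \
    --load-dir /path/to/llama2-7b-chat-hf \
    --save-dir /path/to/mcore-ckpt \
    --tokenizer-model /path/to/tokenizer.model

# Option 2: keep the legacy checkpoint you already produced with
# --saver megatron, and add this flag to your training command so training
# also uses the legacy format:
#   --use-legacy-models
```

Either way, the format the converter saves to has to match the format the training run expects.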