Closed ltm920716 closed 4 weeks ago
My understanding is that the `megatron` model type is deprecated. Consider using the `mcore` model type and `--use-mcore-models` when doing training.
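For illustration, a sketch of how that flag might appear in a launch command. Only `--use-mcore-models` is the flag discussed above; the paths, parallelism setting, and model hyperparameters are placeholders, not values from this thread:

```shell
# Illustrative pretrain launch fragment; everything except
# --use-mcore-models is a placeholder for your own config.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --use-mcore-models \
    --tensor-model-parallel-size 1 \
    --num-layers 32 \
    --hidden-size 4096 \
    --num-attention-heads 32
```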
Hi, thanks. `--use-mcore-models` is useful, but things like the ffn-gate weights still don't match. I will go look at nemo-framework-launcher and see the difference.
Hi, I have run the GPT-2 demo successfully with `sh examples/pretrain_gpt.sh`, and I want to build the llama3-8b model through Megatron-LM. So I changed the params in `examples/pretrain_gpt.sh` as below:

I also added a code snippet in `pretrain_gpt.py` to show the model layers, as follows:
The output is:

I think the qkv part is not correct, right?
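For context, Megatron-style checkpoints typically store attention weights as a single fused QKV matrix, interleaved per KV-head group, so a converted checkpoint can look "wrong" when compared against the separate `q_proj`/`k_proj`/`v_proj` tensors of an HF llama model even when it is correct. A hedged sketch of such a fusion for grouped-query attention follows; the shape defaults match llama3-8b, but the exact layout in any given Megatron version may differ, so treat this as illustrative rather than the converter's actual code:

```python
import numpy as np

def fuse_qkv(q, k, v, n_heads=32, n_kv_heads=8, head_dim=128):
    """Fuse separate q/k/v projection weights (rows = output dim,
    cols = hidden dim) into one matrix, grouped per KV head:
    [q-heads of group 0, k of group 0, v of group 0, q-heads of group 1, ...].
    Illustrative only; verify against your Megatron version."""
    hidden = q.shape[1]
    q_per_kv = n_heads // n_kv_heads          # query heads per KV group
    q = q.reshape(n_kv_heads, q_per_kv * head_dim, hidden)
    k = k.reshape(n_kv_heads, head_dim, hidden)
    v = v.reshape(n_kv_heads, head_dim, hidden)
    # Interleave per group, then flatten back to a 2-D weight matrix.
    return np.concatenate([q, k, v], axis=1).reshape(-1, hidden)
```

Comparing a fused matrix like this against the HF tensors head-group by head-group is one way to check whether the "qkv part" of a conversion is actually wrong or just laid out differently.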
the params:
have no effect. Please help, thanks!
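A layer-printing snippet like the one described above could look something like this. It is a generic sketch, not the author's actual code; `model` stands in for whatever `pretrain_gpt.py`'s model provider returns:

```python
def show_layers(model) -> None:
    # named_parameters() is the standard torch.nn.Module accessor;
    # printing name and shape is enough to inspect the layer layout.
    for name, param in model.named_parameters():
        print(f"{name}: {tuple(param.shape)}")
```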
By the way, I have converted the llama3-8b HF checkpoint to Megatron format; the converted model layers are: