DLLXW / baby-llama2-chinese

A repo for pretraining a small-parameter Chinese LLaMa2 from scratch and then applying SFT; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese Q&A ability.
MIT License

Without SFT, inference throws an error; please take a look #30

Open hopeforus opened 10 months ago

hopeforus commented 10 months ago

Traceback (most recent call last):
  File "/home/hope/work/baby-llama2-chinese/eval_hope.py", line 67, in <module>
    model.load_state_dict(state_dict, strict=False)
  File "/home/hope/miniconda3/envs/llama2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Transformer:
    size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([64793, 1024]) from checkpoint, the shape in current model is torch.Size([64793, 512]).

DLLXW commented 10 months ago

The dim of 512 set in your code doesn't match the dim of 1024 in the saved model; just change dim in the code to 1024.
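
A minimal sketch of that fix, assuming the repo's llama2.c-style model.py with a `ModelArgs` dataclass and the `Transformer` class named in the traceback; the checkpoint path here is illustrative, not the repo's actual file name:

```python
import torch
from model import ModelArgs, Transformer  # assumption: llama2.c-style API

ckpt_path = "out/epoch_0.pth"  # hypothetical path; point this at your checkpoint
state_dict = torch.load(ckpt_path, map_location="cpu")

# Rather than hard-coding dim=512, read the embedding shape from the
# checkpoint so the model config always matches what was trained:
vocab_size, dim = state_dict["tok_embeddings.weight"].shape  # -> (64793, 1024)

# n_layers, n_heads, etc. must also match the training run's config.
args = ModelArgs(dim=dim, vocab_size=vocab_size)
model = Transformer(args)
model.load_state_dict(state_dict, strict=False)
model.eval()
```

Note that `strict=False` only tolerates missing or unexpected keys; PyTorch still raises on shape mismatches, which is why the error appeared despite that flag.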

hopeforus commented 10 months ago

OK, I'll give it a try. Thanks a lot!