Closed hudengjunai closed 4 months ago
I have two questions
RuntimeError: Error(s) in loading state_dict for VocabParallelEmbedding:
size mismatch for weight: copying a param with shape torch.Size([75968, 2560]) from checkpoint, the shape in current model is torch.Size([75822, 2560]).
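For reference, a minimal sketch of the arithmetic behind this mismatch. It assumes tp=2 (as stated in this issue) and a Megatron-style padding rule in which the vocabulary is rounded up to a multiple of a divisor times the tensor-parallel size. The helper names and the divisor value 128 are illustrative assumptions, not taken from this thread.

```python
# Sketch only: helper names and the divisor 128 are assumptions for illustration.

def implied_total_vocab(rows_per_rank: int, tp_size: int) -> int:
    """Total embedding rows implied by one tensor-parallel shard."""
    return rows_per_rank * tp_size


def pad_vocab_size(vocab_size: int, divisible_by: int, tp_size: int) -> int:
    """Round vocab_size up to a multiple of divisible_by * tp_size
    (the Megatron-style padding rule assumed here)."""
    multiple = divisible_by * tp_size
    return ((vocab_size + multiple - 1) // multiple) * multiple


TP = 2  # tensor-parallel size from the issue

ckpt_total = implied_total_vocab(75968, TP)   # shape stored in the checkpoint
model_total = implied_total_vocab(75822, TP)  # shape built by the current model

print("checkpoint vocab:", ckpt_total)
print("model vocab:     ", model_total)
print("difference:      ", ckpt_total - model_total)

# Check whether each total is aligned to the assumed divisor of 128.
for name, total in [("checkpoint", ckpt_total), ("model", model_total)]:
    print(name, "multiple of 128:", total % 128 == 0)
```

If the two totals differ, the conversion step and the training step were probably not given the same vocabulary-size or padding settings, so comparing those arguments in both scripts is a reasonable first check.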
The Qwen1.5 Quick Start has been updated. Please pull the latest code and test again.
Training Qwen1.5-4B with tp=2 failed with an embedding-table load error.
I started the training job as follows.
convert model
I converted the Qwen1.5-4B model to tp=2 and pp=1 with the command
start training job
Then I started the training job with
The job startup log shows
training error