mcao516 opened 2 years ago
I'm experiencing this too, and I'm not sure what I'm doing wrong. I downloaded the weights from here, which is the "fixed" link from #646. However, I also downloaded the slim weights, and those seem to load OK, although the model's output is gibberish.
I am getting the same problem when trying to train a 1-3B model.

To Reproduce:
Using `./configs/1-3B.yml`, as shown in the screenshots below:

python ./deepy.py train.py -d configs 1-3B.yml

Screenshots:

Environment:
I also had the same problem: when using a single machine to load the slim weights downloaded from GitHub, I got a similar error. Here is a screenshot of the error message.

Environment:
GPUs: 4x 3090 (96GB)
What's the solution? And why was this issue closed?
@djaym7 Thanks for saying something. I don't recall closing this and have reopened it.
@FayZ676 the url you’re linking to does not contain the weights for a 1.3B model, it contains the weights for a 20B model. That’s why you’re getting a size mismatch: it’s quite simply the wrong size. I suspect that this is unrelated to the problems the others are having.
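One quick way to confirm which model a downloaded checkpoint actually contains is to inspect the tensor shapes stored in a shard file directly, before trying to load it into a model. A minimal sketch (the `module` nesting and the key suffix are assumptions; exact parameter names vary with how the checkpoint was saved):

```python
# Sketch: inspect the word-embedding shape stored in a checkpoint shard,
# to check which model (and which vocab padding) it was saved for.
# The "module" nesting and key suffix are assumptions, not guaranteed.
import torch

def embedding_shape(ckpt_path):
    """Return (name, shape) of the first *word_embeddings.weight tensor."""
    state = torch.load(ckpt_path, map_location="cpu")
    # Pipeline-parallel checkpoints often nest the weights under "module"
    sd = state.get("module", state) if isinstance(state, dict) else state
    for name, tensor in sd.items():
        if name.endswith("word_embeddings.weight"):
            return name, tuple(tensor.shape)
    return None
```

If the first dimension reported here is nowhere near your model's padded vocab size, you are almost certainly pointing the config at weights for a different model.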
@leclem so that change allows you to finetune the 20B model? Can you post a WandB link showing it training so I can check that the loss etc. are as expected?
I have the same issue when trying to train. I downloaded the slim weights, and using ./configs/20B.yml with "python3 ./deepy.py train.py ./configs/20B.yml" gives this error:
RuntimeError: Error(s) in loading state_dict for EmbeddingPipe: size mismatch for word_embeddings.weight: copying a param with shape torch.Size([12608, 6144]) from checkpoint, the shape in current model is torch.Size([12672, 6144]).
I suspect this is an error related to model parallelism. @shaunstoltz how many GPUs were you loading the model onto, and what was the model-parallelism setting?
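For what it's worth, the specific numbers in these tracebacks are consistent with vocab-size padding under model parallelism: Megatron-style code pads the tokenizer vocab until it divides evenly by `make-vocab-size-divisible-by` times the model-parallel size, and each MP rank then holds an equal slice of the embedding rows. A rough sketch of that logic (assuming the default divisor of 128 and the 50257-token tokenizer; this mirrors the idea, not the actual source):

```python
# Sketch of Megatron/NeoX-style vocab padding (an approximation of the
# make-vocab-size-divisible-by logic, not the actual implementation).
def padded_vocab_size(orig_vocab_size, divisible_by=128, mp_size=1):
    """Pad the vocab until it is divisible by divisible_by * mp_size."""
    multiple = divisible_by * mp_size
    padded = orig_vocab_size
    while padded % multiple != 0:
        padded += 1
    return padded

# With a 50257-token tokenizer:
#   mp=1 -> 50304 rows total            (the "current model" shape above)
#   mp=2 -> 50432 total, 25216 per rank (the checkpoint shape above)
#   mp=4 -> 50688 total, 12672 per rank
print(padded_vocab_size(50257, mp_size=1))       # 50304
print(padded_vocab_size(50257, mp_size=2) // 2)  # 25216
print(padded_vocab_size(50257, mp_size=4) // 4)  # 12672
```

Under this arithmetic, the earlier 12608-vs-12672 mismatch would also be consistent with loading shards saved at mp=2 (50432 / 4 = 12608) into a model built at mp=4 (50688 / 4 = 12672), i.e. a checkpoint saved and loaded at different model-parallel degrees.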
Describe the bug

RuntimeError: Error(s) in loading state_dict for EmbeddingPipe: size mismatch for word_embeddings.weight: copying a param with shape torch.Size([25216, 6144]) from checkpoint, the shape in current model is torch.Size([50304, 6144]).
To Reproduce
./configs/20B.yml
(HFTokenizer is used)

./deepy.py generate.py ./configs/20B.yml -i prompt.txt -o sample_outputs.txt
Screenshots:

![image](https://user-images.githubusercontent.com/24154312/177880742-7015d300-deab-43e5-a5f2-ef4c645b4254.png)
Environment (please complete the following information):