Hi. I learned from the README that HLLM uses TinyLlama. I'm also using the pre-trained TinyLlama base model, but I encountered an error when running it. Can you help me take a look?
Specifically, when running the model from https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T, it raises a size mismatch error: "size mismatch for model.layers.19.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048])". The code I'm running is as follows:
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaForCausalLM
from transformers import AutoConfig

# Load the pre-trained TinyLlama base model
path = 'TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T'
model = LlamaForCausalLM.from_pretrained(path)
I've traced the issue and suspect that the author might have modified the transformers code. Have you encountered this problem before?
Could you please check your version of transformers? Some Llama models use GQA (grouped-query attention) in the attention module. You can update transformers to 4.41.1 and check whether the script runs.
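For reference, here is a minimal diagnostic sketch (not from the HLLM repo) that prints the installed transformers version and the checkpoint's attention configuration. It assumes the standard LlamaConfig attribute names; with grouped-query attention, k_proj projects to num_key_value_heads * head_dim (4 * 64 = 256 for TinyLlama-1.1B), which matches the shape reported in your error, whereas a transformers build without GQA support constructs a square [2048, 2048] k_proj.

# Minimal diagnostic sketch (assumes the same checkpoint path as above and the
# standard LlamaConfig attribute names; on versions without GQA support the
# num_key_value_heads attribute may be missing, hence the getattr fallback).
import transformers
from transformers import AutoConfig

print(transformers.__version__)

path = 'TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T'
config = AutoConfig.from_pretrained(path)

# With grouped-query attention, k_proj/v_proj project to
# num_key_value_heads * head_dim rather than hidden_size.
head_dim = config.hidden_size // config.num_attention_heads
num_kv_heads = getattr(config, 'num_key_value_heads', config.num_attention_heads)
print('num_attention_heads:', config.num_attention_heads)        # 32 for TinyLlama-1.1B
print('num_key_value_heads:', num_kv_heads)                       # 4 for TinyLlama-1.1B
print('expected k_proj out_features:', num_kv_heads * head_dim)   # 4 * 64 = 256, matching the checkpoint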