Hi. I learned from the README that HLLM uses TinyLlama. I'm also using the pre-trained TinyLlama base model, but I encountered an error when running it. Can you help me take a look?
Specifically, when running the model from https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T, it raises a size mismatch error: "size mismatch for model.layers.19.self_attn.k_proj.weight: copying a param with shape torch.Size([256, 2048]) from checkpoint, the shape in current model is torch.Size([2048, 2048])". The code I'm running is as follows:
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaForCausalLM
from transformers import AutoConfig

# Load the pre-trained TinyLlama base model
path = 'TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T'
model = LlamaForCausalLM.from_pretrained(path)
I've traced the issue and suspect that the author might have modified the transformers code. Have you encountered this problem before?
Could you please check your version of transformers? Some Llama models use GQA (grouped-query attention) in the attention module. You can update transformers to 4.41.1 and check whether the script runs.
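For reference, here is a minimal diagnostic sketch (not from the HLLM repo) that prints the installed transformers version and the checkpoint's attention configuration. It assumes the standard LlamaConfig attribute names; with grouped-query attention, k_proj projects to num_key_value_heads * head_dim (4 * 64 = 256 for TinyLlama-1.1B), which matches the shape reported in your error, whereas a transformers build without GQA support constructs a square [2048, 2048] k_proj.

# Minimal diagnostic sketch (assumes the same checkpoint path as above and the
# standard LlamaConfig attribute names; on versions without GQA support the
# num_key_value_heads attribute may be missing, hence the getattr fallback).
import transformers
from transformers import AutoConfig

print(transformers.__version__)

path = 'TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T'
config = AutoConfig.from_pretrained(path)

# With grouped-query attention, k_proj/v_proj project to
# num_key_value_heads * head_dim rather than hidden_size.
head_dim = config.hidden_size // config.num_attention_heads
num_kv_heads = getattr(config, 'num_key_value_heads', config.num_attention_heads)
print('num_attention_heads:', config.num_attention_heads)        # 32 for TinyLlama-1.1B
print('num_key_value_heads:', num_kv_heads)                       # 4 for TinyLlama-1.1B
print('expected k_proj out_features:', num_kv_heads * head_dim)   # 4 * 64 = 256, matching the checkpoint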