zhaosheng-thu opened 3 weeks ago
I can load the weights using `model.load_state_dict()`, and then everything goes smoothly, but I really want to know why `from_pretrained(state_dict=state_dict)` doesn't work.
Thanks for raising that. Maybe it's an HF thing; I will have to investigate. I could not reproduce it with another model when I gave it a quick try.
I am not sure if it's related because the differences are so big, but I wonder ~what the precision of the tensors in your current state dict is. Could you print the precision of the state dict, and~ could you also try to load it without `torch_dtype=torch.float16`?
EDIT: Nevermind, I can see that the precision is bfloat16 in your screenshot.
I tried this with Llama 3 as well, and it worked fine for me there. Here are my steps:
```bash
litgpt download --repo_id meta-llama/Meta-Llama-3-8B-Instruct --access_token ...

litgpt finetune \
  --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct \
  --out_dir my_llama_model \
  --train.max_steps 1 \
  --eval.max_iter 1

litgpt convert from_litgpt \
  --checkpoint_dir my_llama_model/final \
  --output_dir out/converted_llama_model/
```
And then in a python session:
I fine-tuned Llama 3 8B with LoRA and followed the tutorial in the repository to convert the final result into `model.pth`. However, when I try to load the fine-tuned weights into the model using `AutoModelForCausalLM.from_pretrained`, I am unable to do so correctly. Below is my test:

But I found that the `state_dict` from `torch.load` doesn't equal `model.state_dict()`, as shown in the following screenshots (`torch.load` vs. `model.state_dict()`). I noticed that even though I passed the `state_dict`, `from_pretrained` still returns the weights of the model loaded by name. Did I make any mistakes in my code, and how can I solve this? Thanks!