RicardoDominguez opened this issue 9 months ago (status: Open)
Are you using a model from a checkpoint folder or the output folder?
From the output folder
File "<stdin>", line 1, in <module>
File "/lustre/home/rolmedo/axo/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
return model_class.from_pretrained(
File "/lustre/home/rolmedo/axo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3480, in from_pretrained
) = cls._load_pretrained_model(
File "/lustre/home/rolmedo/axo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3931, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for MistralForCausalLM:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32002, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
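The torch.Size([0]) is the telltale of ZeRO-3 partitioning: each rank materializes only its own shard of every parameter, and a parameter that was never gathered gets written to disk as an empty placeholder. A sketch of what a save path has to do under ZeRO-3 (hedged: `model`, `output_dir`, and the rank check are placeholders; `deepspeed.zero.GatheredParameters` is DeepSpeed's documented context manager for materializing full tensors):

```python
import torch.distributed as dist
import deepspeed

# Under ZeRO-3 each rank holds only a shard of every parameter, so the
# full tensors must be gathered before save_pretrained; otherwise the
# checkpoint contains empty torch.Size([0]) placeholders like the one
# in the error above.
with deepspeed.zero.GatheredParameters(list(model.parameters()), modifier_rank=None):
    if dist.get_rank() == 0:
        model.save_pretrained(output_dir)  # placeholders: model, output_dir
```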
I can confirm that I only experience this issue when using Zero3, and Zero 2 works fine.
I just ran into the same error, can confirm switching from zero3 to zero2 "solved" the issue.
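For concreteness, the zero3-to-zero2 switch in an axolotl config is just the deepspeed path; a minimal sketch, assuming the stock configs shipped with the repo (and note this is a workaround, not a fix):

```yaml
# shard with ZeRO-2 instead of ZeRO-3
deepspeed: deepspeed/zero2.json  # previously: deepspeed/zero3.json
```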
Using transformers @ git+https://github.com/huggingface/transformers.git@3cefac1d974db5e2825a0cb2b842883a628be7a0
seems to work.
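For reference, the equivalent pip pin (same commit hash as above, using a PEP 508 direct reference):

```bash
pip install "transformers @ git+https://github.com/huggingface/transformers.git@3cefac1d974db5e2825a0cb2b842883a628be7a0"
```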
@mgoulao is this a transformers regression then? That particular commit works with zero3?
Yes, it does work with ZeRO-3; however, you will then hit this problem: #1035
I had the same error; that transformers commit fixes it, but now I get this one:
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 813, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(model, param_name, param_device, value=param)
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/bitsandbytes.py", line 128, in set_module_quantized_tensor_to_device
new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
I can confirm the same error when fine-tuning Mistral with the chatml format and DeepSpeed ZeRO-3.
loading model
Traceback (most recent call last):
File "/home/ubuntu/llm_recipes/scripts/push2hub.py", line 33, in <module>
model = AutoModelForCausalLM.from_pretrained(config.model_path, torch_dtype=getattr(torch, config.torch_dtype))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/pt/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/pt/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniforge3/envs/pt/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3977, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for MistralForCausalLM:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32002, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
This post is old; I think there is no solution: you simply cannot combine QLoRA with DeepSpeed ZeRO-3. Fortunately, a good alternative was recently implemented in Axolotl: FSDP (full shard) + QLoRA. Link
The most viable solution I found was to use a non-quantized LoRA with DeepSpeed ZeRO-3.
Apart from that, I believe that as of today there is no way to load QLoRA adapters with DeepSpeed Stage 3.
I hope I'm wrong, but all the definitive answers I found online were basically these.
This issue is about a full fine-tune; no LoRA involved.
I am doing a full fine-tune, no QLoRA.
+1 Zero3_bf16 + Full-finetune
RuntimeError: Error(s) in loading state_dict for MistralModel:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([32006, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
EDIT - Can confirm zero2 works
I encountered this too, although mine was with Llama 3 + ZeRO-3. The model safetensors were being output as shards, but there was also a model.safetensors file that HF seems to load by default, even though it's not included in the index.json. Once I (re)moved the model.safetensors file, the model loaded successfully.
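A hedged sketch of that workaround, automating the "(re)move the stray file" step described above (`out_dir` is a hypothetical path):

```python
import json
import os

out_dir = "path/to/output_folder"  # hypothetical path

index = os.path.join(out_dir, "model.safetensors.index.json")
stray = os.path.join(out_dir, "model.safetensors")

# If the output dir is sharded (index.json present) but a stray
# model.safetensors also exists, from_pretrained may pick up the stray
# file instead of the shards. Moving it aside lets the shards load.
if os.path.exists(index) and os.path.exists(stray):
    with open(index) as f:
        shard_files = set(json.load(f)["weight_map"].values())
    if "model.safetensors" not in shard_files:  # confirm it really is a stray
        os.rename(stray, stray + ".bak")
```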
Please check that this issue hasn't been reported before.
Expected Behavior
I fine-tune a Mistral model with the default zero3.json. Training finishes without error. Afterwards, I expect to be able to load the fine-tuned model from the output folder with AutoModelForCausalLM.from_pretrained.
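Something along these lines (a sketch reconstructed from the tracebacks above; the path and dtype are placeholders for the values in my config):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path: the training output folder, not an intermediate checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/output_folder",
    torch_dtype=torch.bfloat16,  # assumption: matches the training dtype
)
```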
My accelerate config is
Current behaviour
Loading the model from the output folder yields the size-mismatch error shown above, and retrying with ignore_mismatched_sizes=True (as the message suggests) yields the "Cannot copy out of meta tensor; no data!" error instead.
Steps to reproduce
Fine-tune with the config yaml below, and thereafter load the model from the output folder.
Config yaml
Possible solution
Seems related to #705 and #709
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main/3e3229e2d99bb509784ac72e6589f8a8e406247f
Acknowledgements