artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16 #287

Closed andeyeluguo closed 8 months ago

andeyeluguo commented 8 months ago

I just ran sh scripts/finetune_guanaco_7b.sh without changing anything, and this error occurred:

torch.float32 266240 7.273663184633547e-05
0%| | 0/1875 [00:00<?, ?it/s]
/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
Traceback (most recent call last):
  File "/home/nvidia/test/qlora/qlora.py", line 841, in <module>
    train()
  File "/home/nvidia/test/qlora/qlora.py", line 803, in train
    train_result = trainer.train()
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nvidia/.local/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 824, in forward
    logits = self.lm_head(hidden_states)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nvidia/anaconda3/envs/qlora/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
0%| | 0/1875 [00:05<?, ?it/s]
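The traceback ends in F.linear inside lm_head, where a float32 input meets a bfloat16 weight. One way to narrow down a mismatch like this might be to list which parameters are not in the dtype you expect. The sketch below is not part of qlora.py; the helper names group_by_dtype and find_mismatches are hypothetical, and the parameter names in the demo are made-up stand-ins. It only assumes you can supply (name, dtype) pairs, for example built from model.named_parameters() in PyTorch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_dtype(named_dtypes: Iterable[Tuple[str, object]]) -> Dict[str, List[str]]:
    """Group parameter names by the string form of their dtype."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, dtype in named_dtypes:
        groups[str(dtype)].append(name)
    return dict(groups)


def find_mismatches(
    named_dtypes: Iterable[Tuple[str, object]], expected: object
) -> List[Tuple[str, str]]:
    """Return (name, dtype) pairs whose dtype differs from `expected`."""
    expected_str = str(expected)
    return [
        (name, str(dtype))
        for name, dtype in named_dtypes
        if str(dtype) != expected_str
    ]


if __name__ == "__main__":
    # Hypothetical stand-in data; with a real model you could build the pairs as
    #   pairs = [(n, p.dtype) for n, p in model.named_parameters()]
    # and pass torch.bfloat16 as `expected`.
    pairs = [
        ("model.norm.weight", "torch.float32"),
        ("lm_head.weight", "torch.bfloat16"),
    ]
    print(group_by_dtype(pairs))
    print(find_mismatches(pairs, "torch.bfloat16"))
```

Comparing dtypes through str() keeps the helper free of a torch import, so it also works on plain strings. It only reports parameter dtypes; activations such as hidden_states would still need to be checked separately.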

andeyeluguo commented 8 months ago

Changing bf16 to float16 solves it, but my GPU supports bf16, so that's weird.