HestiaSky / E4SRec


RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != float #9

Open xrj-xrj opened 8 months ago

xrj-xrj commented 8 months ago

Dear authors, thanks for your nice work! I have cloned this repo and downloaded the 'Platypus2-7B' model. However, I hit the following error when running 'fine-turning.sh':

```
Traceback (most recent call last):
  File "finetune.py", line 245, in <module>
    fire.Fire(train)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "finetune.py", line 172, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/transformers/trainer.py", line 1869, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/transformers/trainer.py", line 2777, in training_step
    self.accelerator.backward(loss)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/accelerate/accelerator.py", line 1851, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/home/anaconda3/envs/platypus/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 480, in backward
    grad_A = torch.matmul(grad_output, CB).view(ctx.grad_shape).to(ctx.dtype_A)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != float
```
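The failing line is in bitsandbytes' int8 matmul backward: per the traceback, `grad_output` arrives in fp16 (from the GradScaler/fp16 training path) while `CB` is fp32. A commonly suggested mitigation for this class of error is to run the model through peft's `prepare_model_for_kbit_training` after 8-bit loading, so the non-quantized layers are cast to fp32 before the LoRA adapters are attached. The sketch below is illustrative, not the repo's code; the model id and LoRA hyperparameters are placeholder assumptions, and on older peft versions the helper is named `prepare_model_for_int8_training`.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Placeholder model id; substitute your local Platypus2-7B path.
model = AutoModelForCausalLM.from_pretrained(
    "garage-bAInd/Platypus2-7B",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Casts layer norms and the lm_head to fp32 and enables input grads,
# which avoids many Half-vs-float mismatches in the 8-bit backward pass.
model = prepare_model_for_kbit_training(model)

# Placeholder LoRA settings, not the ones used by E4SRec.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```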

patelrajnath commented 7 months ago

Yes, I get the same error. Did you find any solution?

KpiHang commented 5 months ago

I switched to a different GPU; it works fine on an A800.

xrj-xrj commented 5 months ago

> Yes, I get the same error. Did you find any solution?

I changed `load_in_8bit=True` on line 33 of `model.py` to `load_in_4bit=True`.
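For reference, that change would look roughly like the following sketch (not the repo's exact code: `base_model` is a placeholder, and 4-bit loading requires a transformers/bitsandbytes version that supports `load_in_4bit`). Loading in 4-bit routes the matmuls through a different bitsandbytes kernel than the int8 path that raised the error, which is plausibly why it sidesteps the dtype mismatch.

```python
import torch
from transformers import LlamaForCausalLM

base_model = "garage-bAInd/Platypus2-7B"  # placeholder path/id

model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_4bit=True,   # was: load_in_8bit=True (model.py, line 33)
    torch_dtype=torch.float16,
    device_map="auto",
)
```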