git-cloner / llama2-lora-fine-tuning

LLaMA 2 fine-tuning with DeepSpeed and LoRA
https://gitclone.com/aiit/chat/
MIT License

ValueError: Attention mask should be of size (4, 1, 240, 480), but is torch.Size([4, 1, 240, 240]) #12

Open LiBinNLP opened 7 months ago

LiBinNLP commented 7 months ago

I ran into this issue when fine-tuning LLaMa-7B-Chat-hf with the example dataset:

```
Traceback (most recent call last):
  File "finetune-lora.py", line 656, in <module>
    train()
  File "finetune-lora.py", line 622, in train
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/trainer.py", line 2732, in training_step
    self.accelerator.backward(loss)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/accelerate/accelerator.py", line 1905, in backward
    loss.backward(**kwargs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 141, in backward
    outputs = ctx.run_function(*detached_inputs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 789, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/sda/libin/anaconda3/envs/llama2/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 423, in forward
    raise ValueError(
ValueError: Attention mask should be of size (4, 1, 240, 480), but is torch.Size([4, 1, 240, 240])
```
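
For context on the numbers: (4, 1, 240, 480) is batch size 4, a broadcast head dimension, 240 query positions, and 480 = 240 + 240 key/value positions, so the attention layer sees 240 cached positions on top of the current chunk while the mask was built for the chunk alone. A commonly suggested workaround (an assumption on my part, not verified against this repo's scripts) is to disable the KV cache while gradient checkpointing is enabled, roughly like this:

```python
# Sketch only: disable the KV cache so the gradient-checkpointing
# recomputation does not see past_key_values left over from the first
# forward pass (240 cached + 240 new = 480).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # assumed base model path
    torch_dtype="auto",
)
model.config.use_cache = False         # KV cache is incompatible with checkpointing
model.gradient_checkpointing_enable()  # keep activation checkpointing on
```

Pinning `transformers` to the version this repository was tested against may also help, since the attention-mask handling in `modeling_llama.py` has changed between releases.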

mesdaq commented 7 months ago

Has this been solved? I have the same problem. Is it caused by the input model?

LiBinNLP commented 7 months ago

> Has this been solved? I have the same problem. Is it caused by the input model?

Not solved yet. The code from another repository works without this problem: https://github.com/tloen/alpaca-lora
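
For reference, the PEFT-based LoRA setup that alpaca-lora follows looks roughly like the sketch below (illustrative only; the model path, `r`, `lora_alpha`, and `target_modules` are assumptions, not the exact alpaca-lora configuration):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM

# Load the base model in 8-bit and wrap it with LoRA adapters.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # assumed base model path
    load_in_8bit=True,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # freeze base weights, prepare for training

lora_cfg = LoraConfig(
    r=8,                                  # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```

Comparing that setup (in particular its `transformers`/`peft` versions and whether it enables gradient checkpointing) against this repository's `finetune-lora.py` may help narrow down where the mask/cache mismatch comes from.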

yangjianxin1 commented 6 months ago

I ran into the same problem.