After preprocessing the PG19 data, I started training, but I keep running into the following problem:
[WARNING|logging.py:329] 2024-05-14 16:24:22,784 >> LlamaModel is using LlamaSdpaAttention, but torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument attn_implementation="eager" when loading the model.
Traceback (most recent call last):
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 130, in <module>
    main()
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 91, in main
    train_result: TrainOutput = trainer.train(resume_from_checkpoint=None)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1041, in forward
    attention_mask = _prepare_4d_causal_attention_mask(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 306, in _prepare_4d_causal_attention_mask
    attention_mask = attn_mask_converter.to_4d(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 136, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (750) at non-singleton dimension 3
I have tried adjusting many parameters, but none of them resolved this error. How can I solve this problem?
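For context, my reading of the traceback is that the 4D causal mask is built for the model's sequence length (1024) while the attention_mask coming out of my data pipeline only covers 750 positions. Below is a minimal sketch of what I think to_4d is doing when it fails; the shapes are taken from the traceback, and the idea that the batch's attention_mask is shorter than input_ids is my guess, not something I have confirmed:

```python
import torch

# Shapes taken from the error message above. Assumption: input_ids in the batch
# is 1024 tokens long, while attention_mask only covers 750 tokens.
query_len = 1024  # sequence length the model sees (input_ids)
mask_len = 750    # length of the attention_mask from the data collator (assumed)

# Simplified view of what AttentionMaskConverter.to_4d builds internally:
# a (1, 1, query_len, query_len) causal mask and a (1, 1, query_len, mask_len)
# expansion of the 2D attention_mask.
causal_4d_mask = torch.zeros(1, 1, query_len, query_len)
expanded_attn_mask = torch.zeros(1, 1, query_len, mask_len)

try:
    # masked_fill has to broadcast the two masks together, which fails because
    # the last dimensions (1024 vs 750) do not match.
    causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(torch.float32).min)
except RuntimeError as e:
    print(e)
    # -> The size of tensor a (1024) must match the size of tensor b (750)
    #    at non-singleton dimension 3
```

If that reading is right, the real question is why my preprocessed PG19 batches end up with an attention_mask that is shorter than input_ids.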