AntNLP / nope_head_scale

MIT License

Attention mask size mismatch #1

Open RENNY-Jenius opened 5 months ago

RENNY-Jenius commented 5 months ago

After preprocessing the PG19 data, I started training and keep running into the following error:

[WARNING|logging.py:329] 2024-05-14 16:24:22,784 >> LlamaModel is using LlamaSdpaAttention, but torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to the manual attention implementation, but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument attn_implementation="eager" when loading the model.

Traceback (most recent call last):
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 130, in <module>
    main()
  File "/home/runyu.cai/nope_head_scale/run_clm.py", line 91, in main
    train_result: TrainOutput = trainer.train(resume_from_checkpoint=None)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1041, in forward
    attention_mask = _prepare_4d_causal_attention_mask(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 306, in _prepare_4d_causal_attention_mask
    attention_mask = attn_mask_converter.to_4d(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 136, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (750) at non-singleton dimension 3

I have tried adjusting many parameters, but the error persists. How can I resolve it?
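For anyone reading along: the RuntimeError says the 2D `attention_mask` in the batch (length 750 here) is shorter than the `input_ids` sequence (length 1024), so `to_4d` cannot broadcast the expanded padding mask against the causal mask built from `input_ids`. A minimal sanity check on a batch, as a sketch only (the helper name `check_batch` and where you hook it in are my assumptions, not part of this repo):

```python
import torch


def check_batch(batch: dict) -> None:
    """Hypothetical helper: verify attention_mask and input_ids have matching lengths.

    In transformers 4.35, _prepare_4d_causal_attention_mask builds a causal mask of
    shape (bsz, 1, q_len, q_len) from input_ids and fills it using the padding mask
    expanded from attention_mask; if the two sequence lengths differ (1024 vs 750 in
    the traceback above), masked_fill raises the RuntimeError shown.
    """
    input_ids: torch.Tensor = batch["input_ids"]
    attention_mask = batch.get("attention_mask")
    if attention_mask is not None and attention_mask.shape[-1] != input_ids.shape[-1]:
        raise ValueError(
            f"attention_mask length {attention_mask.shape[-1]} does not match "
            f"input_ids length {input_ids.shape[-1]}; re-check the PG19 preprocessing "
            f"so that both fields are chunked to the same block size."
        )
```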

RmZeta2718 commented 2 months ago

Which version of transformers are you using? This project uses version 4.35; other versions very likely will not work.
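A quick way to confirm the environment before rerunning; this is just a version-string check, nothing repo-specific is assumed:

```python
import transformers

# The project is developed against transformers 4.35.x; the attention-mask
# preparation code changed in later releases, so behavior may differ.
# This only warns about a mismatch, it does not fix anything by itself.
if not transformers.__version__.startswith("4.35"):
    print(f"Warning: found transformers {transformers.__version__}, expected 4.35.x")
```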