Closed: vegech1cken closed this issue 1 year ago
Could anyone help me figure this out? Thanks!
@vegech1cken This comes from a bug inside peft, and there are a few possible causes:

1. In a multi-GPU environment, if you don't specify which card to use, it will automatically load onto other GPUs as well (check with nvidia-smi); see problems for details. Pin the GPU with CUDA_VISIBLE_DEVICES=xxx, as in the sketch below.
2. Not enough VRAM. For example, if another program is already running on the card, this error can appear when memory runs out.
3. One of the cards is faulty.
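For causes 1 and 2, here is a minimal sketch of pinning the process to a single card and sanity-checking free VRAM before loading the model (the device index "0" is an illustrative assumption; this snippet is not part of finetune.py):

```python
# Minimal sketch (not part of finetune.py): pin one GPU and check free VRAM.
import os

# Must be set before torch initializes CUDA, otherwise all cards are visible
# and the model may spill onto other GPUs.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")  # illustrative: use GPU 0 only

import torch

if torch.cuda.is_available():
    free_b, total_b = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free_b / 2**30:.1f} GiB free / {total_b / 2**30:.1f} GiB total")
```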
I found that installing torchvision fixed it for me.
Has this been solved? I'm running into the same problem.
In my case the problem was insufficient VRAM; switching to a machine with more GPU memory fixed it. (XuCheng)
I get an error when training with the finetune.py script.
Command: `python finetune.py --data_path merge.json --test_size 20`
Training environment: RTX 3070 Ti with 8 GB VRAM, PyTorch 2.0.0+cu117, CUDA 11.3
Error output:

```
error detected
Traceback (most recent call last):
  File "finetune.py", line 274, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 1659, in train
    return inner_training_loop(
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 1926, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 2696, in training_step
    loss = self.compute_loss(model, inputs)
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 2728, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 569, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 591, in forward
    result = super().forward(x)
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
```
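The traceback ends in F.igemmlt, bitsandbytes' int8 matmul through cublasLt, and in this thread the cause turned out to be insufficient VRAM. A minimal diagnostic sketch along those lines (the 8 GiB threshold is an illustrative assumption based on the reporter's card, not a documented bitsandbytes requirement):

```python
# Minimal diagnostic sketch (separate from finetune.py): check the GPU that
# the failing int8 matmul would run on.
import torch

props = torch.cuda.get_device_properties(0)
free_b, total_b = torch.cuda.mem_get_info(0)

# int8 kernels need a reasonably recent GPU; a 3070 Ti is compute capability 8.6.
print(f"{props.name}: compute capability {props.major}.{props.minor}")
print(f"{free_b / 2**30:.1f} GiB free / {total_b / 2**30:.1f} GiB total")

# Illustrative threshold only: the reporter's 8 GB card ran out of memory,
# and a machine with more VRAM resolved the error.
if free_b < 8 * 2**30:
    print("Low free VRAM: try a smaller micro-batch or a GPU with more memory.")
```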