Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese LLaMA + LoRA approach, with the structure modeled on Alpaca)
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

Error when training on a 3070 Ti: cublasLt ran into an error! #41

Closed: vegech1cken closed this issue 1 year ago

vegech1cken commented 1 year ago

I get an error when training with the finetune.py script. Command: `python finetune.py --data_path merge.json --test_size 20`. Training environment: 3070 Ti with 8 GB VRAM, PyTorch 2.0.0+cu117, CUDA 11.3.

Error output:

```
error detected
Traceback (most recent call last):
  File "finetune.py", line 274, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 1659, in train
    return inner_training_loop(
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 1926, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 2696, in training_step
    loss = self.compute_loss(model, inputs)
  File "/hy-tmp/transformers/src/transformers/trainer.py", line 2728, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 569, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 565, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/hy-tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 196, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 591, in forward
    result = super().forward(x)
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/nn/modules.py", line 242, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py", line 488, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/autograd/_functions.py", line 377, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/usr/local/lib/python3.8/dist-packages/bitsandbytes/functional.py", line 1410, in igemmlt
    raise Exception('cublasLt ran into an error!')
Exception: cublasLt ran into an error!
```
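
For reference, the report above pairs a cu117 PyTorch build with a system CUDA 11.3. A minimal sketch (not from the original thread, assuming a single card at index 0) for printing what the training process actually sees:

```python
# Quick environment check for the cublasLt error: which CUDA build PyTorch
# ships with, whether a GPU is visible, and which device it is.
import torch

print("torch:", torch.__version__)             # reported as 2.0.0+cu117
print("torch CUDA build:", torch.version.cuda) # reported system CUDA is 11.3
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))
```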

vegech1cken commented 1 year ago

Could anyone help me figure this out? Thanks!

Facico commented 1 year ago

@vegech1cken This comes from a bug inside peft, and it usually has one of the following causes (a minimal sketch of how to check points 1 and 2 follows this list):

1. In a multi-GPU environment no card was specified, so the model gets loaded onto the other GPUs automatically (check with nvidia-smi); see problems for details. Pin a GPU with CUDA_VISIBLE_DEVICES=xxx.
2. Not enough VRAM, e.g. another program is already running on the card; the error shows up once memory runs short.
3. One of the cards is faulty.
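
A minimal sketch of points 1 and 2, assuming a single healthy card at index 0; the shell equivalent of the pinning step is simply `CUDA_VISIBLE_DEVICES=0 python finetune.py --data_path merge.json --test_size 20`.

```python
# Hypothetical helper, not part of finetune.py: pin one GPU and check its free
# memory before training. The index "0" is an assumption -- use whichever card
# nvidia-smi shows as idle.
import os

# Must be set before torch is imported, otherwise other GPUs may still be used.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

assert torch.cuda.is_available(), "no visible CUDA device -- check the index above"

free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free_bytes / 1e9:.1f} GB free of {total_bytes / 1e9:.1f} GB total")
```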

sightsIndeep commented 1 year ago

I found that installing torchvision fixed it for me.

zhangyue2709 commented 1 year ago

Has this been resolved? I'm running into the same problem.

vegech1cken commented 1 year ago

In my case it was insufficient VRAM; switching to a machine with more GPU memory fixed it.
