Open two-tiger opened 1 year ago
Hello, I had the same problem.
I encountered this too, has anyone identified a solution?
For me, the issue is that I use another model.cuda()
after load the model (which is already on cuda). Simply comment out model.cuda()
solves the problem.
Here are a list of packages version in my environment:
pytorch
: 1.14.0a0+410ce96
transformers
: 4.30.1
accelerate
: 0.20.3
peft
: 0.4.0.dev0
Hope this might be useful :)
Same issue for me:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
mode='disabled'
run=
report_to='none'
{'report_to': 'none', 'path2config': '/lfs/hyperturing1/0/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweep_configs/debug_config.yaml', 'program': '~/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweeps_common.py', 'project': 'playground', 'entity': 'brando', 'name': 'debug-logging-to-wandb-plataform-test', 'description': 'debug-not-logging-to-wandb-plataform-test', 'metric': {'name': 'train_loss', 'goal': 'minimize'}, 'method': 'random', 'optimizer': 'nadam', 'scheduler': 'cosine', 'lr': 0.0001, 'batch_size': 32, 'num_its': 2, 'run_cap': 1}
Found cached dataset json (/lfs/hyperturing1/0/brando9/.cache/huggingface/datasets/timdettmers___json/timdettmers--openassistant-guanaco-6126c710748182cf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/lfs/hyperturing1/0/brando9/.cache/huggingface/datasets/timdettmers___json/timdettmers--openassistant-guanaco-6126c710748182cf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:12<00:00, 1.61s/it]
Loading cached processed dataset at /lfs/hyperturing1/0/brando9/.cache/huggingface/datasets/timdettmers___json/timdettmers--openassistant-guanaco-6126c710748182cf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-889fee109929377a.arrow
0%| | 0/500 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Traceback (most recent call last):
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/pdb.py", line 1723, in main
pdb._runscript(mainpyfile)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/pdb.py", line 1583, in _runscript
self.run(statement)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/bdb.py", line 598, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/hf_uu/mains_hf/falcon_uu/main_falcon_uu.py", line 34, in <module>
main_falcon()
File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/hf_uu/mains_hf/falcon_uu/main_falcon_uu.py", line 21, in main_falcon
train(args)
File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/hf_uu/train/sft/qlora_ft.py", line 58, in train_falcon
trainer.train()
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
return inner_training_loop(
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 2759, in training_step
loss = self.compute_loss(model, inputs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 2784, in compute_loss
outputs = model(**inputs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
return model_forward(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
return self.base_model(
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 753, in forward
transformer_outputs = self.transformer(
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 648, in forward
outputs = block(
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 385, in forward
attn_outputs = self.self_attention(
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 242, in forward
fused_qkv = self.query_key_value(hidden_states) # [batch_size, seq_length, 3 x hidden_size]
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2048x4544 and 1x10614784)
this error also happens in my a100 (of course)
official peft issue I opened for my issue: https://github.com/huggingface/peft/issues/685
My command is
python qlora.py --model_name_or_path huggyllama/llama-7b
and After downloading the weights, the following error occurred:File "/home/chenxinquan/miniconda3/envs/pytorch2.0/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (334x4096 and 1x8388608)
Do anyone have such a problem and how to solve it?