artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Multiplication error #100

Open two-tiger opened 1 year ago

two-tiger commented 1 year ago

My command is python qlora.py --model_name_or_path huggyllama/llama-7b. After downloading the weights, the following error occurred:

  File "/home/chenxinquan/miniconda3/envs/pytorch2.0/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (334x4096 and 1x8388608)

Does anyone else have this problem, and how can it be solved?
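
(A hedged reading of the shapes, assuming this is bitsandbytes' 4-bit weight packing: LLaMA-7B's attention projections are 4096 x 4096, and storing two 4-bit weights per byte gives exactly the 8388608 seen above, i.e. F.linear is being handed the packed storage rather than a usable weight matrix. A quick arithmetic check:)

```python
# Hedged sanity check: the "1x8388608" in the error matches a 4096 x 4096
# weight matrix stored packed, two 4-bit values per byte.
hidden_size = 4096                     # LLaMA-7B hidden size
n_weights = hidden_size * hidden_size  # 16,777,216 weights
assert n_weights // 2 == 8_388_608     # == the mat2 dimension in the error
```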

he20010515 commented 1 year ago

Hello, I had the same problem.

andersonbcdefg commented 1 year ago

I encountered this too. Has anyone identified a solution?

Y-Franck commented 1 year ago

For me, the issue was that I called model.cuda() again after loading the model (which was already on CUDA). Simply commenting out the extra model.cuda() call solved the problem.

Here is a list of package versions in my environment:
pytorch: 1.14.0a0+410ce96
transformers: 4.30.1
accelerate: 0.20.3
peft: 0.4.0.dev0

Hope this might be useful :)
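
(A minimal sketch of the failure mode Y-Franck describes, assuming transformers >= 4.30 with bitsandbytes installed; the model id and config values below are illustrative, not taken from the thread. from_pretrained with a device_map already places the quantized weights on the GPU, so a second .cuda() re-runs 4-bit quantization on already-packed weights and corrupts their shape.)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit loading config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",          # illustrative; any 4-bit-loaded model
    quantization_config=bnb_config,
    device_map="auto",              # weights are moved to the GPU here
)

# model = model.cuda()  # redundant second move; commenting this out is the
#                       # fix Y-Franck describes above
```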

brando90 commented 1 year ago

Same issue for me:

If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
mode='disabled'
run=
report_to='none'
{'report_to': 'none', 'path2config': '/lfs/hyperturing1/0/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweep_configs/debug_config.yaml', 'program': '~/ultimate-utils/ultimate-utils-proj-src/uutils/wandb_uu/sweeps_common.py', 'project': 'playground', 'entity': 'brando', 'name': 'debug-logging-to-wandb-plataform-test', 'description': 'debug-not-logging-to-wandb-plataform-test', 'metric': {'name': 'train_loss', 'goal': 'minimize'}, 'method': 'random', 'optimizer': 'nadam', 'scheduler': 'cosine', 'lr': 0.0001, 'batch_size': 32, 'num_its': 2, 'run_cap': 1}
Found cached dataset json (/lfs/hyperturing1/0/brando9/.cache/huggingface/datasets/timdettmers___json/timdettmers--openassistant-guanaco-6126c710748182cf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/lfs/hyperturing1/0/brando9/.cache/huggingface/datasets/timdettmers___json/timdettmers--openassistant-guanaco-6126c710748182cf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:12<00:00,  1.61s/it]
Loading cached processed dataset at /lfs/hyperturing1/0/brando9/.cache/huggingface/datasets/timdettmers___json/timdettmers--openassistant-guanaco-6126c710748182cf/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-889fee109929377a.arrow
  0%|                                                                                                                           | 0/500 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Traceback (most recent call last):
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/pdb.py", line 1723, in main
    pdb._runscript(mainpyfile)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/pdb.py", line 1583, in _runscript
    self.run(statement)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/bdb.py", line 598, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/hf_uu/mains_hf/falcon_uu/main_falcon_uu.py", line 34, in <module>
    main_falcon()
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/hf_uu/mains_hf/falcon_uu/main_falcon_uu.py", line 21, in main_falcon
    train(args)
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/hf_uu/train/sft/qlora_ft.py", line 58, in train_falcon
    trainer.train()
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 2759, in training_step
    loss = self.compute_loss(model, inputs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/transformers/trainer.py", line 2784, in compute_loss
    outputs = model(**inputs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/utils/operations.py", line 553, in forward
    return model_forward(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/utils/operations.py", line 541, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 753, in forward
    transformer_outputs = self.transformer(
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 648, in forward
    outputs = block(
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 385, in forward
    attn_outputs = self.self_attention(
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py", line 242, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/lfs/hyperturing1/0/brando9/miniconda/envs/data_quality/lib/python3.10/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2048x4544 and 1x10614784)
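
(The Falcon numbers fit the same hedged reading: assuming the standard falcon-7b config, the fused query_key_value layer is a Linear mapping 4544 to 4672 features, and packing two 4-bit weights per byte gives exactly the 10614784 above.)

```python
# Hedged check: Falcon-7B uses multi-query attention, so the fused QKV
# projection maps hidden size 4544 to 71*64 (queries) + 2*64 (shared k/v)
# = 4672 features; packed 4-bit storage halves the byte count.
assert 71 * 64 == 4544                  # 71 heads of dim 64
assert 4544 * 4672 // 2 == 10_614_784   # == the mat2 dimension above
```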

brando90 commented 1 year ago

This error also happens on my A100 (of course).

brando90 commented 1 year ago

The official PEFT issue I opened for my problem: https://github.com/huggingface/peft/issues/685