artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

Merge issue #277

Open qburst-fidha opened 8 months ago

qburst-fidha commented 8 months ago

I am trying to merge my adapter with the base model after finetuning with QLoRA.

Error

```
==================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/libbitsandbytes_cuda117.so
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/ubuntu/miniconda3/envs/training_llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|██████████| 29/29 [18:13<00:00, 37.72s/it]
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/llma2/training/qlora/merge_v1.py", line 18, in <module>
    model = model.merge_and_unload()
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/peft/tuners/lora/model.py", line 658, in merge_and_unload
    return self._unload_and_optionally_merge(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/peft/tuners/lora/model.py", line 390, in _unload_and_optionally_merge
    target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
TypeError: Linear4bit.merge() got an unexpected keyword argument 'adapter_names'
```

This is my code:

model_id="./models/WizardLM_WizardLM-70B-V1.0" adapter_id="./models/checkpoint-300/adapter_model/"

tokenizer = LlamaTokenizer.from_pretrained(model_id) model = LlamaForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map='auto', torch_dtype=torch.float16)

model = PeftModel.from_pretrained(model, adapter_id) model = model.merge_and_unload() torch.save(model.state_dict(), "./final_model/model.bin")
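For comparison, here is a minimal sketch of the merge I am assuming should work: the base model is reloaded in fp16 instead of 4-bit, so the adapter is merged into plain Linear layers rather than the quantized Linear4bit ones, and the result is saved with save_pretrained. Paths are the same placeholders as above.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

model_id = "./models/WizardLM_WizardLM-70B-V1.0"
adapter_id = "./models/checkpoint-300/adapter_model/"

# Assumption: loading the base model in fp16 (no load_in_4bit) lets
# merge_and_unload() fold the LoRA weights into ordinary Linear layers
# and avoids the Linear4bit.merge() code path entirely.
base_model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()

# Save a standard transformers checkpoint (config + weights) instead of a bare state_dict.
model.save_pretrained("./final_model")
LlamaTokenizer.from_pretrained(model_id).save_pretrained("./final_model")
```

Whether this actually side-steps the TypeError above is an assumption on my part; I have not been able to verify it on the 70B checkpoint yet.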