adapter-hub / adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning
https://docs.adapterhub.ml
Apache License 2.0

"`.to` is not supported for `4-bit` or `8-bit` bitsandbytes models" when i use load_best_model_at_end=True in QLoRa #694

Closed · mkgs210 closed this issue 6 months ago

mkgs210 commented 6 months ago

I ran the QLoRA example notebook with `load_best_model_at_end=True` added to the `TrainingArguments` and got `ValueError: .to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype.`

Environment info

Information

Model I am using (Bert, XLNet ...): Llama 2

Language I am using the model on (English, Chinese ...): multilingual

Adapter setup I am using (if any): LoRA

The problem arises when using: the official example notebook (`QLoRA_Llama_Finetuning.ipynb`).

To reproduce

Steps to reproduce the behavior:

  1. open `QLoRA_Llama_Finetuning.ipynb`
  2. replace `bf16` with `fp16` in `TrainingArguments`, since the Google Colab GPU does not support `bf16`
  3. add `load_best_model_at_end=True, metric_for_best_model='eval_loss'` to `TrainingArguments` (see the sketch after this list)
  4. run the notebook
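For reference, a minimal sketch of the `TrainingArguments` change from steps 2 and 3; the `output_dir`, strategy, and step values are placeholders, not the notebook's actual settings:

```python
from transformers import TrainingArguments

# Minimal sketch of the modified arguments (placeholder values).
# bf16 is swapped for fp16, and best-model loading is enabled, which is
# what triggers the ValueError reported below.
training_args = TrainingArguments(
    output_dir="./llama2-qlora",          # placeholder
    fp16=True,                            # instead of bf16=True (Colab GPU)
    evaluation_strategy="steps",          # eval must run for best-model tracking
    eval_steps=100,                       # placeholder
    save_strategy="steps",
    save_steps=100,                       # placeholder
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
```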

Full error:

```
ValueError                                Traceback (most recent call last)
<ipython-input> in ()
     11 )
     12 
---> 13 trainer.train()

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1778             hf_hub_utils.enable_progress_bars()
   1779         else:
-> 1780             return inner_training_loop(
   1781                 args=args,
   1782                 resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2239                 smp.barrier()
   2240 
-> 2241             self._load_best_model()
   2242 
   2243         # add remaining tr_loss

/usr/local/lib/python3.10/dist-packages/adapters/trainer.py in _load_best_model(self)
    223             if os.path.exists(fusion_dir):
    224                 model.load_adapter_fusion(fusion_dir)
--> 225         model.to(self.args.device)
    226 
    227 

/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py in wrapper(*args, **kwargs)
    454         if param.device == torch.device("meta"):
    455             raise RuntimeError("You can't move a model that has some modules offloaded to cpu or disk.")
--> 456         return fn(*args, **kwargs)
    457 
    458     return wrapper

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in to(self, *args, **kwargs)
   2552         # Checks if the model has been loaded in 8-bit
   2553         if getattr(self, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES:
-> 2554             raise ValueError(
   2555                 "`.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the"
   2556                 " model has already been set to the correct devices and casted to the correct `dtype`."

ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
```

Expected behavior

The `AdapterTrainer` loads the best checkpoint at the end of training instead of raising this error.
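For context, the failing call is `model.to(self.args.device)` in `adapters/trainer.py::_load_best_model`, which `transformers` rejects for bitsandbytes-quantized models. Below is a minimal sketch of the kind of guard that would avoid the crash; it is only an illustration based on the `quantization_method` check visible in the traceback, not necessarily how #699 fixes it:

```python
import torch

def move_to_device_if_supported(model, device: torch.device):
    # bitsandbytes 4-bit/8-bit models are already placed on the correct device
    # and raise a ValueError on `.to()` (see the traceback above), so only move
    # models that report no quantization method.
    if getattr(model, "quantization_method", None) is None:
        model.to(device)
    return model
```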

calpt commented 6 months ago

Thanks for reporting this bug; it should be fixed once #699 is merged.

mkgs210 commented 6 months ago

I'm now getting this error: `AttributeError: 'LlamaForCausalLM' object has no attribute 'adapter_to'`

calpt commented 6 months ago

Are you using the latest library version from our main branch? These changes haven't been published to PyPI yet.
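For example, the development version can be installed straight from GitHub (repository URL inferred from the project header above): `pip install -U git+https://github.com/adapter-hub/adapters.git`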