Open ahe168 opened 3 months ago
This one can be found easily on Google, but here is my proposed solution; I hope it works for you.
The error message you're encountering indicates that `bitsandbytes` 8-bit quantization requires the Accelerate library and the latest version of `bitsandbytes`. Additionally, the `load_in_4bit` and `load_in_8bit` arguments are deprecated; you should pass a `BitsAndBytesConfig` object via the `quantization_config` argument instead.
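First, make sure both libraries are actually installed in the environment your notebook runs in. The commands below are the exact ones the error message itself suggests: `pip install accelerate` and `pip install -i https://pypi.org/simple/ bitsandbytes`.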
Then try these steps and let me know if it works. Use a `BitsAndBytesConfig`; sample snippet:
```python
from transformers import LlamaForCausalLM, LlamaTokenizerFast, BitsAndBytesConfig
from peft import PeftModel

base_model = 'your_model_path_or_name'
peft_model = 'your_peft_model_path_or_name'

tokenizer = LlamaTokenizerFast.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Build a BitsAndBytesConfig instead of passing the deprecated load_in_8bit flag
quant_config = BitsAndBytesConfig(load_in_8bit=True)

# trust_remote_code only applies to Auto classes, so it is dropped here
model = LlamaForCausalLM.from_pretrained(
    base_model,
    device_map="cuda:0",
    quantization_config=quant_config,
)
model = PeftModel.from_pretrained(model, peft_model)
model = model.eval()
```
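If you want 4-bit loading instead (the deprecation warning covers `load_in_4bit` too), the same pattern applies. A minimal sketch — note the `nf4` quant type and `float16` compute dtype are common choices I'm assuming, not settings from your original code:

```python
import torch
from transformers import BitsAndBytesConfig

quant_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 weight quantization
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for computation at runtime
)
# Then pass it the same way: quantization_config=quant_config_4bit
```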
And lastly, check that CUDA is available in your environment:

```python
import torch
print(torch.cuda.is_available())
```
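Since the ImportError names Accelerate and bitsandbytes specifically, it can also help to confirm that both import cleanly in the same environment (just a quick sanity check on my part, not something from the original traceback):

```python
import accelerate
import bitsandbytes

print(accelerate.__version__, bitsandbytes.__version__)
```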
Hope this helps. Thanks!
```
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[21], line 6
      4 tokenizer = LlamaTokenizerFast.from_pretrained(base_model, trust_remote_code=True)
      5 tokenizer.pad_token = tokenizer.eos_token
----> 6 model = LlamaForCausalLM.from_pretrained(base_model, trust_remote_code=True, device_map="cuda:0", load_in_8bit=True)
      7 model = PeftModel.from_pretrained(model, peft_model)
      8 model = model.eval()

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3049, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3046     hf_quantizer = None
   3048 if hf_quantizer is not None:
-> 3049     hf_quantizer.validate_environment(
   3050         torch_dtype=torch_dtype, from_tf=from_tf, from_flax=from_flax, device_map=device_map
   3051     )
   3052     torch_dtype = hf_quantizer.update_torch_dtype(torch_dtype)
   3053     device_map = hf_quantizer.update_device_map(device_map)

File /opt/conda/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_8bit.py:62, in Bnb8BitHfQuantizer.validate_environment(self, *args, **kwargs)
     60 def validate_environment(self, *args, **kwargs):
     61     if not (is_accelerate_available() and is_bitsandbytes_available()):
---> 62         raise ImportError(
     63             "Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` "
     64             "and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`"
     65         )
     67     if kwargs.get("from_tf", False) or kwargs.get("from_flax", False):
     68         raise ValueError(
     69             "Converting into 4-bit or 8-bit weights from tf/flax weights is currently not supported, please make"
     70             " sure the weights are in PyTorch format."
     71         )

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
```