Shafi2016 / ChatOpenLLM

ChatOpenLLM is an open-source Python package that provides ChatOpenAI()-like functionality for various open-source models. Built on the LangChain library, ChatOpenLLM makes it easy to implement chat models based on different transformer architectures.
MIT License
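For context, a minimal usage sketch. The Chat_Llama constructor arguments are taken from the issue below; the message-based call is an assumption based on the package's stated ChatOpenAI()-like, LangChain-based interface, and is not confirmed in this thread:

```python
from ChatOpenLLM import Chat_Llama
from langchain.schema import HumanMessage  # assumed LangChain message type

# Constructor arguments mirror the ones used in the issue below.
llm = Chat_Llama("TheBloke/wizardLM-7B-HF",
                 device_map='auto', max_new_tokens=500,
                 gen_kwargs=dict(temperature=0))

# Assumed ChatOpenAI()-like invocation with a list of chat messages.
reply = llm([HumanMessage(content="What does load_in_4bit do?")])
print(reply.content)
```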

Getting bitsandbytes and accelerate error #1

Open · mail2harishemail opened this issue 10 months ago

mail2harishemail commented 10 months ago

Please check this notebook on Kaggle; I am getting the error below:

https://www.kaggle.com/code/gptforall/chatopenllm-test

from ChatOpenLLM import Chat_Llama, ChatGPTQ, Chat_AutoModels, create_tagging_chain2

llm = Chat_Llama("TheBloke/wizardLM-7B-HF",
                 device_map='auto', llama_schema=None, max_new_tokens=500,
                 low_cpu_mem_usage=True,
                 load_in_4bit=True,
                 gen_kwargs=dict(temperature=0))

ImportError                               Traceback (most recent call last)
Cell In[10], line 2
      1 from ChatOpenLLM import Chat_Llama, ChatGPTQ, Chat_AutoModels, create_tagging_chain2
----> 2 llm = Chat_Llama("TheBloke/wizardLM-7B-HF",
      3                  device_map='auto', llama_schema=None, max_new_tokens=500,
      4                  low_cpu_mem_usage=True,
      5                  load_in_4bit=True,
      6                  gen_kwargs=dict(temperature=0))

File /opt/conda/lib/python3.10/site-packages/ChatOpenLLM/model/src/Llama.py:55, in Chat_Llama.__init__(self, model_path, device_map, low_cpu_mem_usage, gen_kwargs, max_new_tokens, llama_schema, load_in_4bit, load_in_8bit, torch_dtype)
     53 super().__init__()
     54 self.tokenizer = LlamaTokenizer.from_pretrained(model_path, use_fast=True)
---> 55 self.model = LlamaForCausalLM.from_pretrained(
     56     model_path,
     57     load_in_4bit=load_in_4bit,
     58     load_in_8bit=load_in_8bit,
     59     device_map=device_map,
     60     torch_dtype=torch_dtype,
     61     low_cpu_mem_usage=low_cpu_mem_usage,
     62 )
     63 self.device = self.model.device
     64 self.gen_kwargs = gen_kwargs

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:2482, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   2480 if load_in_8bit or load_in_4bit:
   2481     if not (is_accelerate_available() and is_bitsandbytes_available()):
-> 2482         raise ImportError(
   2483             "Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of"
   2484             " bitsandbytes: `pip install -i https://test.pypi.org/simple/ bitsandbytes` or"
   2485             " `pip install bitsandbytes`"
   2486         )
   2488 if torch_dtype is None:
   2489     # We force the `dtype` to be float16, this is a requirement from `bitsandbytes`
   2490     logger.info(
   2491         f"Overriding torch_dtype={torch_dtype} with `torch_dtype=torch.float16` due to "
   2492         "requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. "
   2493         "Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass"
   2494         " `torch_dtype=torch.float16` to remove this warning."
   2495     )

ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`

Shafi2016 commented 10 months ago

You need to turn on the GPU; it should then work. Here is a Colab notebook.
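For reference, a minimal sketch of the fix. The install commands come straight from the ImportError message above; the explicit CUDA check is an assumption about the Kaggle/Colab setup, since bitsandbytes 4-bit/8-bit loading requires a GPU:

```python
# In the notebook, first install the two packages the ImportError asks for:
#   !pip install accelerate bitsandbytes

import torch
from ChatOpenLLM import Chat_Llama

# bitsandbytes quantized loading needs a CUDA device, so enable a GPU
# accelerator in the Kaggle/Colab notebook settings before running this.
assert torch.cuda.is_available(), "Enable a GPU runtime before loading in 4-bit."

# Same call as in the issue; it should now get past the ImportError.
llm = Chat_Llama("TheBloke/wizardLM-7B-HF",
                 device_map='auto', llama_schema=None, max_new_tokens=500,
                 low_cpu_mem_usage=True,
                 load_in_4bit=True,
                 gen_kwargs=dict(temperature=0))
```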