I am able to quantize the LLaMA 7B model to 4-bit, but how can I run it for prediction? If I try the Transformers library, I get an error:
Python 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("llama_7b_4bit_2.bin")
Traceback (most recent call last):
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
text = reader.read()
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 456, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 944, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, kwargs)
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, kwargs)
File "/home/intel-spc/Documents/tarun/t2/tar/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
raise EnvironmentError(
OSError: It looks like the config file at 'llama_7b_4bit_2.bin' is not a valid JSON file.
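For what it's worth, what I expected to work is something along these lines (a rough sketch, not tested; "llama_7b_hf" is a placeholder for the original Hugging Face checkpoint directory, and the bitsandbytes settings are my assumptions, requiring transformers >= 4.30):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: quantize on the fly with bitsandbytes from the original
# (unquantized) checkpoint directory, not from the raw .bin file.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "llama_7b_hf",                    # placeholder: directory containing config.json + weights
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("llama_7b_hf")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Is something like this the right way to run predictions on a 4-bit model, or do I need a different loader for the .bin file I produced?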