Closed · opened by BEpresent · closed 1 month ago
I'm trying to run "TheBloke/Llama-2-13B-chat-GPTQ" using version 0.3.6 and I get the same error:
```
2023-10-13T09:36:44+0300 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/starlette/routing.py", line 705, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/contextlib.py", line 181, in __aenter__
    return await self.gen.__anext__()
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
    raise e
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/_llm.py", line 1166, in __init__
    if not self.model: raise RuntimeError('Failed to load the model correctly (See traceback above)')
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/_llm.py", line 748, in model
    model = self.load_model(*self._model_decls, **self._model_attrs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/_assign.py", line 71, in inner
    return fn(self, *decls, **attrs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 75, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/openllm/serialisation/transformers/__init__.py", line 182, in load_model
    model = auto_class.from_pretrained(llm._bentomodel.path, *decls, config=config, trust_remote_code=llm.trust_remote_code, device_map=device_map, **hub_attrs, **attrs).eval()
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/miniforge/miniforge3/envs/openllm/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2683, in from_pretrained
    quantization_method_from_config = config.quantization_config.get(
AttributeError: 'GPTQConfig' object has no attribute 'get'
```
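For context, the failing call in `modeling_utils.py` treats `config.quantization_config` as a plain dict and calls `.get(...)` on it, which fails when it is a `GPTQConfig` object instead. A minimal, self-contained sketch of that failure mode (`FakeGPTQConfig` here is a hypothetical stand-in, not the real transformers class):

```python
# Hypothetical stand-in for transformers' GPTQConfig: a plain object,
# not a dict, so it has no .get() method.
class FakeGPTQConfig:
    def __init__(self):
        self.bits = 4
        self.quant_method = "gptq"

# When quantization_config is a dict, the loader's .get() call works:
as_dict = {"bits": 4, "quant_method": "gptq"}
print(as_dict.get("quant_method"))  # -> gptq

# When it is a config object, the same call raises, matching the traceback:
cfg = FakeGPTQConfig()
try:
    cfg.get("quant_method")
except AttributeError as err:
    print(type(err).__name__)  # -> AttributeError
```

This is why the error points at a mismatch between what the serialized model config contains and what the installed transformers version expects, rather than at the model weights themselves.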
I wonder whether this is related to the models not being very recent ones? (In light of the previous comment.)
That could be the case: on the TGI repo they mention it might have to do with an old quantization script from TheBloke (a different error in TGI, but my guess is it could be similar).
Describe the bug
I'm trying to run one of TheBloke's quantized models on an A100 40GB. It is not one of the most recent models.
To reproduce
However, I get the `AttributeError: 'GPTQConfig' object has no attribute 'get'` error shown in the traceback above.
Environment
System information
```
bentoml: 1.1.6
python: 3.9.2
platform: Linux-5.10.0-23-cloud-amd64-x86_64-with-glibc2.31
uid_gid: 1000:1001
conda: 23.5.0
in_conda_env: True

name: base
channels:
```
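Since this plausibly comes down to a version mismatch between openllm/bentoml and transformers, it may help to include the exact installed versions of the packages involved in GPTQ loading. A small stdlib-only sketch for collecting them (the package list is my guess at what is relevant; adjust as needed):

```python
# Sketch: print versions of packages involved in GPTQ model loading,
# as extra detail for this bug report. Uses only the standard library.
import importlib.metadata as md

for pkg in ("openllm", "bentoml", "transformers", "optimum", "auto-gptq"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```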