bentoml / OpenLLM

Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

bug: load llama model from local path but got error 'Cannot find commit hash in LlamaConfig' #598

Closed · qaz-t closed this issue 1 year ago

qaz-t commented 1 year ago

Describe the bug

I'm using conda to create an environment with Python 3.10.12, and I installed the related packages using

pip install "openllm[llama, vllm]"

When I start a Llama service using

openllm start llama --model-id /home/user/models/Llama-2-70B-chat-GPTQ --quantize gptq --workers-per-resource 0.125

it works fine in openllm==0.3.9. However, versions 0.4.0 and 0.4.1 raise an error. I've tried the -HF, -GPTQ, and -AWQ models from TheBloke on Hugging Face and got the same result.

To reproduce

Install the requirements:

pip install "openllm[llama, vllm]"

Start the server:

openllm start llama --model-id /home/user/models/Llama-2-70B-Chat-AWQ --backend vllm --workers-per-resource 0.125 --quantise awq

Logs

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 129, in get
    model = bentoml.models.get(llm.tag)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/models.py", line 45, in get
    return _model_store.get(tag)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/store.py", line 158, in get
    raise NotFound(
bentoml.exceptions.NotFound: Model 'vllm-llama-2-70b-chat-awq:14f4806647c05a0905cd70c55651c9ba7bde8a56' is not found in BentoML store <osfs '/home/user/bentoml/models'>, you may need to run `bentoml models pull` first

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/entrypoint.py", line 381, in import_command
    _ref = openllm.serialisation.get(llm)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/__init__.py", line 78, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 136, in get
    raise openllm.exceptions.OpenLLMException(f'Failed while getting stored artefact (lookup for traceback):\n{err}') from err
openllm_core.exceptions.OpenLLMException: Failed while getting stored artefact (lookup for traceback):
Model 'vllm-llama-2-70b-chat-awq:14f4806647c05a0905cd70c55651c9ba7bde8a56' is not found in BentoML store <osfs '/home/user/bentoml/models'>, you may need to run `bentoml models pull` first

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 129, in get
    model = bentoml.models.get(llm.tag)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/models.py", line 45, in get
    return _model_store.get(tag)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/store.py", line 158, in get
    raise NotFound(
bentoml.exceptions.NotFound: Model 'vllm-llama-2-70b-chat-awq:14f4806647c05a0905cd70c55651c9ba7bde8a56' is not found in BentoML store <osfs '/home/user/bentoml/models'>, you may need to run `bentoml models pull` first

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/openllm/bin/openllm", line 8, in <module>
    sys.exit(cli())
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/entrypoint.py", line 186, in wrapper
    return_value = func(*args, **attrs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/entrypoint.py", line 168, in wrapper
    return f(*args, **attrs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/_factory.py", line 194, in start_cmd
    llm.save_pretrained()  # ensure_available = True
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/_llm.py", line 233, in save_pretrained
    def save_pretrained(self)->bentoml.Model:return openllm.import_model(self.config['start_name'], model_id=self.model_id, model_version=self._revision, backend=self.__llm_backend__, quantize=self._quantise)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/_sdk.py", line 262, in _import_model
    return import_command.main(args=args, standalone_mode=False)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/entrypoint.py", line 186, in wrapper
    return_value = func(*args, **attrs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/entrypoint.py", line 168, in wrapper
    return f(*args, **attrs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/cli/entrypoint.py", line 387, in import_command
    _ref = openllm.serialisation.get(llm, auto_import=True)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/__init__.py", line 78, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 135, in get
    if auto_import: return import_model(llm, trust_remote_code=llm.trust_remote_code)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 64, in import_model
    metadata['_revision'] = get_hash(config)
  File "/home/user/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/transformers/_helpers.py", line 22, in get_hash
    if _commit_hash is None: raise ValueError(f'Cannot find commit hash in {config}')
ValueError: Cannot find commit hash in LlamaConfig {
  "_name_or_path": "/home/user/models/Llama-2-70B-Chat-AWQ",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "intermediate_size": 28672,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 64,
  "num_hidden_layers": 80,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "quantization_config": {
    "bits": 4,
    "group_size": 128,
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.35.0",
  "use_cache": true,
  "vocab_size": 32000
}
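
For context, the failure comes from the check in `openllm/serialisation/transformers/_helpers.py` shown at the bottom of the traceback: it raises whenever the loaded config carries no Hub commit hash. Below is a minimal sketch of that condition (not OpenLLM's exact code), assuming transformers' behavior of populating `_commit_hash` only for configs resolved from the Hugging Face Hub, so that configs loaded from a local directory leave it as `None`:

```python
# Minimal sketch of the failing condition, not OpenLLM's exact code.
# transformers records `_commit_hash` only for configs resolved from the
# Hugging Face Hub; loading from a local directory leaves it as None.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("/home/user/models/Llama-2-70B-Chat-AWQ")
commit_hash = getattr(config, "_commit_hash", None)
if commit_hash is None:
    # Mirrors the check in openllm/serialisation/transformers/_helpers.py
    raise ValueError(f"Cannot find commit hash in {config}")
```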

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.9
python: 3.10.12
platform: Linux-5.4.0-166-generic-x86_64-with-glibc2.31
uid_gid: 53201113:53200513
conda: 23.9.0
in_conda_env: True

conda_packages
```yaml name: openllm channels: - defaults dependencies: - _libgcc_mutex=0.1=main - _openmp_mutex=5.1=1_gnu - bzip2=1.0.8=h7b6447c_0 - ca-certificates=2023.08.22=h06a4308_0 - ld_impl_linux-64=2.38=h1181459_1 - libffi=3.4.4=h6a678d5_0 - libgcc-ng=11.2.0=h1234567_1 - libgomp=11.2.0=h1234567_1 - libstdcxx-ng=11.2.0=h1234567_1 - libuuid=1.41.5=h5eee18b_0 - ncurses=6.4=h6a678d5_0 - openssl=3.0.12=h7f8727e_0 - pip=23.3=py310h06a4308_0 - python=3.10.12=h955ad1f_0 - readline=8.2=h5eee18b_0 - setuptools=68.0.0=py310h06a4308_0 - sqlite=3.41.2=h5eee18b_0 - tk=8.6.12=h1ccaba5_0 - wheel=0.41.2=py310h06a4308_0 - xz=5.4.2=h5eee18b_0 - zlib=1.2.13=h5eee18b_0 - pip: - accelerate==0.24.1 - aiohttp==3.8.6 - aiosignal==1.3.1 - anyio==3.7.1 - appdirs==1.4.4 - asgiref==3.7.2 - async-timeout==4.0.3 - attrs==23.1.0 - bentoml==1.1.9 - bitsandbytes==0.41.2.post1 - build==1.0.3 - cattrs==23.1.2 - certifi==2023.7.22 - charset-normalizer==3.3.2 - circus==0.18.0 - click==8.1.7 - click-option-group==0.5.6 - cloudpickle==3.0.0 - cmake==3.27.7 - coloredlogs==15.0.1 - contextlib2==21.6.0 - cuda-python==12.3.0 - datasets==2.14.6 - deepmerge==1.1.0 - deprecated==1.2.14 - dill==0.3.7 - exceptiongroup==1.1.3 - fairscale==0.4.13 - fastapi==0.104.1 - fastcore==1.5.29 - filelock==3.13.1 - filetype==1.2.0 - frozenlist==1.4.0 - fs==2.4.16 - fsspec==2023.10.0 - ghapi==1.0.4 - h11==0.14.0 - httpcore==1.0.1 - httptools==0.6.1 - httpx==0.25.1 - huggingface-hub==0.17.3 - humanfriendly==10.0 - idna==3.4 - importlib-metadata==6.8.0 - inflection==0.5.1 - jinja2==3.1.2 - jsonschema==4.19.2 - jsonschema-specifications==2023.7.1 - lit==17.0.4 - markdown-it-py==3.0.0 - markupsafe==2.1.3 - mdurl==0.1.2 - mpmath==1.3.0 - msgpack==1.0.7 - multidict==6.0.4 - multiprocess==0.70.15 - mypy-extensions==1.0.0 - networkx==3.2.1 - ninja==1.11.1.1 - numpy==1.26.1 - nvidia-cublas-cu11==11.10.3.66 - nvidia-cuda-cupti-cu11==11.7.101 - nvidia-cuda-nvrtc-cu11==11.7.99 - nvidia-cuda-runtime-cu11==11.7.99 - nvidia-cudnn-cu11==8.5.0.96 - nvidia-cufft-cu11==10.9.0.58 - nvidia-curand-cu11==10.2.10.91 - nvidia-cusolver-cu11==11.4.0.1 - nvidia-cusparse-cu11==11.7.4.91 - nvidia-ml-py==11.525.150 - nvidia-nccl-cu11==2.14.3 - nvidia-nvtx-cu11==11.7.91 - openllm==0.4.1 - openllm-client==0.4.1 - openllm-core==0.4.1 - opentelemetry-api==1.20.0 - opentelemetry-instrumentation==0.41b0 - opentelemetry-instrumentation-aiohttp-client==0.41b0 - opentelemetry-instrumentation-asgi==0.41b0 - opentelemetry-sdk==1.20.0 - opentelemetry-semantic-conventions==0.41b0 - opentelemetry-util-http==0.41b0 - optimum==1.14.0 - orjson==3.9.10 - packaging==23.2 - pandas==2.1.2 - pathspec==0.11.2 - pillow==10.1.0 - pip-requirements-parser==32.0.1 - pip-tools==7.3.0 - prometheus-client==0.18.0 - protobuf==4.25.0 - psutil==5.9.6 - pyarrow==14.0.1 - pydantic==1.10.13 - pygments==2.16.1 - pyparsing==3.1.1 - pyproject-hooks==1.0.0 - python-dateutil==2.8.2 - python-dotenv==1.0.0 - python-json-logger==2.0.7 - python-multipart==0.0.6 - pytz==2023.3.post1 - pyyaml==6.0.1 - pyzmq==25.1.1 - ray==2.8.0 - referencing==0.30.2 - regex==2023.10.3 - requests==2.31.0 - rich==13.6.0 - rpds-py==0.12.0 - safetensors==0.4.0 - schema==0.7.5 - scipy==1.11.3 - sentencepiece==0.1.99 - simple-di==0.1.5 - six==1.16.0 - sniffio==1.3.0 - starlette==0.27.0 - sympy==1.12 - tabulate==0.9.0 - tokenizers==0.14.1 - tomli==2.0.1 - torch==2.0.1 - tornado==6.3.3 - tqdm==4.66.1 - transformers==4.35.0 - triton==2.0.0 - typing-extensions==4.8.0 - tzdata==2023.3 - urllib3==2.0.7 - uvicorn==0.24.0.post1 - uvloop==0.19.0 - vllm==0.2.1.post1 
- watchfiles==0.21.0 - wcwidth==0.2.9 - websockets==12.0 - wrapt==1.15.0 - xformers==0.0.22 - xxhash==3.4.1 - yarl==1.9.2 - zipp==3.17.0 prefix: /home/user/miniconda3/envs/openllm ```
pip_packages
``` accelerate==0.24.1 aiohttp==3.8.6 aiosignal==1.3.1 anyio==3.7.1 appdirs==1.4.4 asgiref==3.7.2 async-timeout==4.0.3 attrs==23.1.0 bentoml==1.1.9 bitsandbytes==0.41.2.post1 build==1.0.3 cattrs==23.1.2 certifi==2023.7.22 charset-normalizer==3.3.2 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==3.0.0 cmake==3.27.7 coloredlogs==15.0.1 contextlib2==21.6.0 cuda-python==12.3.0 datasets==2.14.6 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.7 exceptiongroup==1.1.3 fairscale==0.4.13 fastapi==0.104.1 fastcore==1.5.29 filelock==3.13.1 filetype==1.2.0 frozenlist==1.4.0 fs==2.4.16 fsspec==2023.10.0 ghapi==1.0.4 h11==0.14.0 httpcore==1.0.1 httptools==0.6.1 httpx==0.25.1 huggingface-hub==0.17.3 humanfriendly==10.0 idna==3.4 importlib-metadata==6.8.0 inflection==0.5.1 Jinja2==3.1.2 jsonschema==4.19.2 jsonschema-specifications==2023.7.1 lit==17.0.4 markdown-it-py==3.0.0 MarkupSafe==2.1.3 mdurl==0.1.2 mpmath==1.3.0 msgpack==1.0.7 multidict==6.0.4 multiprocess==0.70.15 mypy-extensions==1.0.0 networkx==3.2.1 ninja==1.11.1.1 numpy==1.26.1 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-ml-py==11.525.150 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 openllm==0.4.1 openllm-client==0.4.1 openllm-core==0.4.1 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-aiohttp-client==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 optimum==1.14.0 orjson==3.9.10 packaging==23.2 pandas==2.1.2 pathspec==0.11.2 Pillow==10.1.0 pip-requirements-parser==32.0.1 pip-tools==7.3.0 prometheus-client==0.18.0 protobuf==4.25.0 psutil==5.9.6 pyarrow==14.0.1 pydantic==1.10.13 Pygments==2.16.1 pyparsing==3.1.1 pyproject_hooks==1.0.0 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3.post1 PyYAML==6.0.1 pyzmq==25.1.1 ray==2.8.0 referencing==0.30.2 regex==2023.10.3 requests==2.31.0 rich==13.6.0 rpds-py==0.12.0 safetensors==0.4.0 schema==0.7.5 scipy==1.11.3 sentencepiece==0.1.99 simple-di==0.1.5 six==1.16.0 sniffio==1.3.0 starlette==0.27.0 sympy==1.12 tabulate==0.9.0 tokenizers==0.14.1 tomli==2.0.1 torch==2.0.1 tornado==6.3.3 tqdm==4.66.1 transformers==4.35.0 triton==2.0.0 typing_extensions==4.8.0 tzdata==2023.3 urllib3==2.0.7 uvicorn==0.24.0.post1 uvloop==0.19.0 vllm==0.2.1.post1 watchfiles==0.21.0 wcwidth==0.2.9 websockets==12.0 wrapt==1.15.0 xformers==0.0.22 xxhash==3.4.1 yarl==1.9.2 zipp==3.17.0 ```

System information (Optional)

No response

aarnphm commented 1 year ago

You will need to provide a model_version for local models for now.

I will take a look into supporting local models more concretely. We have a separate issue tracking local paths.
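
For example (a sketch of the suggested workaround, assuming the `--model-version` flag in your installed 0.4.x release; the value `local-awq` is just an arbitrary label):

openllm start llama --model-id /home/user/models/Llama-2-70B-Chat-AWQ --backend vllm --quantise awq --workers-per-resource 0.125 --model-version local-awq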

aarnphm commented 1 year ago

~Oh, actually this does seem like a bug. Will fix it.~

~Edit: this is not a bug, sorry~

Edit 2: This is a bug. I will release a fix shortly.
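
A fix would presumably need a fallback when the config has no Hub commit hash. A hypothetical sketch of that shape (not the actual patch; `get_hash` here is modeled on the helper in the traceback, and the path-hash fallback is an assumption):

```python
# Hypothetical fallback for local checkpoints (sketch, not the actual patch).
import hashlib
import os

def get_hash(config, model_id: str) -> str:
    # Hub-resolved configs carry a commit hash; use it when present.
    commit_hash = getattr(config, "_commit_hash", None)
    if commit_hash is not None:
        return commit_hash
    # Local directories have no Hub commit hash, so fall back to a
    # deterministic identifier derived from the resolved path.
    if os.path.isdir(model_id):
        return hashlib.sha1(os.path.abspath(model_id).encode("utf-8")).hexdigest()
    raise ValueError(f"Cannot find commit hash in {config}")
```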