InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] No way to specify a model revision? #1804

Open fake-name opened 1 week ago

fake-name commented 1 week ago


Describe the bug

From what I can tell, this uses the various Transformers `*.from_pretrained()` calls to download models from Hugging Face.

I'm using some models where the actual model files are only present in non-main branches of the repository. For example, see https://huggingface.co/bartowski/Yi-34B-200K-RPMerge-exl2

This fails if you simply refer to the model by its repo name.

I cannot see any way to specify a specific branch.
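For context, the `transformers` loaders already accept a `revision` argument that selects a branch, tag, or commit, so the capability exists upstream; the branch name below is just a placeholder:

```python
from transformers import AutoConfig

# transformers can resolve a non-default branch via `revision`;
# "some-branch" is a placeholder for the branch that holds the files.
cfg = AutoConfig.from_pretrained(
    "bartowski/Yi-34B-200K-RPMerge-exl2",
    revision="some-branch",
)
```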

Reproduction

(base) durr@learner:/media/Scripts/book_cat$ docker run --runtime nvidia --gpus all \
>     -v ~/.cache/huggingface:/root/.cache/huggingface \
>     -p 23333:23333 \
>     --ipc=host \
>     openmmlab/lmdeploy:latest \
>     lmdeploy serve api_server bartowski/Yi-34B-200K-RPMerge-exl2

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.12 (build 50109463)
Triton Server Version 2.29.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00,  3.74it/s]
Traceback (most recent call last):
  File "/opt/lmdeploy/lmdeploy/archs.py", line 172, in get_model_arch
    cfg = AutoConfig.from_pretrained(model_path,
  File "/opt/py38/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 934, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/py38/lib/python3.8/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/py38/lib/python3.8/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/py38/lib/python3.8/site-packages/transformers/utils/hub.py", line 370, in cached_file
    raise EnvironmentError(
OSError: /root/.cache/huggingface/hub/models--bartowski--Yi-34B-200K-RPMerge-exl2/snapshots/6a044cb3ec9b116e41d049817f1c38e8e74a09f1 does not appear to have a file named config.json. Checkout 'https://huggingface.co//root/.cache/huggingface/hub/models--bartowski--Yi-34B-200K-RPMerge-exl2/snapshots/6a044cb3ec9b116e41d049817f1c38e8e74a09f1/tree/None' for available files.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/py38/bin/lmdeploy", line 33, in <module>
    sys.exit(load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')())
  File "/opt/lmdeploy/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/opt/lmdeploy/lmdeploy/cli/serve.py", line 268, in api_server
    backend = autoget_backend(args.model_path)
  File "/opt/lmdeploy/lmdeploy/archs.py", line 44, in autoget_backend
    turbomind_has = is_supported_turbomind(model_path)
  File "/opt/lmdeploy/lmdeploy/turbomind/supported_models.py", line 70, in is_supported
    arch, cfg = get_model_arch(model_path)
  File "/opt/lmdeploy/lmdeploy/archs.py", line 176, in get_model_arch
    cfg = PretrainedConfig.from_pretrained(model_path)
  File "/opt/py38/lib/python3.8/site-packages/transformers/configuration_utils.py", line 603, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/py38/lib/python3.8/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/py38/lib/python3.8/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/py38/lib/python3.8/site-packages/transformers/utils/hub.py", line 370, in cached_file
    raise EnvironmentError(
OSError: /root/.cache/huggingface/hub/models--bartowski--Yi-34B-200K-RPMerge-exl2/snapshots/6a044cb3ec9b116e41d049817f1c38e8e74a09f1 does not appear to have a file named config.json. Checkout 'https://huggingface.co//root/.cache/huggingface/hub/models--bartowski--Yi-34B-200K-RPMerge-exl2/snapshots/6a044cb3ec9b116e41d049817f1c38e8e74a09f1/tree/main' for available files.

Environment

Not relevant.
zhyncs commented 1 week ago

@fake-name Is this similar? https://github.com/huggingface/transformers/blob/1c1aec2ef1d6822fae3ffbb973b4c941f65f4ddf/docs/source/en/model_sharing.md?plain=1#L42-L48

It's not complicated to implement: just pass in a parameter, and by default it resolves to the default branch at the latest commit.
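A minimal sketch of that, assuming the model fetch goes through `huggingface_hub.snapshot_download` (the `download_model` wrapper here is hypothetical):

```python
from typing import Optional

from huggingface_hub import snapshot_download

def download_model(repo_id: str, revision: Optional[str] = None) -> str:
    # revision=None keeps today's behaviour: default branch, latest commit.
    # A branch, tag, or commit hash pins the snapshot to that revision.
    return snapshot_download(repo_id, revision=revision)
```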

irexyc commented 1 week ago

You may want this: https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/messages.py#L151. It can't be set via the CLI currently; the CLI should be updated.
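If the Python API works for you, something along these lines should already be possible (assuming the field linked above is `revision` on `TurbomindEngineConfig`; the model and branch names are placeholders):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# Pin the Hugging Face download to a specific branch/tag/commit.
backend_config = TurbomindEngineConfig(revision="some-branch")
pipe = pipeline("some-org/some-model", backend_config=backend_config)
print(pipe(["Hello"]))
```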

irexyc commented 1 week ago

With https://github.com/InternLM/lmdeploy/pull/1814 you can specify a model revision. But we don't support bartowski/Yi-34B-200K-RPMerge-exl2: it uses ExLlamaV2 to quantize the model, and we only support the AWQ quantization method.

For AWQ quantization, you can refer to this
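Roughly this workflow (a sketch with placeholder paths: quantize offline with the `lmdeploy lite auto_awq` CLI, then serve the result):

```python
# 1) Quantize offline:
#    lmdeploy lite auto_awq <hf-model> --work-dir ./model-4bit
# 2) Load the quantized weights, declaring the awq format:
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline("./model-4bit",
                backend_config=TurbomindEngineConfig(model_format="awq"))
print(pipe(["Hello"]))
```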