InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Deploying both internlm/internlm-xcomposer-vl-7b and internlm/internlm-xcomposer2-vl-7b with Docker fails #1577

Closed. ye7love7 closed this issue 3 months ago.

ye7love7 commented 3 months ago

Checklist

Describe the bug

With version 0.4.1, running

sudo docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v /home/tskj/MOD:/home/MOD/ \
  -p 23333:23333 \
  --ipc=host \
  openmmlab/lmdeploy:v0.4.1 \
  lmdeploy serve api_server /home/MOD/internlm-xcomposer-vl-7b

for internlm/internlm-xcomposer-vl-7b reports the following error:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/opt/py38/bin/lmdeploy", line 11, in <module>
    load_entry_point('lmdeploy', 'console_scripts', 'lmdeploy')()
  File "/opt/lmdeploy/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/opt/lmdeploy/lmdeploy/cli/serve.py", line 283, in api_server
    run_api_server(args.model_path,
  File "/opt/lmdeploy/lmdeploy/serve/openai/api_server.py", line 1191, in serve
    VariableInterface.async_engine = pipeline_class(
  File "/opt/lmdeploy/lmdeploy/serve/async_engine.py", line 206, in __init__
    self._build_turbomind(model_path=model_path,
  File "/opt/lmdeploy/lmdeploy/serve/async_engine.py", line 254, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 396, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 170, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/opt/lmdeploy/lmdeploy/turbomind/turbomind.py", line 279, in _from_hf
    output_model = OUTPUT_MODELS.get(output_format)(
  File "/opt/lmdeploy/lmdeploy/turbomind/deploy/target_model/fp.py", line 26, in __init__
    super().__init__(input_model, cfg, to_file, out_dir)
  File "/opt/lmdeploy/lmdeploy/turbomind/deploy/target_model/base.py", line 155, in __init__
    self.cfg = self.get_config(cfg)
  File "/opt/lmdeploy/lmdeploy/turbomind/deploy/target_model/fp.py", line 30, in get_config
    final_cfg = super().get_config(cfg).__dict__
  File "/opt/lmdeploy/lmdeploy/turbomind/deploy/target_model/base.py", line 180, in get_config
    final_cfg.update(dict(head_num=head_num, vocab_size=_vocab_size))
UnboundLocalError: local variable 'head_num' referenced before assignment

Running

sudo docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v /home/tskj/MOD:/home/MOD/ \
  -p 23333:23333 \
  --ipc=host \
  openmmlab/lmdeploy:v0.4.1 \
  lmdeploy serve api_server /home/MOD/internlm-xcomposer2-vl-7b

reports the following error:

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like openai/clip-vit-large-patch14-336 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Reproduction

The model files on Hugging Face may need to be modified.

Environment

Ubuntu 22.04, using the official 0.4.1 Docker image.

Error traceback

No response

irexyc commented 3 months ago

internlm/internlm-xcomposer-vl-7b was only supported as a demo a long time ago and was never officially supported, so it cannot be deployed with serve or used through the pipeline interface.

Now that internlm/internlm-xcomposer2-vl-7b is supported through the server/pipeline interface, we do not plan to keep supporting the older internlm/internlm-xcomposer-vl-7b model. A usage sketch of that interface follows below.
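For reference, a minimal sketch of the pipeline interface with internlm-xcomposer2-vl-7b, assuming the local model directory from this issue; the image URL and prompt are placeholders, not part of the original report:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Local model directory mounted into the container in this issue (adjust as needed).
pipe = pipeline('/home/MOD/internlm-xcomposer2-vl-7b')

# Placeholder image; any reachable URL or local file path works with load_image.
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')

response = pipe(('describe this image', image))
print(response)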


Running

sudo docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v /home/tskj/MOD:/home/MOD/ \
  -p 23333:23333 \
  --ipc=host \
  openmmlab/lmdeploy:v0.4.1 \
  lmdeploy serve api_server /home/MOD/internlm-xcomposer2-vl-7b

reports the following error:

OSError: We couldn't connect to 'https://huggingface.co/' to load this file, couldn't find it in the cached files and it looks like openai/clip-vit-large-patch14-336 is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

This error is a network problem. Loading internlm/internlm-xcomposer2-vl-7b requires internet access. If you load it with the transformers code provided by xcomposer2, you also have to download the openai/clip-vit-large-patch14-336 model itself. LMDeploy rewrote that part, so the model weights do not need to be downloaded, but the config information still does. If you want to run fully offline, you need to modify LMDeploy's rewritten code here: replace the line config = CLIPVisionConfig.from_pretrained(vision_tower_name), which loads over the network, with a local load, using the config.json from openai/clip-vit-large-patch14-336 as its content.
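For illustration, a minimal sketch of that change, assuming the config.json of openai/clip-vit-large-patch14-336 has been copied to a hypothetical local directory:

from transformers import CLIPVisionConfig

# Hypothetical local directory holding the config.json copied from
# openai/clip-vit-large-patch14-336 on the Hugging Face Hub.
local_clip_dir = '/home/MOD/clip-vit-large-patch14-336'

# Network-loading line in LMDeploy's rewritten vision code:
#   config = CLIPVisionConfig.from_pretrained(vision_tower_name)
# Offline replacement: point from_pretrained at the local copy instead.
config = CLIPVisionConfig.from_pretrained(local_clip_dir)

CLIPVisionConfig.from_pretrained accepts a local directory (or a direct path to the JSON file) and, when given a full CLIP config, reads its vision_config section, so copying that single config.json should be enough.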