Deploying GLM-4V with vLLM fails with KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'

System Info

Version information

pip list
Package                           Version
--------------------------------- ------------
accelerate                        0.30.0
addict                            2.4.0
aiohttp                           3.9.5
aiosignal                         1.3.1
aliyun-python-sdk-core            2.15.1
aliyun-python-sdk-kms             2.16.2
annotated-types                   0.6.0
anyio                             4.3.0
async-timeout                     4.0.3
attrs                             23.2.0
certifi                           2024.2.2
cffi                              1.16.0
charset-normalizer                3.3.2
click                             8.1.7
cloudpickle                       3.0.0
cmake                             3.29.2
crcmod                            1.7
cryptography                      42.0.6
datasets                          2.18.0
dill                              0.3.8
diskcache                         5.6.3
distro                            1.9.0
dnspython                         2.6.1
einops                            0.8.0
email_validator                   2.1.1
exceptiongroup                    1.2.1
fastapi                           0.111.0
fastapi-cli                       0.0.2
filelock                          3.14.0
flash-attn                        2.5.8
frozenlist                        1.4.1
fsspec                            2024.2.0
gast                              0.5.4
h11                               0.14.0
hf_transfer                       0.1.6
httpcore                          1.0.5
httptools                         0.6.1
httpx                             0.27.0
huggingface-hub                   0.23.0
idna                              3.7
importlib_metadata                7.1.0
interegular                       0.3.3
Jinja2                            3.1.3
jmespath                          0.10.0
joblib                            1.4.2
jsonschema                        4.22.0
jsonschema-specifications         2023.12.1
lark                              1.1.9
llvmlite                          0.42.0
lm-format-enforcer                0.10.1
markdown-it-py                    3.0.0
MarkupSafe                        2.1.5
mdurl                             0.1.2
modelscope                        1.14.0
mpmath                            1.3.0
msgpack                           1.0.8
multidict                         6.0.5
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.3
ninja                             1.11.1.1
numba                             0.59.1
numpy                             1.26.4
nvidia-cublas-cu12                12.1.3.1
nvidia-cuda-cupti-cu12            12.1.105
nvidia-cuda-nvrtc-cu12            12.1.105
nvidia-cuda-runtime-cu12          12.1.105
nvidia-cudnn-cu12                 8.9.2.26
nvidia-cufft-cu12                 11.0.2.54
nvidia-curand-cu12                10.3.2.106
nvidia-cusolver-cu12              11.4.5.107
nvidia-cusparse-cu12              12.1.0.106
nvidia-ml-py                      12.550.52
nvidia-nccl-cu12                  2.20.5
nvidia-nvjitlink-cu12             12.4.127
nvidia-nvtx-cu12                  12.1.105
openai                            1.25.1
orjson                            3.10.3
oss2                              2.18.5
outlines                          0.0.34
packaging                         24.0
pandas                            2.2.2
pillow                            10.3.0
pip                               22.0.2
platformdirs                      4.2.1
prometheus_client                 0.20.0
prometheus-fastapi-instrumentator 7.0.0
protobuf                          5.26.1
psutil                            5.9.8
py-cpuinfo                        9.0.0
pyarrow                           16.0.0
pyarrow-hotfix                    0.6
pycparser                         2.22
pycryptodome                      3.20.0
pydantic                          2.7.1
pydantic_core                     2.18.2
Pygments                          2.18.0
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
python-multipart                  0.0.9
pytz                              2024.1
PyYAML                            6.0.1
ray                               2.20.0
referencing                       0.35.1
regex                             2024.4.28
requests                          2.31.0
rich                              13.7.1
rpds-py                           0.18.0
safetensors                       0.4.3
scipy                             1.13.0
sentencepiece                     0.2.0
setuptools                        59.6.0
shellingham                       1.5.4
simplejson                        3.19.2
six                               1.16.0
sniffio                           1.3.1
sortedcontainers                  2.4.0
starlette                         0.37.2
sympy                             1.12
tiktoken                          0.6.0
tokenizers                        0.19.1
tomli                             2.0.1
torch                             2.3.0
torchvision                       0.18.1
tqdm                              4.66.4
transformers                      4.40.0
triton                            2.3.0
typer                             0.12.3
typing_extensions                 4.11.0
tzdata                            2024.1
ujson                             5.9.0
urllib3                           2.2.1
uvicorn                           0.29.0
uvloop                            0.19.0
vllm                              0.4.3
vllm-flash-attn                   2.5.8.post2
vllm-nccl-cu12                    2.18.1.0.4.0
watchfiles                        0.21.0
websockets                        12.0
wheel                             0.37.1
xformers                          0.0.26.post1
xxhash                            3.4.1
yapf                              0.40.2
yarl                              1.9.4
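
For triage, the few packages that matter most here (vllm, torch, transformers, xformers) can be reported in a few lines instead of the full listing. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `report_versions` and the choice of packages are illustrative assumptions, not part of the original report.

```python
from importlib import metadata


def report_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    selected = ["vllm", "torch", "transformers", "xformers"]
    for name, version in report_versions(selected).items():
        print(f"{name:<14}{version or 'not installed'}")
```

Run it with the same interpreter that launches the server, so that the reported versions match the environment that produced the error.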

CUDA / hardware information

nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:07:00.0 Off |                    0 |
| N/A   30C    P0              62W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          Off | 00000000:0A:00.0 Off |                    0 |
| N/A   28C    P0              66W / 400W |  75929MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          Off | 00000000:47:00.0 Off |                    0 |
| N/A   28C    P0              64W / 400W |  58159MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-80GB          Off | 00000000:4D:00.0 Off |                    0 |
| N/A   29C    P0              61W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          Off | 00000000:87:00.0 Off |                    0 |
| N/A   30C    P0              61W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-80GB          Off | 00000000:8D:00.0 Off |                    0 |
| N/A   27C    P0              63W / 400W |      9MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-80GB          Off | 00000000:C7:00.0 Off |                    0 |
| N/A   27C    P0              59W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-80GB          Off | 00000000:CA:00.0 Off |                    0 |
| N/A   29C    P0              63W / 400W |      7MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Reproduction

1. Command used to start GLM-4V with vLLM

CUDA_VISIBLE_DEVICES=0,3 python3 -m vllm.entrypoints.openai.api_server \
    --model=/data/lush-dev/liwei/code/gpt/models/huggingface/glm-4v-9b \
    --served-model-name=glm-4v-9b \
    --device=cuda \
    --port=8000 \
    --host=0.0.0.0 \
    --tensor-parallel-size=1 \
    --dtype=auto \
    --trust-remote-code

2. Error message

INFO 06-11 08:11:33 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='/data/lush-dev/liwei/code/gpt/models/huggingface/glm-4v-9b', speculative_config=None, tokenizer='/data/lush-dev/liwei/code/gpt/models/huggingface/glm-4v-9b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=glm-4v-9b)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 06-11 08:11:34 tokenizer.py:126] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]:     return _run_code(code, main_globals, None,
[rank0]:   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 186, in <module>
[rank0]:     engine = AsyncLLMEngine.from_engine_args(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 386, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 340, in __init__
[rank0]:     self.engine = self._init_engine(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 462, in _init_engine
[rank0]:     return engine_class(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 222, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/executor_base.py", line 41, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 24, in _init_executor
[rank0]:     self.driver_worker.load_model()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 121, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 134, in load_model
[rank0]:     self.model = get_model(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/loader.py", line 243, in load_model
[rank0]:     model.load_weights(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/chatglm.py", line 392, in load_weights
[rank0]:     param = params_dict[name]
[rank0]: KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
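
The traceback ends at `param = params_dict[name]` in `load_weights` of `chatglm.py`, which means the checkpoint contains a tensor name that has no matching parameter in the model vLLM 0.4.3 constructed. The sketch below shows one way to list such names before loading. It assumes the checkpoint directory contains a `model.safetensors.index.json` file with a `weight_map` entry (the usual Hugging Face layout for sharded checkpoints, not confirmed for this directory); the function names and the example tensor names other than the one in the error are hypothetical.

```python
import json
from pathlib import Path


def checkpoint_weight_names(model_dir):
    """Read tensor names from a sharded safetensors index file.

    Assumes <model_dir>/model.safetensors.index.json exists and has a
    "weight_map" dict mapping tensor names to shard file names.
    """
    index_path = Path(model_dir) / "model.safetensors.index.json"
    with index_path.open() as f:
        return set(json.load(f)["weight_map"])


def unexpected_weights(checkpoint_names, param_names):
    """Return names present in the checkpoint but absent from the model."""
    return sorted(set(checkpoint_names) - set(param_names))


if __name__ == "__main__":
    # Illustrative names only; these are not read from a real checkpoint.
    ckpt = {
        "transformer.encoder.layers.0.mlp.dense_4h_to_h.weight",
        "transformer.vision.transformer.layers.45.mlp.fc2.weight",
    }
    params = {"transformer.encoder.layers.0.mlp.dense_4h_to_h.weight"}
    print(unexpected_weights(ckpt, params))
```

If every name returned starts with `transformer.vision.`, that would be consistent with the text-only model class being used for a checkpoint that also ships a vision tower.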

Expected behavior

I expect vLLM to run GLM-4V inference normally. The same command currently works for glm-4-9b-chat.
