Closed: 2500035435 (closed 1 month ago)
CUDA Version: 12.2
transformers Version: 4.43.3
Python Version: 3.11.9
torch Version: 2.3.1
GLIBC: 2.31
OS: Ubuntu 20.04 LTS
No response
CUDA_VISIBLE_DEVICES=0 python /root/ljm/ChatGLM4/GLM-4/basic_demo/openai_api_server.py
WARNING 08-07 22:47:39 _custom_ops.py:14] Failed to import from vllm._C with ImportError("/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/_C.abi3.so)")
INFO 08-07 22:47:41 config.py:695] Defaulting to use mp for distributed inference
INFO 08-07 22:47:41 llm_engine.py:174] Initializing an LLM engine (v0.5.2) with config: model='/root/ljm/models/glm-4v-9b', speculative_config=None, tokenizer='/root/ljm/models/glm-4v-9b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=/root/ljm/models/glm-4v-9b, use_v2_block_manager=False, enable_prefix_caching=False)
WARNING 08-07 22:47:41 tokenizer.py:126] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 08-07 22:47:41 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=2189773) INFO 08-07 22:47:41 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=2189773) INFO 08-07 22:47:42 utils.py:737] Found nccl from library libnccl.so.2
INFO 08-07 22:47:42 utils.py:737] Found nccl from library libnccl.so.2
(VllmWorkerProcess pid=2189773) INFO 08-07 22:47:42 pynccl.py:63] vLLM is using nccl==2.20.5
INFO 08-07 22:47:42 pynccl.py:63] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226] Exception in worker VllmWorkerProcess while processing method load_model: 'transformer.vision.transformer.layers.45.mlp.fc2.weight', Traceback (most recent call last):
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 139, in load_model
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]     self.model_runner.load_model()
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 256, in load_model
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]     self.model = get_model(model_config=self.model_config,
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]     return loader.load_model(model_config=model_config,
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 270, in load_model
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]     model.load_weights(
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/model_executor/models/chatglm.py", line 399, in load_weights
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]     param = params_dict[name]
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226]             ~~~~~~~~~~~^^^^^^
(VllmWorkerProcess pid=2189773) ERROR 08-07 22:47:43 multiproc_worker_utils.py:226] KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/ljm/ChatGLM4/GLM-4/basic_demo/openai_api_server.py", line 681, in <module>
[rank0]:     engine = AsyncLLMEngine.from_engine_args(engine_args)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 444, in from_engine_args
[rank0]:     engine = cls(
[rank0]:              ^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 373, in __init__
[rank0]:     self.engine = self._init_engine(*args, **kwargs)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 520, in _init_engine
[rank0]:     return engine_class(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 249, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:                           ^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 158, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 25, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 150, in __init__
[rank0]:     super().__init__(model_config, cache_config, parallel_config,
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 46, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 84, in _init_executor
[rank0]:     self._run_workers("load_model",
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 135, in _run_workers
[rank0]:     driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/worker/worker.py", line 139, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 256, in load_model
[rank0]:     self.model = get_model(model_config=self.model_config,
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 270, in load_model
[rank0]:     model.load_weights(
[rank0]:   File "/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/site-packages/vllm/model_executor/models/chatglm.py", line 399, in load_weights
[rank0]:     param = params_dict[name]
[rank0]:             ~~~~~~~~~~~^^^^^^
[rank0]: KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
INFO 08-07 22:47:44 multiproc_worker_utils.py:123] Killing local vLLM worker processes
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stdout>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x00000000008a7a38)
Current thread 0x00007f2ba9ee64c0 (most recent call first):
  <no Python frame>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, psutil._psutil_linux, psutil._psutil_posix, sentencepiece._sentencepiece, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, ujson, PIL._imaging, PIL._imagingft, regex._regex, zmq.backend.cython.context, zmq.backend.cython.message, zmq.backend.cython.socket, zmq.backend.cython._device, zmq.backend.cython._poll, zmq.backend.cython._proxy_steerable, zmq.backend.cython._version, zmq.backend.cython.error, zmq.backend.cython.utils (total: 45)
/root/anaconda3/envs/ljm_glm4_conda/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
Aborted (core dumped)
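For context on the failure itself: the fatal error is the KeyError raised in vLLM's chatglm.py, whose load_weights looks up each checkpoint tensor name in the model's parameter dict. A name like transformer.vision.transformer.layers.45.mlp.fc2.weight belongs to glm-4v-9b's vision tower, so the lookup can only fail if the ChatGLM implementation in this vLLM build registers no vision parameters. A minimal sketch of that failure mode (the dicts and tensor strings below are made up for illustration; only the failing weight name is taken from the log):

```python
# Toy stand-ins: a text-only model definition vs. a multimodal checkpoint.
model_params = {
    # The ChatGLM text stack is registered...
    "transformer.encoder.layers.0.mlp.dense_4h_to_h.weight": "text-tensor",
    # ...but no "transformer.vision.*" parameters exist.
}
checkpoint_weights = {
    "transformer.encoder.layers.0.mlp.dense_4h_to_h.weight": "text-tensor",
    "transformer.vision.transformer.layers.45.mlp.fc2.weight": "vision-tensor",
}

def load_weights(params_dict, weights):
    """Mimics the lookup pattern at chatglm.py:399: every checkpoint
    name must already exist in the model's parameter dict."""
    for name, loaded_weight in weights.items():
        param = params_dict[name]  # raises KeyError for unknown weights
        # (the real loader would now copy loaded_weight into param)

try:
    load_weights(model_params, checkpoint_weights)
except KeyError as err:
    print("model has no parameter for checkpoint weight:", err)
```

In other words, the crash pattern points at a model/implementation mismatch for this checkpoint rather than at the earlier GLIBC warning.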
Could you tell me whether this error is caused by the GLIBC version? My machine has GLIBC 2.31; do I have to upgrade it? Apart from that, are there any other factors that could cause this error?
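On the GLIBC question specifically: the WARNING at the start of the log shows the prebuilt vllm._C extension requires the GLIBC_2.32 symbol version while the system ships GLIBC 2.31, so that import will keep failing until glibc is newer or vLLM is built for this system; the fatal KeyError, however, occurs well after that warning and is independent of it. A small standard-library diagnostic sketch to confirm which libc the interpreter actually runs against:

```python
import platform

# platform.libc_ver() reports the libc the running interpreter links
# against, e.g. ("glibc", "2.31") on an Ubuntu 20.04 system.
name, version = platform.libc_ver()
if name == "glibc":
    major, minor = (int(x) for x in version.split(".")[:2])
    if (major, minor) >= (2, 32):
        print(f"GLIBC {version}: provides the GLIBC_2.32 symbols the prebuilt vllm._C wants")
    else:
        print(f"GLIBC {version}: older than GLIBC_2.32, so the vllm._C import warning is expected")
else:
    print("non-glibc system; the GLIBC_2.32 warning does not apply directly")
```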
This error is probably a driver issue; it doesn't look like a Python-side error. Maybe try upgrading and see. For the Python dependencies, just install them per the requirements file.
Thanks for the reply.