Vanessa-Taing opened 2 months ago
Hi, can you try updating your vllm version to 0.5.4?
Thanks for the reply. I tried updating the vllm version by running:
pip install vllm==0.5.4
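Since pip and the active interpreter can get out of sync in conda environments, it may also be worth confirming that the interpreter actually sees the new version (a small sketch using the standard library; `"vllm"` is just the argument we care about here):

```python
import importlib.metadata


def installed_version(package: str) -> str:
    """Return a package's version as seen by the current interpreter."""
    return importlib.metadata.version(package)


# e.g. installed_version("vllm") should return "0.5.4" after the upgrade.
```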
Here is the updated list of environment packages:
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
aiohappyeyeballs 2.4.0 pypi_0 pypi
aiohttp 3.10.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.4.0 pypi_0 pypi
attrs 24.2.0 pypi_0 pypi
audioread 3.0.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.2 h06a4308_0
certifi 2024.8.30 pypi_0 pypi
cffi 1.17.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
cloudpickle 3.0.0 pypi_0 pypi
cmake 3.30.2 pypi_0 pypi
datasets 2.21.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
diskcache 5.6.3 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
expat 2.6.2 h6a678d5_0
fastapi 0.112.2 pypi_0 pypi
filelock 3.15.4 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
gguf 0.9.1 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.5 pypi_0 pypi
httptools 0.6.1 pypi_0 pypi
httpx 0.27.2 pypi_0 pypi
huggingface-hub 0.24.6 pypi_0 pypi
idna 3.8 pypi_0 pypi
importlib-metadata 8.4.0 pypi_0 pypi
interegular 0.3.3 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
jiter 0.5.0 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
jsonschema 4.23.0 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
lark 1.2.2 pypi_0 pypi
lazy-loader 0.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
librosa 0.10.2.post1 pypi_0 pypi
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
llvmlite 0.43.0 pypi_0 pypi
lm-format-enforcer 0.10.3 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
msgpack 1.0.8 pypi_0 pypi
msgspec 0.18.6 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pypi_0 pypi
networkx 3.3 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numba 0.60.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-ml-py 12.560.30 pypi_0 pypi
nvidia-nccl-cu12 2.20.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.6.68 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openai 1.43.0 pypi_0 pypi
openssl 3.0.14 h5eee18b_0
outlines 0.0.46 pypi_0 pypi
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.2 py312h06a4308_0
platformdirs 4.2.2 pypi_0 pypi
pooch 1.8.2 pypi_0 pypi
prometheus-client 0.20.0 pypi_0 pypi
prometheus-fastapi-instrumentator 7.0.0 pypi_0 pypi
protobuf 5.28.0 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pyairports 2.1.1 pypi_0 pypi
pyarrow 17.0.0 pypi_0 pypi
pycountry 24.6.1 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pydantic 2.8.2 pypi_0 pypi
pydantic-core 2.20.1 pypi_0 pypi
python 3.12.4 h5148396_1
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.2 pypi_0 pypi
pyzmq 26.2.0 pypi_0 pypi
ray 2.35.0 pypi_0 pypi
readline 8.2 h5eee18b_0
referencing 0.35.1 pypi_0 pypi
regex 2024.7.24 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rpds-py 0.20.0 pypi_0 pypi
safetensors 0.4.4 pypi_0 pypi
scikit-learn 1.5.1 pypi_0 pypi
scipy 1.14.1 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 72.1.0 py312h06a4308_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soundfile 0.12.1 pypi_0 pypi
soxr 0.5.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
starlette 0.38.2 pypi_0 pypi
sympy 1.13.2 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tiktoken 0.7.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
tokenizers 0.19.1 pypi_0 pypi
torch 2.4.0 pypi_0 pypi
torchvision 0.19.0 pypi_0 pypi
tqdm 4.66.5 pypi_0 pypi
transformers 4.44.2 pypi_0 pypi
triton 3.0.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
uvicorn 0.30.6 pypi_0 pypi
uvloop 0.20.0 pypi_0 pypi
vllm 0.5.4 pypi_0 pypi
vllm-flash-attn 2.6.1 pypi_0 pypi
watchfiles 0.24.0 pypi_0 pypi
websockets 13.0.1 pypi_0 pypi
wheel 0.43.0 py312h06a4308_0
xformers 0.0.27.post2 pypi_0 pypi
xxhash 3.5.0 pypi_0 pypi
xz 5.4.6 h5eee18b_1
yarl 1.9.4 pypi_0 pypi
zipp 3.20.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_1
However, the same issue occurred:
INFO 09-04 16:09:15 llm_engine.py:174] Initializing an LLM engine (v0.5.4) with config: model='THUDM/LongWriter-glm4-9b', speculative_config=None, tokenizer='THUDM/LongWriter-glm4-9b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=THUDM/LongWriter-glm4-9b, use_v2_block_manager=False, enable_prefix_caching=False)
WARNING 09-04 16:09:16 tokenizer.py:129] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
WARNING 09-04 16:09:16 utils.py:578] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 09-04 16:09:17 model_runner.py:720] Starting to load model THUDM/LongWriter-glm4-9b...
INFO 09-04 16:09:18 weight_utils.py:225] Using model weights format ['*.safetensors']
[rank0]: Traceback (most recent call last):
[rank0]: File "/mnt/c/Users/CSOC/Documents/longwriter/lw-vllm.py", line 2, in <module>
[rank0]: model = LLM(
[rank0]: ^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 158, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 445, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 249, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 47, in __init__
[rank0]: self._init_executor()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/executor/gpu_executor.py", line 36, in _init_executor
[rank0]: self.driver_worker.load_model()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/worker/worker.py", line 139, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 722, in load_model
[rank0]: self.model = get_model(model_config=self.model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 328, in load_model
[rank0]: self._get_weights_iterator(model_config.model,
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 288, in _get_weights_iterator
[rank0]: hf_folder, hf_weights_files, use_safetensors = self._prepare_weights(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 271, in _prepare_weights
[rank0]: hf_weights_files = filter_duplicate_safetensors_files(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 290, in filter_duplicate_safetensors_files
[rank0]: weight_map = json.load(index_file)["weight_map"]
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/__init__.py", line 293, in load
[rank0]: return loads(fp.read(),
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/__init__.py", line 346, in loads
[rank0]: return _default_decoder.decode(s)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/decoder.py", line 337, in decode
[rank0]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/project/anaconda3/envs/long/lib/python3.12/json/decoder.py", line 355, in raw_decode
[rank0]: raise JSONDecodeError("Expecting value", s, err.value) from None
[rank0]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
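The `JSONDecodeError` is raised while parsing `model.safetensors.index.json`, which suggests the cached index file is empty or truncated (e.g. from an interrupted download). One rough way to scan a cache directory for unparsable index files (a sketch; the path below assumes the default Hugging Face cache location, which may differ on your machine):

```python
import json
import pathlib


def find_bad_index_files(cache_dir: str) -> list[str]:
    """Return paths of *.safetensors.index.json files that are not valid JSON."""
    bad = []
    for path in pathlib.Path(cache_dir).rglob("*.safetensors.index.json"):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError:
            bad.append(str(path))
    return bad


# e.g. find_bad_index_files(str(pathlib.Path.home() / ".cache/huggingface/hub"))
```

Any path this returns is a candidate for deletion and re-download.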
Hi, could it be that your model did not download successfully? You could try downloading the model to a local directory first, then loading it into vllm from the local path.
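One way to follow that suggestion is to fetch the full repo up front and then hand vLLM the local directory. A minimal sketch, assuming `huggingface_hub` and `vllm` are installed (`load_from_local_copy` is a hypothetical helper name):

```python
def load_from_local_copy(repo_id: str):
    """Download a model repo fully, then load it into vLLM from the local path.

    Sketch only: assumes huggingface_hub and vllm are installed.
    """
    from huggingface_hub import snapshot_download
    from vllm import LLM

    # A failed or partial download surfaces here, at download time,
    # instead of deep inside vLLM's weight loader.
    local_dir = snapshot_download(repo_id)

    # Point vLLM at the local path rather than the Hub repo id.
    return LLM(model=local_dir, trust_remote_code=True)


# e.g. model = load_from_local_copy("THUDM/LongWriter-glm4-9b")
```

Because `snapshot_download` resumes and verifies files, re-running it after a network failure repairs a partial cache.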
System Info
CUDA Version: 12.2
transformers Version: 4.44.2
Python: 3.12.4
Operating system: Windows Subsystem for Linux (WSL) in VS Code
Expected behavior
An article is generated, just as when running with Hugging Face transformers.