Closed Turingforce closed 6 months ago
Please make sure the checkpoint has been downloaded successfully.
OK, I will try
I finally got it done; I struggled a lot with the network conditions at my workplace. Thank you.
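Since the root cause here was an interrupted download, a lightweight check before launching the container is to compare each weight shard's SHA-256 against the checksums published by the model host, if any are provided. A minimal sketch using only the standard library (the function name is illustrative, not part of XAgent or vLLM):

```python
import hashlib


def sha256sum(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks,
    so multi-GB shards never need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()
```

Compare the returned digest with the published one for each `pytorch_model-*.bin` shard; any mismatch means the file should be re-downloaded before starting the container.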
Issue Description / 问题描述
Problem encountered when running XAgentGen.
RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted
Steps to Reproduce / 复现步骤
Environment / 环境信息
Error Screenshots or Logs / 错误截图或日志
Full log:
(xagent) root@ubuntu20:~/XAgent# docker run -it -p 13520:13520 --network tool-server-network -v /mnt/XAgentLlama-7B-preview:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520
==========
== CUDA ==
==========

CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
INFO 12-07 08:34:43 llm_engine.py:72] Initializing an LLM engine with config: model='/model', tokenizer='/model', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=42)
Traceback (most recent call last):
  File "/app/app.py", line 58, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_configs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 486, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 269, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 305, in _init_engine
    return engine_class(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 142, in _init_workers
    self._run_workers(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 700, in _run_workers
    output = executor(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 98, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 315, in load_weights
    for name, loaded_weight in hf_model_weights_iterator(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 250, in hf_model_weights_iterator
    state = torch.load(bin_file, map_location="cpu")
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 993, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 447, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted
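For context, the error at the bottom of the trace means `torch.load` could not read a weight shard as a zip archive, which is the on-disk format of modern PyTorch checkpoints. A quick way to find which shard is truncated or corrupt, sketched with the standard library (the helper name is made up for illustration):

```python
import zipfile
from pathlib import Path


def find_corrupt_checkpoints(model_dir):
    """Return the names of .bin shards that are not valid zip archives.

    PyTorch saves checkpoints in zip format, so any shard that fails
    zipfile.is_zipfile() will raise the same PytorchStreamReader error
    when vLLM tries to load it.
    """
    bad = []
    for shard in sorted(Path(model_dir).glob("*.bin")):
        if not zipfile.is_zipfile(shard):
            bad.append(shard.name)
    return bad
```

Running this against the mounted model directory (e.g. `/mnt/XAgentLlama-7B-preview` on the host) pinpoints which files need to be re-downloaded, rather than fetching the whole checkpoint again.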