Turingforce commented 6 months ago

Issue Description

Problem encountered when running XAgentGen

RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted

Steps to Reproduce

docker run -it -p 13520:13520 --network tool-server-network -v /mnt/XAgentLlama-7B-preview:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520

Environment

Operating System: Ubuntu 20.04 LTS
Python Version: 3.10
Other Relevant Information: GPU GeForce RTX 3090 (24 GB), CUDA 11.8, NVIDIA driver 535

Error Screenshots or Logs

Full log

(xagent) root@ubuntu20:~/XAgent# docker run -it -p 13520:13520 --network tool-server-network -v /mnt/XAgentLlama-7B-preview:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520

==========
== CUDA ==

CUDA Version 11.8.0

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

INFO 12-07 08:34:43 llm_engine.py:72] Initializing an LLM engine with config: model='/model', tokenizer='/model', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=42)
Traceback (most recent call last):
  File "/app/app.py", line 58, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_configs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 486, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 269, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 305, in _init_engine
    return engine_class(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 142, in _init_workers
    self._run_workers(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 700, in _run_workers
    output = executor(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 98, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 315, in load_weights
    for name, loaded_weight in hf_model_weights_iterator(
  File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 250, in hf_model_weights_iterator
    state = torch.load(bin_file, map_location="cpu")
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 993, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/opt/conda/lib/python3.10/site-packages/torch/serialization.py", line 447, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted

Cppowboy commented 6 months ago

Please make sure the checkpoint has been downloaded successfully.
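A quick way to act on this advice without starting the container is sketched below. It assumes the weights are `*.bin` files written by `torch.save` in its zip-based format (the traceback shows `torch.load(bin_file, ...)` failing while opening a zip archive). The function name `find_incomplete_checkpoints` is made up for this example and is not part of XAgent or vLLM. A truncated download normally loses the zip end-of-archive record, so `zipfile.is_zipfile` reports it as invalid; the check does not prove that the file contents are intact.

```python
import zipfile
from pathlib import Path


def find_incomplete_checkpoints(model_dir):
    """Return the names of *.bin files in model_dir that are not valid zip archives.

    Hypothetical helper: assumes the shards were saved by torch.save in its
    zip-based format, so a file cut off mid-download fails this check.
    """
    bad = []
    for path in sorted(Path(model_dir).glob("*.bin")):
        # is_zipfile looks for the end-of-central-directory record,
        # which is missing when the download stopped early.
        if not zipfile.is_zipfile(path):
            bad.append(path.name)
    return bad


if __name__ == "__main__":
    # /mnt/XAgentLlama-7B-preview is the host directory mounted in the docker command above.
    print(find_incomplete_checkpoints("/mnt/XAgentLlama-7B-preview"))
```

Any file name printed here is a candidate for re-downloading before retrying the `docker run` command.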

Turingforce commented 6 months ago

> Please make sure the checkpoint has been downloaded successfully.

OK, I will try

Turingforce commented 6 months ago

> Please make sure the checkpoint has been downloaded successfully.

I finally got it working. The poor network conditions at my workplace caused me a lot of trouble. Thank you.
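For anyone else downloading over an unreliable connection, comparing checksums after the download can catch corruption before the model is loaded. The sketch below assumes a reference SHA-256 value is available for each file from wherever the checkpoint was obtained. The thread does not say whether one is published, and the file name in the usage comment is only a placeholder.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte shards need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (placeholder file name and reference value, both assumptions):
# expected = "<sha256 published alongside the checkpoint>"
# assert sha256_of_file("/mnt/XAgentLlama-7B-preview/<shard>.bin") == expected
```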

OpenBMB / XAgent

Problem encountered when running XAgentGen: RuntimeError: PytorchStreamReader failed reading zip archive: invalid header or archive is corrupted #321
