HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Engine loop has died #419

Open warlock135 opened 13 hours ago

warlock135 commented 13 hours ago

Your current environment

The output of `python collect_env.py`:

```
NUMA node0 CPU(s):                  0-39,80-119
NUMA node1 CPU(s):                  40-79,120-159
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] habana-torch-dataloader==1.18.0.524
[pip3] habana-torch-plugin==1.18.0.524
[pip3] numpy==1.26.4
[pip3] pynvml==8.0.4
[pip3] pytorch-lightning==2.4.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0a0+git74cd574
[pip3] torch_tb_profiler==0.4.0
[pip3] torchaudio==2.4.0a0+69d4077
[pip3] torchdata==0.7.1+5e6f7b7
[pip3] torchmetrics==1.4.2
[pip3] torchtext==0.18.0a0+9bed85d
[pip3] torchvision==0.19.0a0+48b1edf
[pip3] transformers==4.45.2
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.3.dev554+g07c98a52
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology: Could not collect
```

Model Input Dumps

No response

🐛 Describe the bug

When running inference with vLLM (using the OpenAI API server), I got the error below:

ERROR 10-23 09:12:38 client.py:250] RuntimeError('Engine loop has died')
ERROR 10-23 09:12:38 client.py:250] Traceback (most recent call last):
ERROR 10-23 09:12:38 client.py:250]   File "/vllm-fork/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 10-23 09:12:38 client.py:250]     await self._check_success(
ERROR 10-23 09:12:38 client.py:250]   File "/vllm-fork/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 10-23 09:12:38 client.py:250]     raise response
ERROR 10-23 09:12:38 client.py:250] RuntimeError: Engine loop has died

After this, all in-flight requests were released, and vLLM crashed when the next request came in:

CRITICAL 10-23 09:53:06 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:44188 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error

I started vLLM with the following command (inside a Docker container using the image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest):

```
PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true python3 -m vllm.entrypoints.openai.api_server --model Meta-Llama-3-70B-Instruct --port 9002 --gpu-memory-utilization 0.94 --tensor-parallel-size 8 --disable-log-requests --block-size 128
```
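The requests hitting the server are plain completion calls, e.g. something like the sketch below (the model name and port come from the command above; the prompt and max_tokens are just placeholders):

```
# Example completion request against the server started above
# (prompt and max_tokens are placeholders; model and port match the launch command)
curl http://localhost:9002/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Meta-Llama-3-70B-Instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 64
      }'
```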


michalkuligowski commented 12 hours ago

Hi @warlock135, which version of HabanaAI vllm-fork are you using, 1.18.0 or habana_main?

warlock135 commented 12 hours ago

> Hi @warlock135, which version of HabanaAI vllm-fork are you using, 1.18.0 or habana_main?

I'm using the habana_main version
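For reference, the exact checkout can be confirmed like this (a sketch; /vllm-fork is the clone path shown in the traceback above, and the commit should match the +g07c98a52 suffix in the vLLM version string from collect_env):

```
# Verify which branch/commit of vllm-fork is installed
# (/vllm-fork is the path from the traceback; adjust if your clone lives elsewhere)
cd /vllm-fork
git rev-parse --abbrev-ref HEAD   # expected: habana_main
git rev-parse --short HEAD        # should match the +g07c98a52 suffix in "vLLM Version"
```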

michalkuligowski commented 11 hours ago

@warlock135 we will try to find what the issue is. In the meantime, please try using v1.18.0 branch (tag v0.5.3.post1+Gaudi-1.18.0), and see if the issue is still present.
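Switching an existing clone to that release would look roughly like the sketch below (the reinstall step is indicative only; follow the repository's Gaudi installation instructions for your environment):

```
# Switch an existing vllm-fork clone to the 1.18.0 release and reinstall
# (install command is indicative; see the repository's Gaudi install docs)
cd /vllm-fork
git fetch --tags origin
git checkout v0.5.3.post1+Gaudi-1.18.0   # tag on the v1.18.0 branch
pip install -e .
```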