HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Engine loop has died #419

Open warlock135 opened 13 hours ago

warlock135 commented 13 hours ago

Your current environment

The output of `python collect_env.py`:

```
NUMA node0 CPU(s):                  0-39,80-119
NUMA node1 CPU(s):                  40-79,120-159
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] habana-torch-dataloader==1.18.0.524
[pip3] habana-torch-plugin==1.18.0.524
[pip3] numpy==1.26.4
[pip3] pynvml==8.0.4
[pip3] pytorch-lightning==2.4.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0a0+git74cd574
[pip3] torch_tb_profiler==0.4.0
[pip3] torchaudio==2.4.0a0+69d4077
[pip3] torchdata==0.7.1+5e6f7b7
[pip3] torchmetrics==1.4.2
[pip3] torchtext==0.18.0a0+9bed85d
[pip3] torchvision==0.19.0a0+48b1edf
[pip3] transformers==4.45.2
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.3.dev554+g07c98a52
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology: Could not collect
```

Model Input Dumps

No response

🐛 Describe the bug

When running inference with vLLM (using the OpenAI API server), I got the error below:

ERROR 10-23 09:12:38 client.py:250] RuntimeError('Engine loop has died')
ERROR 10-23 09:12:38 client.py:250] Traceback (most recent call last):
ERROR 10-23 09:12:38 client.py:250]   File "/vllm-fork/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 10-23 09:12:38 client.py:250]     await self._check_success(
ERROR 10-23 09:12:38 client.py:250]   File "/vllm-fork/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 10-23 09:12:38 client.py:250]     raise response
ERROR 10-23 09:12:38 client.py:250] RuntimeError: Engine loop has died

After this, all in-flight requests were released, and vLLM crashed when the next request came in:

CRITICAL 10-23 09:53:06 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     127.0.0.1:44188 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error

I started vLLM with the following command (inside a Docker container using the image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest):

```
PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true python3 -m vllm.entrypoints.openai.api_server --model Meta-Llama-3-70B-Instruct --port 9002 --gpu-memory-utilization 0.94 --tensor-parallel-size 8 --disable-log-requests --block-size 128
```
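The requests hitting the server are plain completion calls, e.g. something like the sketch below (the model name and port come from the command above; the prompt and max_tokens are just placeholders):

```
# Example completion request against the server started above
# (prompt and max_tokens are placeholders; model and port match the launch command)
curl http://localhost:9002/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Meta-Llama-3-70B-Instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 64
      }'
```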


michalkuligowski commented 12 hours ago

Hi @warlock135, which version of HabanaAI vllm-fork are you using, 1.18.0 or habana_main?

warlock135 commented 12 hours ago

> Hi @warlock135, which version of HabanaAI vllm-fork are you using, 1.18.0 or habana_main?

I'm using the habana_main version
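For reference, the exact checkout can be confirmed like this (a sketch; /vllm-fork is the clone path shown in the traceback above, and the commit should match the +g07c98a52 suffix in the vLLM version string from collect_env):

```
# Verify which branch/commit of vllm-fork is installed
# (/vllm-fork is the path from the traceback; adjust if your clone lives elsewhere)
cd /vllm-fork
git rev-parse --abbrev-ref HEAD   # expected: habana_main
git rev-parse --short HEAD        # should match the +g07c98a52 suffix in "vLLM Version"
```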

michalkuligowski commented 11 hours ago

@warlock135 we will try to find what the issue is. In the meantime, please try using v1.18.0 branch (tag v0.5.3.post1+Gaudi-1.18.0), and see if the issue is still present.
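Switching an existing clone to that release would look roughly like the sketch below (the reinstall step is indicative only; follow the repository's Gaudi installation instructions for your environment):

```
# Switch an existing vllm-fork clone to the 1.18.0 release and reinstall
# (install command is indicative; see the repository's Gaudi install docs)
cd /vllm-fork
git fetch --tags origin
git checkout v0.5.3.post1+Gaudi-1.18.0   # tag on the v1.18.0 branch
pip install -e .
```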