Open · warlock135 opened this issue 13 hours ago
Hi @warlock135, which version of HabanaAI vllm-fork are you using, 1.18.0 or habana_main?
I'm using the habana_main version
@warlock135 we will try to find out what the issue is. In the meantime, please try using the v1.18.0 branch (tag v0.5.3.post1+Gaudi-1.18.0) and see if the issue is still present.
Your current environment
The output of `python collect_env.py`
```
NUMA node0 CPU(s): 0-39,80-119
NUMA node1 CPU(s): 40-79,120-159
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] habana-torch-dataloader==1.18.0.524
[pip3] habana-torch-plugin==1.18.0.524
[pip3] numpy==1.26.4
[pip3] pynvml==8.0.4
[pip3] pytorch-lightning==2.4.0
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0a0+git74cd574
[pip3] torch_tb_profiler==0.4.0
[pip3] torchaudio==2.4.0a0+69d4077
[pip3] torchdata==0.7.1+5e6f7b7
[pip3] torchmetrics==1.4.2
[pip3] torchtext==0.18.0a0+9bed85d
[pip3] torchvision==0.19.0a0+48b1edf
[pip3] transformers==4.45.2
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.3.dev554+g07c98a52
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology: Could not collect
```

Model Input Dumps
No response
🐛 Describe the bug
When running inference with vLLM (using the OpenAI API server), I got the error below:
After this, all in-flight requests were dropped, and vLLM crashed when the next request arrived:
I started vLLM with the following command (inside a Docker container based on the image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest):
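The actual command was not captured in this report. For context, a typical way to launch the vLLM OpenAI-compatible server inside the Gaudi container looks like the sketch below; the model name, port, and extra flags are illustrative assumptions, not the reporter's real command:

```shell
# Hypothetical sketch: start the Gaudi PyTorch container, then launch the
# vLLM OpenAI-compatible API server inside it. Model name, port, and flags
# are illustrative assumptions, not the reporter's actual command.
docker run -it --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all \
  --cap-add=sys_nice --net=host --ipc=host \
  vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

# Inside the container (model and port are placeholders):
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000
```

Including the exact launch command (flags such as tensor parallel size, max model length, and block size) would help reproduce the crash on the same configuration.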