HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai

[Bug]: benchmark_latency.py cannot exit when using tp #197

Open JunxiChhen opened 2 months ago

JunxiChhen commented 2 months ago

Your current environment

Command line:

cd vllm-fork/benchmarks
python benchmark_latency.py \
    --model meta-llama/Meta-Llama-3-8B \
    --dtype bfloat16 \
    --output-len 128 \
    --num-iters 1 \
    --num-iters-warmup 1 \
    --trust-remote-code \
    --batch-size 256 \
    --device hpu \
    --block-size 128 \
    --input-len 1024 \
    -tp 2"

🐛 Describe the bug

The script above does not exit; it hangs until it is stopped manually with Ctrl+C.

JunxiChhen commented 2 months ago

Test tag is 0.5.3.post1-Gaudi-1.17.0

kzawora-intel commented 2 months ago

Thanks for the report! I've investigated this issue before, and unfortunately it's a bug in HCCL that is beyond vLLM's control, as the deadlock occurs after the main function exits. We are working with the HCCL team to resolve it.

While it's not ideal, you can force the process to exit by adding the following workaround at the end of the benchmark:

import os
os._exit(0)  # force-exit to skip the interpreter shutdown path where HCCL deadlocks
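
For reference, a minimal sketch of where that call could go, assuming the script ends with a main(args) entry point; the main body below is a placeholder, not the actual benchmark_latency.py code:

import os

def main(args):
    # ... set up the engine and run the latency benchmark (placeholder) ...
    pass

if __name__ == "__main__":
    args = ...  # parsed with the script's existing argparse setup
    main(args)
    # Force-exit after the benchmark finishes so the process never reaches
    # the teardown path where the HCCL deadlock occurs under tensor parallelism.
    os._exit(0)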

I will update this issue once we have a HCCL fix.

michalkuligowski commented 2 weeks ago

Testing possible fix #379