HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Recent rebase to vllm v0.6.3 breaks Tensor Parallel inference!!! #354

Closed: xuechendi closed this issue 1 month ago

xuechendi commented 1 month ago

Your current environment

Tested on version "vllm-0.6.3.dev310+gc7b1509e.gaudi"
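
For reference, the installed build string can be confirmed in-process. A minimal sketch, assuming the package exposes __version__ (as recent vLLM releases do):

import vllm

# Print the installed vLLM build string,
# e.g. "0.6.3.dev310+gc7b1509e.gaudi"
print(vllm.__version__)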

🐛 Describe the bug

[screenshot of the error output attached in the original issue]

Test script:

# Run with: VLLM_SKIP_WARMUP=true PT_HPU_ENABLE_LAZY_COLLECTIVES=true VLLM_RAY_DISABLE_LOG_TO_DRIVER=1

import os, sys
import traceback
from pathlib import Path

# Make the sibling vLLM checkout importable
VLLM_PATH = os.path.join(Path(__file__).parent.parent, "vllm")
#VLLM_PATH = os.path.join(Path(__file__).parent.parent, "vllm", "tests")
sys.path.append(VLLM_PATH)
print(sys.path)

from vllm import LLM, SamplingParams

def test_llm_model(model):

    tp_size = 4  # tensor-parallel degree (4 devices)

    llm = LLM(model=model,
              tensor_parallel_size=tp_size,
              block_size=128,
              trust_remote_code=True,
              #enforce_eager=True,
             )

    prompts = [
        "Introduce me to China.",
    ]
    sampling_params = [
        SamplingParams(temperature=0.0, max_tokens=256,)
        for _ in prompts
    ]

    output = llm.generate(prompts, sampling_params=sampling_params, use_tqdm=False)
    del llm
    return output

if __name__ == "__main__":
    model_list = ["meta-llama/Meta-Llama-3.1-70B"]
    for model in model_list:
        try:
            output = test_llm_model(model=model)
            print(f"- {model} succeeded")
            print("output is", output)
        except Exception as e:
            print(f"- {model} failed!")
            print(f"Error info: {e}")
            print(traceback.format_exc())
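
As a side note, the environment variables from the header comment can also be set in-process, before vLLM is imported. A minimal sketch; the variable names are taken from the comment at the top of the script, and their effects are assumed from their names:

import os

# Must be set before importing vllm so the engine reads them at init time.
os.environ["VLLM_SKIP_WARMUP"] = "true"                # assumed: skip HPU warmup
os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"  # assumed: enable collectives in lazy mode (relevant for TP)
os.environ["VLLM_RAY_DISABLE_LOG_TO_DRIVER"] = "1"     # assumed: silence Ray driver logs

from vllm import LLM, SamplingParams  # import after the environment is configured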

xuechendi commented 1 month ago

@kzawora-intel @madamczykhabana

xuechendi commented 1 month ago

Testing on 1.17.1, downloaded from https://vault.habana.ai/artifactory/gaudi-installer/1.17.1/habanalabs-installer.sh
HL-SMI Version: hl-1.17.1-fw-51.5.0
Driver Version: 1.17.1-78932ae
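
For reference, the same version information can be captured programmatically. A minimal sketch, assuming the hl-smi CLI shipped with the habanalabs driver is on PATH:

import subprocess

# hl-smi with no arguments prints a status table that includes
# the HL-SMI and driver versions.
result = subprocess.run(["hl-smi"], capture_output=True, text=True, check=True)
print(result.stdout)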

kzawora-intel commented 1 month ago

This looks like a bug in the 1.17.0/1.17.1 Synapse image. It will be fixed in Synapse 1.18, which will be released very shortly.

xuechendi commented 1 month ago

Synapse 1.18.0 was released on 10/11. With the newly released 1.18 + habana_main, the TP issue is gone.

I'll close this issue report.