HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Recent rebase to vllm v0.6.3 breaks Tensor Parallel inference!!! #354

Closed: xuechendi closed this issue 1 month ago

xuechendi commented 1 month ago

Your current environment

Tested on version "vllm-0.6.3.dev310+gc7b1509e.gaudi"
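
For reference, the installed build string can be confirmed in-process. A minimal sketch, assuming the package exposes __version__ (as recent vLLM releases do):

import vllm

# Print the installed vLLM build string,
# e.g. "0.6.3.dev310+gc7b1509e.gaudi"
print(vllm.__version__)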

🐛 Describe the bug

[screenshot of the error output attached in the original issue]

Test script:

# Run with: VLLM_SKIP_WARMUP=true PT_HPU_ENABLE_LAZY_COLLECTIVES=true VLLM_RAY_DISABLE_LOG_TO_DRIVER=1

import os, sys
import traceback
from pathlib import Path

# Make the sibling vLLM checkout importable
VLLM_PATH = os.path.join(Path(__file__).parent.parent, "vllm")
#VLLM_PATH = os.path.join(Path(__file__).parent.parent, "vllm", "tests")
sys.path.append(VLLM_PATH)
print(sys.path)

from vllm import LLM, SamplingParams

def test_llm_model(model):

    tp_size = 4  # tensor-parallel degree (4 devices)

    llm = LLM(model=model,
              tensor_parallel_size=tp_size,
              block_size=128,
              trust_remote_code=True,
              #enforce_eager=True,
             )

    prompts = [
        "Introduce me to China.",
    ]
    sampling_params = [
        SamplingParams(temperature=0.0, max_tokens=256,)
        for _ in prompts
    ]

    output = llm.generate(prompts, sampling_params=sampling_params, use_tqdm=False)
    del llm
    return output

if __name__ == "__main__":
    model_list = ["meta-llama/Meta-Llama-3.1-70B"]
    for model in model_list:
        try:
            output = test_llm_model(model=model)
            print(f"- {model} succeeded")
            print("output is", output)
        except Exception as e:
            print(f"- {model} failed!")
            print(f"Error info: {e}")
            print(traceback.format_exc())
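
As a side note, the environment variables from the header comment can also be set in-process, before vLLM is imported. A minimal sketch; the variable names are taken from the comment at the top of the script, and their effects are assumed from their names:

import os

# Must be set before importing vllm so the engine reads them at init time.
os.environ["VLLM_SKIP_WARMUP"] = "true"                # assumed: skip HPU warmup
os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"  # assumed: enable collectives in lazy mode (relevant for TP)
os.environ["VLLM_RAY_DISABLE_LOG_TO_DRIVER"] = "1"     # assumed: silence Ray driver logs

from vllm import LLM, SamplingParams  # import after the environment is configured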

xuechendi commented 1 month ago

@kzawora-intel @madamczykhabana

xuechendi commented 1 month ago

Testing on 1.17.1, downloaded from https://vault.habana.ai/artifactory/gaudi-installer/1.17.1/habanalabs-installer.sh
HL-SMI Version: hl-1.17.1-fw-51.5.0
Driver Version: 1.17.1-78932ae
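
For reference, the same version information can be captured programmatically. A minimal sketch, assuming the hl-smi CLI shipped with the habanalabs driver is on PATH:

import subprocess

# hl-smi with no arguments prints a status table that includes
# the HL-SMI and driver versions.
result = subprocess.run(["hl-smi"], capture_output=True, text=True, check=True)
print(result.stdout)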

kzawora-intel commented 1 month ago

This looks like a bug in the 1.17.0/1.17.1 Synapse image. It will be fixed in Synapse 1.18, which will be released very shortly.

xuechendi commented 1 month ago

Synapse 1.18.0 was released on 10/11. With the newly released 1.18 + habana_main, the TP issue is gone.

I'll close this issue report.