Closed Zjq9409 closed 2 weeks ago
@Zjq9409 , what is the test message? I tested with the exact same branch (v0.5.3.post1-Gaudi-1.17.0), the same env vars, and the same serving settings, and it went through successfully.
from openai import OpenAI

if __name__ == "__main__":
    model = "Qwen/Qwen2-7B-Instruct"
    # model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    llm = OpenAI(base_url="http://100.83.111.250:8000/v1", api_key="EMPTY")
    input = [{"role": "user", "content": "你是谁?"}]  # "Who are you?"
    output = llm.chat.completions.create(
        model=model,
        messages=input,
        stream=True,
        max_tokens=128,
    )
    for chunk in output:
        if cont := chunk.choices[0].delta.content:
            print(cont, end='', flush=True)
    print()
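For debugging it can help to collect the streamed deltas into one string instead of printing them as they arrive. A minimal sketch of that accumulation logic follows; the dataclasses here are stand-ins mimicking the shape of the OpenAI streaming chunks, not the real response objects returned by `llm.chat.completions.create(stream=True)`:

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in structures mirroring chunk.choices[0].delta.content
# from the OpenAI streaming API (hypothetical mocks for illustration).
@dataclass
class Delta:
    content: Optional[str]

@dataclass
class Choice:
    delta: Delta

@dataclass
class Chunk:
    choices: List[Choice]

def collect_stream(chunks) -> str:
    """Concatenate non-empty delta contents into the full reply."""
    parts = []
    for chunk in chunks:
        # Empty/None deltas (e.g. the final chunk) are skipped,
        # exactly as in the streaming loop above.
        if cont := chunk.choices[0].delta.content:
            parts.append(cont)
    return "".join(parts)

# Example with mock chunks; a None delta is silently skipped.
mock = [
    Chunk([Choice(Delta("Hello"))]),
    Chunk([Choice(Delta(None))]),
    Chunk([Choice(Delta(", world"))]),
]
print(collect_stream(mock))  # -> Hello, world
```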
@Zjq9409 , I didn't see "VLLM_PROMPT_USE_FUSEDSDPA" enabled in "v0.5.3.post1-Gaudi-1.17.0". If you would like to test with FusedSDPA, you may need to switch to the "habana_main" branch.
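For reference, serving on habana_main with FusedSDPA enabled for the prompt phase might look like the sketch below. The model name and port are placeholders matching the test script earlier in the thread; only VLLM_PROMPT_USE_FUSEDSDPA comes from this discussion, the rest are standard vLLM server options:

```shell
# Enable FusedSDPA for prompt processing (supported on habana_main,
# not on v0.5.3.post1-Gaudi-1.17.0)
export VLLM_PROMPT_USE_FUSEDSDPA=1

# Launch the OpenAI-compatible server; model and port are illustrative
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2-7B-Instruct \
    --port 8000
```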
Which branch are you using?
serving
I tested it using habana_main.
According to the log you provided, you're testing on 'vllm-0.5.3.post1+gaudi117', which is behind habana_main. Also, are you testing on G2D or G2H?
BTW, I tested on both habana_main and the exact same branch (vllm-0.5.3.post1+gaudi117). Both work OK with Qwen2-7B under the same configuration on G2H.
Hi @Zjq9409, do you still observe the issue, or can it be closed?
Closing due to no update from the author; please reopen if the issue occurs on the latest version.
Your current environment
driver 1.17, vllm 0.5.3.post1+gaudi117
🐛 Describe the bug