HabanaAI / vllm-fork

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution. #417

Open pranjalst opened 5 days ago

pranjalst commented 5 days ago

Your current environment

Docker Image and Execution Command Overview

Docker Image Built From:

FROM vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest

This image targets PyTorch workloads on Habana (HPU) devices. On top of it, the vLLM fork and its dependencies are installed via requirements-hpu.txt, and key settings such as lazy collectives for HPU are enabled in the Dockerfile.
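For reference, the lazy-collectives setting mentioned above is normally enabled through an environment variable. Below is a minimal sketch of the relevant Dockerfile lines, not the exact Dockerfile used here — the base image is the one listed above, and the env var follows the HPU fork's documented requirement for tensor-parallel inference; the install steps are illustrative:

```dockerfile
FROM vault.habana.ai/gaudi-docker/1.16.2/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest

# Required for tensor-parallel inference on HPU (lazy collectives)
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=true

# Install the vLLM fork and its HPU dependencies (paths illustrative)
COPY . /workspace/vllm-fork
WORKDIR /workspace/vllm-fork
RUN pip install -r requirements-hpu.txt
```

Alternatively, the same variable can be passed at runtime with `-e PT_HPU_ENABLE_LAZY_COLLECTIVES=true` on the `docker run` command line.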

Execution Command:

if [ "$hw_mode" = "hpu" ]; then
    docker run -d --rm --runtime=habana --name="vllm-service" -p $port_number:80 \
    -e HABANA_VISIBLE_DEVICES=all \
    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
    --cap-add=sys_nice --ipc=host \
    -e HF_TOKEN="***" \
    <vllm-hpu-image> \
    /bin/bash -c "export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.openai.api_server --enforce-eager --model meta-llama/Meta-Llama-3-8B-Instruct --tensor-parallel-size 2 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs $max_num_seqs --max-seq-len-to-capture $max_seq_len_to_capture"
fi

Error Encountered:

(RayWorkerWrapper pid=8965) INFO 10-23 04:16:19 hpu_model_runner.py:692] Loading model weights took in total 7.481 GiB of device memory (7.486 GiB/94.62 GiB used) and 1.053 GiB of host memory (93.9 GiB/1007 GiB used)
ERROR 10-23 04:16:19 worker_base.py:464] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 10-23 04:16:19 worker_base.py:464] Traceback (most recent call last):
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
ERROR 10-23 04:16:19 worker_base.py:464]     return func(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1998, in all_reduce
ERROR 10-23 04:16:19 worker_base.py:464]     work.wait()
ERROR 10-23 04:16:19 worker_base.py:464] IndexError: _Map_base::at
ERROR 10-23 04:16:19 worker_base.py:464]
ERROR 10-23 04:16:19 worker_base.py:464] During handling of the above exception, another exception occurred:
ERROR 10-23 04:16:19 worker_base.py:464]
ERROR 10-23 04:16:19 worker_base.py:464] Traceback (most recent call last):
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/worker_base.py", line 456, in execute_method
ERROR 10-23 04:16:19 worker_base.py:464]     return executor(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 10-23 04:16:19 worker_base.py:464]     return func(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_worker.py", line 180, in determine_num_available_blocks
ERROR 10-23 04:16:19 worker_base.py:464]     self.model_runner.profile_run()
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1314, in profile_run
ERROR 10-23 04:16:19 worker_base.py:464]     self.warmup_scenario(max_batch_size, max_seq_len, True, kv_caches,
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1387, in warmup_scenario
ERROR 10-23 04:16:19 worker_base.py:464]     self.execute_model(inputs, kv_caches, warmup_mode=True)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 10-23 04:16:19 worker_base.py:464]     return func(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1970, in execute_model
ERROR 10-23 04:16:19 worker_base.py:464]     hidden_states = self.model.forward(
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 716, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     return wrapped_hpugraph_forward(
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 570, in wrapped_hpugraph_forward
ERROR 10-23 04:16:19 worker_base.py:464]     return orig_fwd(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 359, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     hidden_states = self.model(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1523, in _call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     return forward_call(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 566, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     model_output = self.model(input_ids, positions, kv_caches,
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 352, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     hidden_states, residual = layer(positions, hidden_states,
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 261, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     hidden_states = self.self_attn(positions=positions,
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 192, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     output, _ = self.o_proj(attn_output)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/layers/linear.py", line 1090, in forward
ERROR 10-23 04:16:19 worker_base.py:464]     output = tensor_model_parallel_all_reduce(output_parallel)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
ERROR 10-23 04:16:19 worker_base.py:464]     return get_tp_group().all_reduce(input_)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/parallel_state.py", line 349, in all_reduce
ERROR 10-23 04:16:19 worker_base.py:464]     self._all_reduce_in_place(input_)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/parallel_state.py", line 383, in _all_reduce_in_place
ERROR 10-23 04:16:19 worker_base.py:464]     torch.distributed.all_reduce(input_, group=self.device_group)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
ERROR 10-23 04:16:19 worker_base.py:464]     msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 49, in _get_msg_dict
ERROR 10-23 04:16:19 worker_base.py:464]     "args": f"{args}, {kwargs}",
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 471, in __repr__
ERROR 10-23 04:16:19 worker_base.py:464]     return torch._tensor_str._str(self, tensor_contents=tensor_contents)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 677, in _str
ERROR 10-23 04:16:19 worker_base.py:464]     return _str_intern(self, tensor_contents=tensor_contents)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 597, in _str_intern
ERROR 10-23 04:16:19 worker_base.py:464]     tensor_str = _tensor_str(self, indent)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 349, in _tensor_str
ERROR 10-23 04:16:19 worker_base.py:464]     formatter = _Formatter(get_summarized_data(self) if summarize else self)
ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 137, in __init__
ERROR 10-23 04:16:19 worker_base.py:464]     nonzero_finite_vals = torch.masked_select(
ERROR 10-23 04:16:19 worker_base.py:464] RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
ERROR 10-23 04:16:19 worker_base.py:464] Check $HABANA_LOGS/ for details_Map_base::at
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1998, in all_reduce
    work.wait()
IndexError: _Map_base::at

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/engine/multiprocessing/engine.py", line 394, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/engine/multiprocessing/engine.py", line 141, in from_engine_args
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/engine/llm_engine.py", line 351, in __init__
    self._initialize_kv_caches()
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/engine/llm_engine.py", line 486, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/executor/distributed_gpu_executor.py", line 39, in determine_num_available_blocks
    num_blocks = self._run_workers("determine_num_available_blocks", )
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/executor/ray_hpu_executor.py", line 398, in _run_workers
    self.driver_worker.execute_method(method, *driver_args,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/worker_base.py", line 465, in execute_method
    raise e
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/worker_base.py", line 456, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_worker.py", line 180, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1314, in profile_run
    self.warmup_scenario(max_batch_size, max_seq_len, True, kv_caches,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1387, in warmup_scenario
    self.execute_model(inputs, kv_caches, warmup_mode=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1970, in execute_model
    hidden_states = self.model.forward(
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 716, in forward
    return wrapped_hpugraph_forward(
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 570, in wrapped_hpugraph_forward
    return orig_fwd(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 359, in forward
    hidden_states = self.model(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1523, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 566, in forward
    model_output = self.model(input_ids, positions, kv_caches,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 352, in forward
    hidden_states, residual = layer(positions, hidden_states,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 261, in forward
    hidden_states = self.self_attn(positions=positions,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 192, in forward
    output, _ = self.o_proj(attn_output)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/layers/linear.py", line 1090, in forward
    output = tensor_model_parallel_all_reduce(output_parallel)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
    return get_tp_group().all_reduce(input_)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/parallel_state.py", line 349, in all_reduce
    self._all_reduce_in_place(input_)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/parallel_state.py", line 383, in _all_reduce_in_place
    torch.distributed.all_reduce(input_, group=self.device_group)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
    msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 49, in _get_msg_dict
    "args": f"{args}, {kwargs}",
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 471, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 677, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 597, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 349, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 137, in __init__
    nonzero_finite_vals = torch.masked_select(
RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
Check $HABANA_LOGS/ for details_Map_base::at
2024-10-23 04:16:19,774 ERROR worker.py:409 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RayWorkerWrapper.execute_method() (pid=8965, ip=172.17.0.2, actor_id=c31a4ed60e89b3d9851014d601000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f2aca0bb8e0>)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1998, in all_reduce
    work.wait()
IndexError: _Map_base::at
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 261, in forward
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     hidden_states = self.self_attn(positions=positions,
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 192, in forward
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     output, _ = self.o_proj(attn_output)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     result = forward_call(*args, **kwargs)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/model_executor/layers/linear.py", line 1090, in forward
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     output = tensor_model_parallel_all_reduce(output_parallel)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/communication_op.py", line 11, in tensor_model_parallel_all_reduce
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     return get_tp_group().all_reduce(input_)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/parallel_state.py", line 349, in all_reduce
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     self._all_reduce_in_place(input_)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev559+g3af4b6ce.gaudi000-py3.10.egg/vllm/distributed/parallel_state.py", line 383, in _all_reduce_in_place
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     torch.distributed.all_reduce(input_, group=self.device_group)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     msg_dict = _get_msg_dict(func.__name__, *args, **kwargs)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 49, in _get_msg_dict
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     "args": f"{args}, {kwargs}",
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 471, in __repr__
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     return torch._tensor_str._str(self, tensor_contents=tensor_contents)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 677, in _str
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     return _str_intern(self, tensor_contents=tensor_contents)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 597, in _str_intern
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     tensor_str = _tensor_str(self, indent)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 349, in _tensor_str
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     formatter = _Formatter(get_summarized_data(self) if summarize else self)
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor_str.py", line 137, in __init__
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464]     nonzero_finite_vals = torch.masked_select(
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464] RuntimeError: [Rank:0] FATAL ERROR :: MODULE:PT_BRIDGE Exception in Launch thread...
(RayWorkerWrapper pid=8965) ERROR 10-23 04:16:19 worker_base.py:464] Check $HABANA_LOGS/ for details_Map_base::at

This error occurs while the vLLM API server warms up the model inside `determine_num_available_blocks`. Despite the "might cause deadlock" wording, the root failure is a fatal HPU bridge error (`MODULE:PT_BRIDGE Exception in Launch thread`) raised during a tensor-parallel all-reduce; `$HABANA_LOGS/` should contain the device-side details. Further investigation is needed to resolve this issue for successful deployment.
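One detail worth noting in the traceback: the exception does not surface in `all_reduce` itself but in `torch/distributed/c10d_logger.py`, where the logging wrapper builds a debug message with `"args": f"{args}, {kwargs}"`. Formatting the message calls `repr()` on the input tensor, and on a lazy-execution backend like HPU that forces materialization, which is the point where the pending device failure is raised. A minimal sketch of this mechanism, using a hypothetical `LazyTensor` stand-in instead of a real HPU tensor:

```python
# Sketch of how a distributed-logging wrapper can surface a deferred
# device error: it stringifies the collective's arguments, and repr()
# on a lazy tensor forces materialization. "LazyTensor" is a
# hypothetical stand-in for an HPU tensor whose queued ops have failed.

class LazyTensor:
    def __repr__(self):
        # Materializing the tensor raises the deferred device error,
        # mirroring "Exception in Launch thread" from the HPU bridge.
        raise RuntimeError("FATAL ERROR :: MODULE:PT_BRIDGE")

def logged_collective(func):
    def wrapper(*args, **kwargs):
        # Same pattern as torch/distributed/c10d_logger.py: the message
        # dict formats the raw arguments, calling repr() on each tensor.
        msg_dict = {"op": func.__name__, "args": f"{args}, {kwargs}"}
        return func(*args, **kwargs)
    return wrapper

@logged_collective
def all_reduce(tensor):
    return tensor

try:
    all_reduce(LazyTensor())
except RuntimeError as e:
    print(f"raised while building the log message: {e}")
```

This is only an illustration of why the traceback ends inside `_tensor_str.py` rather than in the collective op; the actual fix is on the device side, not in the logger.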


michalkuligowski commented 4 days ago

@pranjalst It seems you are not on the latest stack: you are running SynapseAI 1.16.x with vllm-0.6.3. Please use the latest SynapseAI 1.18.0 together with the HabanaAI vllm-fork branch v1.18.0 (tag: v0.5.3.post1+Gaudi-1.18.0).
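A possible upgrade path following the comment above is sketched below. The 1.18.0 image path follows Habana's usual naming scheme, but the exact PyTorch installer version in it is an assumption; verify the current tag in the Habana vault, and follow the fork's own README for the install steps on the branch you check out.

```shell
# Pull a Gaudi base image built against SynapseAI 1.18.0.
# (The pytorch-installer version in the path is an assumption;
# check the Habana container registry for the exact tag.)
docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

# Inside the container, install the matching vllm-fork release.
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork
git checkout v0.5.3.post1+Gaudi-1.18.0   # tag from the maintainer's comment
pip install -r requirements-hpu.txt
pip install -e .
```

After rebuilding, rerun the same `vllm.entrypoints.openai.api_server` command from the report to confirm whether the `determine_num_available_blocks` failure persists on the newer stack.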

michalkuligowski commented 1 hour ago

@pranjalst did you try latest version of SynapseAI and vllm-fork?