EmbeddedLLM / vllm-rocm

vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
https://vllm.readthedocs.io
Apache License 2.0

AssertionError assert output == other_output #10

Closed. shangshng closed this issue 7 months ago.

shangshng commented 7 months ago

Hello,

I'm using the base Docker image rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1. Everything works fine with TP=1 or when the number of prompts is small, but when I use 2 GPUs I get this error:

  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm-0.2.1-py3.10-linux-x86_64.egg/vllm/entrypoints/llm.py", line 157, in generate
    return self._run_engine(use_tqdm)
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm-0.2.1-py3.10-linux-x86_64.egg/vllm/entrypoints/llm.py", line 177, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm-0.2.1-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 562, in step
    output = self._run_workers(
  File "/opt/conda/envs/vllm/lib/python3.10/site-packages/vllm-0.2.1-py3.10-linux-x86_64.egg/vllm/engine/llm_engine.py", line 712, in _run_workers
    assert output == other_output
AssertionError

I saw that the same problem was fixed in the original repo: https://github.com/vllm-project/vllm/pull/1389. Will it be fixed here?
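
For reference, this is roughly how I invoke the engine (a minimal sketch; the model path and prompts below are placeholders, not my exact setup):

```python
from vllm import LLM, SamplingParams

# Placeholder prompts: the assertion only shows up once the batch of prompts is large.
prompts = ["Hello, my name is"] * 1000
sampling_params = SamplingParams(temperature=0.8, max_tokens=128)

# tensor_parallel_size=2 uses two GPUs; with tensor_parallel_size=1 everything works.
llm = LLM(model="facebook/opt-6.7b", tensor_parallel_size=2)

# Fails inside llm_engine.step() with the AssertionError shown above.
outputs = llm.generate(prompts, sampling_params)
```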

tjtanaa commented 7 months ago

This version of the code already includes the fix from https://github.com/vllm-project/vllm/pull/1389, and we cannot reproduce the error you are facing. Could you share more details about any changes or configurations you have made?

shangshng commented 7 months ago

Thank you @tjtanaa. I also opened an issue on the vllm main repo. I think the problem is caused by the new parameter 'max_paddings', but I don't know how to tune it.
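
For reference, this is roughly how I am trying to override it (an untested sketch; the parameter name comes from the linked PR, the value is a guess, and I am not sure this fork forwards it the same way):

```python
from vllm import LLM

# Sketch only: 'max_paddings' is taken from the upstream PR; I do not know
# what a reasonable value is for TP=2.
llm = LLM(
    model="facebook/opt-6.7b",   # placeholder model
    tensor_parallel_size=2,
    max_paddings=512,            # assumed to be passed through to the engine/scheduler config
)
```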

tanpinsiang commented 7 months ago

Thank you for reporting the issue. If you have any further questions, please feel free to reopen this issue or create a new one. Thank you again for your contribution!