Closed — shangshng closed this issue 7 months ago
This version of the code already includes the fix from https://github.com/vllm-project/vllm/pull/1389, and we can't reproduce the error you are facing. Could you share more details about any changes or configurations you have made?
Thank you @tjtanaa. I have also opened an issue on the vLLM main repo: issue. I think the problem is caused by the new parameter `max_paddings`, and I don't know how to tune it.
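For reference, here is a minimal sketch of how `max_paddings` could be set, assuming a vLLM release from this period where it was still an `EngineArgs` field (it has since been removed upstream); the model name and value below are placeholders, not taken from this thread:

```python
# Sketch: tuning max_paddings through vLLM's offline API.
# Assumes a vLLM version where `max_paddings` is an EngineArgs field
# (forwarded from LLM's **kwargs); newer releases dropped this parameter.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",  # placeholder model
    max_paddings=512,           # max total padded tokens allowed per batch
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

Larger `max_paddings` values let the scheduler batch sequences of more uneven lengths at the cost of wasted padded computation, so tuning it is a throughput/memory trade-off.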
Thank you for reporting the issue. If you have any further questions, please feel free to reopen this issue or create a new one. Thank you again for your contribution!
Hello,
I'm currently using the base Docker image rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1. It works fine with TP=1 or when the number of prompts is small, but when I use 2 GPUs I get this error:
I saw that the same problem was fixed in the original repo: https://github.com/vllm-project/vllm/pull/1389. Will it be fixed here?
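For context, a minimal sketch of the failing configuration as I understand it, using vLLM's standard offline API; the model name and prompt count are placeholders, not taken from this report:

```python
# Sketch: the TP=2 setup that triggers the error, per the description above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-13b",  # placeholder; any model sharded across 2 GPUs
    tensor_parallel_size=2,    # TP=2, the configuration that fails
)
# A large batch of prompts, since the issue only appears when the
# number of prompts is not small.
prompts = [f"Prompt {i}" for i in range(256)]
outputs = llm.generate(prompts, SamplingParams(max_tokens=32))
```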