Checklist
[x] 1. I have searched related issues but cannot get the expected help.
[x] 2. The bug has not been fixed in the latest version.
[x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
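When I run inference on two 910B cards, it is about 30% slower than on a single 910B card. I am using torch_npu, and the test model is Qwen2.5 7B Instruct. For a single long response, one card reaches about 32 tokens/s, while two cards reach only about 22 tokens/s.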
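For reference, here is how the "about 30%" figure follows from the two throughput numbers above. This is only a small arithmetic sketch; the helper name `relative_slowdown` is made up for illustration and is not part of lmdeploy or torch_npu.

```python
def relative_slowdown(baseline_tps: float, observed_tps: float) -> float:
    """Return the fraction by which observed throughput falls short of baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_tps - observed_tps) / baseline_tps


# Rough figures reported in this issue (tokens per second).
single_card_tps = 32.0
dual_card_tps = 22.0

slowdown = relative_slowdown(single_card_tps, dual_card_tps)

# Average time per generated token, in milliseconds.
single_card_ms_per_token = 1000.0 / single_card_tps
dual_card_ms_per_token = 1000.0 / dual_card_tps

print(f"slowdown: {slowdown:.2%}")
print(f"single card: {single_card_ms_per_token:.2f} ms/token")
print(f"dual card:   {dual_card_ms_per_token:.2f} ms/token")
```

So with `--tp 2` each token takes roughly 14 ms longer on average than on a single card.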
Reproduction
I start the server with the following command: lmdeploy serve api_server --backend pytorch --device ascend /home/ma-user/work/qwen2-7b --server-port 6007 --tp 2 --cache-max-entry-count=0.9
Environment
Error traceback
No response