Open jamesruio opened 1 year ago
@duzx16
Is there an existing issue for this?
Current Behavior
In webdemo2.py, Streamlit answers a single user's questions quickly, but model inference becomes very slow when multiple users ask questions concurrently. GPU utilization is observed to be very high.
Expected Behavior
No response
Steps To Reproduce
Environment
- OS: Ubuntu 20.04
- Python: 3.8
- Transformers: 4.30.2
- PyTorch: 1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
Anything else?
No response
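One possible way to quantify the slowdown described under Current Behavior is to time requests at different concurrency levels. The sketch below is only an illustration and does not use the real model or webdemo2.py: `fake_generate`, `timed_request`, and `measure` are hypothetical names, `time.sleep` stands in for inference, and the lock encodes the assumption that a single GPU serves one request at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the model: one shared resource that serves
# one request at a time (an assumption about a single-GPU setup).
_gpu_lock = threading.Lock()


def fake_generate(prompt: str, work_seconds: float = 0.05) -> str:
    """Simulate inference; time.sleep stands in for the GPU work."""
    with _gpu_lock:
        time.sleep(work_seconds)
    return f"answer to: {prompt}"


def timed_request(prompt: str) -> float:
    """Return the wall-clock latency of one simulated request."""
    start = time.perf_counter()
    fake_generate(prompt)
    return time.perf_counter() - start


def measure(num_users: int) -> float:
    """Fire num_users simulated requests at once; return the worst latency."""
    prompts = [f"q{i}" for i in range(num_users)]
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        latencies = list(pool.map(timed_request, prompts))
    return max(latencies)


if __name__ == "__main__":
    for users in (1, 2, 4, 8):
        print(f"{users} concurrent users: worst latency {measure(users):.2f}s")
```

Under these assumptions the worst-case latency grows roughly linearly with the number of simultaneous users, because requests queue behind one another. Replacing `fake_generate` with a call into the actual model would show whether the real application behaves the same way.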