THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[BUG/Help] Model inference is very slow when streamlit in webdemo2.py handles multiple concurrent users #527

Open jamesruio opened 1 year ago

jamesruio commented 1 year ago

Is there an existing issue for this?

Current Behavior

In webdemo2.py, streamlit answers a single user's questions quickly, but when multiple users ask questions concurrently, model inference becomes very slow. GPU utilization is observed to be high during this.

Expected Behavior

No response

Steps To Reproduce

  1. streamlit run webdemo2.py
  2. Open multiple browser tabs
  3. Enter questions in all tabs at the same time; inference speed drops noticeably
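A likely cause: each Streamlit session runs the model call in its own script thread, so concurrent sessions interleave generation on a single GPU instead of queuing, which slows every request down. A minimal sketch of one common mitigation, serializing inference behind a lock so requests run one at a time (`fake_generate` is a hypothetical stand-in for the real ChatGLM2-6B call, not the repo's API):

```python
import threading
import time

# One GPU runs one forward pass at full speed; a process-wide lock makes
# concurrent sessions queue instead of interleaving CUDA work.
inference_lock = threading.Lock()

def fake_generate(prompt: str) -> str:
    # Hypothetical placeholder for model.chat / model.stream_chat;
    # sleeps briefly to mimic inference latency.
    time.sleep(0.01)
    return f"answer to: {prompt}"

def generate_serialized(prompt: str) -> str:
    # Each session thread must hold the lock while generating, so
    # requests are served back-to-back rather than concurrently.
    with inference_lock:
        return fake_generate(prompt)

# Simulate several sessions submitting questions at once.
results = {}
threads = []
for i in range(4):
    t = threading.Thread(
        target=lambda i=i: results.update({i: generate_serialized(f"question {i}")})
    )
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```

This trades per-user latency under load for predictable throughput; real concurrency gains would instead require batched inference or a serving framework that batches requests.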

Environment

- OS: Ubuntu 20.04
- Python: 3.8
- Transformers: 4.30.2
- PyTorch: 1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No response

jamesruio commented 1 year ago

@duzx16