THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

Has anyone tested how many concurrent requests a single 80 GB A100 can handle? #551

Open Junglesl opened 1 year ago

Junglesl commented 1 year ago

Is there an existing issue for this?

Current Behavior

A single inference takes about 13 GB of VRAM, so would 80 GB of VRAM be able to serve roughly 8 requests at the same time?
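A rough sketch of how this could be estimated (assumptions only: `model.chat` usage follows the repo README, the 80 GB total and the headroom figure are placeholders, and real capacity depends on prompt/output lengths and on whether requests are batched or run in separate processes). The point is that the FP16 weights are loaded once and shared, so each extra concurrent request mostly costs additional KV cache and activations, not another full 13 GB:

```python
# Measure how much VRAM the weights vs. one request actually take,
# then derive a ballpark concurrency number. All fixed numbers below
# are assumptions -- measure on your own hardware and workload.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda().eval()

weights_gb = torch.cuda.memory_allocated() / 1e9       # memory held by the FP16 weights
torch.cuda.reset_peak_memory_stats()

response, _ = model.chat(tokenizer, "你好", history=[])  # one request, as in the README
peak_gb = torch.cuda.max_memory_allocated() / 1e9       # weights + one request's KV cache/activations
per_request_gb = peak_gb - weights_gb

total_gb, headroom_gb = 80.0, 2.0                       # A100 80GB, minus CUDA context/fragmentation headroom
estimate = int((total_gb - weights_gb - headroom_gb) / max(per_request_gb, 1e-3))
print(f"weights ~ {weights_gb:.1f} GB, per request ~ {per_request_gb:.1f} GB, "
      f"rough concurrent capacity ~ {estimate}")
```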

Expected Behavior

No response

Steps To Reproduce

As described in the title.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response