THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

[Help] With an API deployment, how can model execution be interrupted to stop generation? #622

Open mymynew opened 10 months ago

mymynew commented 10 months ago

Is there an existing issue for this?

Current Behavior

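With the API deployment, a single generation sometimes takes a long time. In that case I want to interrupt the model's generation. Which interface should I call to stop execution?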

Expected Behavior

No response

Steps To Reproduce

...

Environment

- OS: Ubuntu
- Python: 3.10
- Transformers:
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): true

Anything else?

No response