THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[BUG/Help] Model inference memory keeps growing #536

Open · pipijiev12 opened this issue 1 year ago

pipijiev12 commented 1 year ago

Is there an existing issue for this?

- [X] I have searched the existing issues

Current Behavior

I have wrapped the inference code in a gradient-free context (`torch.no_grad()`), yet the GPU memory used by the ChatGLM2 model keeps growing during long-running inference. Why does this happen, and how can it be fixed?
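For context, a minimal sketch of the kind of long-running loop involved. The loading code follows this repo's README; the `queries` workload and the `MAX_HISTORY` cap are illustrative assumptions, not part of the original report. It combines `torch.no_grad()` with the two usual mitigations: bounding the chat history fed back into `model.chat()`, and releasing cached allocator blocks between requests.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
model = model.eval()

MAX_HISTORY = 5  # illustrative cap; unbounded chat history is a common cause of memory growth
queries = ["你好", "请继续", "总结一下"] * 100  # stand-in for a long-running request stream

history = []
with torch.no_grad():  # torch.inference_mode() is a stricter alternative on PyTorch >= 1.13
    for query in queries:
        response, history = model.chat(tokenizer, query, history=history)
        history = history[-MAX_HISTORY:]      # bound the prompt/KV size fed back in
        torch.cuda.empty_cache()              # return cached allocator blocks to the driver
        print(torch.cuda.memory_allocated())  # bytes held by live tensors, for diagnosis
```

If `torch.cuda.memory_allocated()` stays flat while `nvidia-smi` shows growth, the increase is usually the caching allocator reserving blocks rather than a leak in live tensors.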

Expected Behavior

GPU memory usage should stay flat during long-running inference rather than growing over time.

Steps To Reproduce

1. Run in the environment listed below.
2. Load the model with its default parameters.
3. Run the inference code in a long-lived loop (see the repro sketch after this list).
4. Observe GPU memory growth in the logs (screenshot: WechatIMG116).
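A minimal repro sketch along these lines (the prompt, iteration count, and logging interval are arbitrary choices, and feeding the full history back in is one plausible way to reproduce the reported growth):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda().eval()

history = []
with torch.no_grad():
    for step in range(200):
        # Feeding the full history back in makes every prompt longer than the
        # last, so allocated memory climbs even with gradients disabled.
        response, history = model.chat(tokenizer, "请继续", history=history)
        if step % 10 == 0:
            print(f"step {step}: "
                  f"allocated={torch.cuda.memory_allocated() >> 20} MiB, "
                  f"reserved={torch.cuda.memory_reserved() >> 20} MiB")
```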

Environment

- OS: Ubuntu 20.04
- Python: 3.8
- Transformers: 4.26.1
- PyTorch: 1.13 or above
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True

Anything else?

No response