THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[Help] OOM during long-text inference #594

Open Wohoholo opened 10 months ago

Wohoholo commented 10 months ago

Is there an existing issue for this?

Current Behavior

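Running half-precision inference on a single A100 80G GPU, I get OOM at text length = 25000. Has anyone else run into this? I'm looking for ways to optimize long-text processing.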
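A rough back-of-the-envelope estimate may help explain why a length of 25000 can exhaust 80 GB. The sketch below assumes that "text length" counts tokens, that the batch size is 1, that the attention implementation materializes the full score matrix (as a non-fused implementation does), and that the model has 32 attention heads; none of these are confirmed in the report, and the function name is hypothetical.

```python
def attention_scores_bytes(seq_len: int, num_heads: int, bytes_per_element: int = 2) -> int:
    """Size of one layer's full attention score matrix for a single sequence.

    A non-fused attention implementation stores a (num_heads, seq_len, seq_len)
    tensor; bytes_per_element=2 corresponds to half precision.
    """
    return num_heads * seq_len * seq_len * bytes_per_element


# Assumed values: 25000 tokens, 32 heads, FP16.
estimate = attention_scores_bytes(seq_len=25_000, num_heads=32)
print(f"{estimate / 1e9:.1f} GB for one layer's attention scores")
```

Under these assumptions the score matrix alone is about 40 GB, and the softmax output of the same shape would need a comparable amount again, before counting the model weights. A fused or memory-efficient attention kernel avoids materializing this matrix, so whether the estimate applies depends on the attention path the installed PyTorch version actually takes.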

Expected Behavior

No response

Steps To Reproduce

model = Model.from_pretrained()
model.generate(input, **kwargs)

Environment

- OS: Red Hat
- Python: 3.8
- Transformers: 4.28.0
- PyTorch: 1.13.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True

Anything else?

No response

HongyuJiang commented 8 months ago

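Try splitting the input into multiple chunks, summarizing each chunk, and then concatenating all the summaries and feeding the result to the model. That might work as a roundabout solution.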
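The chunk-and-summarize suggestion above could look roughly like the following. This is a minimal sketch, not code from the thread: `split_into_chunks` and `map_reduce` are hypothetical helpers, chunking is done by character count rather than by tokens, and `summarize` and `answer` are placeholder callables that would wrap the actual model call.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce(
    text: str,
    summarize: Callable[[str], str],
    answer: Callable[[str], str],
    chunk_size: int,
) -> str:
    """Summarize each chunk (map), join the summaries, then run the final step (reduce)."""
    chunks = split_into_chunks(text, chunk_size)
    summaries = [summarize(chunk) for chunk in chunks]
    return answer("\n".join(summaries))


# Stand-in callables so the sketch runs without a model.
demo = map_reduce(
    "abcdefghij",
    summarize=lambda chunk: chunk[:1].upper(),
    answer=lambda joined: joined,
    chunk_size=4,
)
print(demo)
```

With ChatGLM2-6B, `summarize` might wrap a call such as `model.chat(tokenizer, prompt, history=[])`. Each call then sees only one chunk, so peak memory depends on the chunk size rather than on the full document length, at the cost of losing detail that the summaries drop.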

Wohoholo commented 8 months ago

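Thanks, I have already tried this MapReduce-like approach. Another workaround is to process one long text at a time, run torch GC afterwards, and wait for the fragmented GPU memory to be released before processing the next text.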
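The "one text at a time, then torch GC" workaround could be sketched as below. The helper names are hypothetical and `run_inference` is a placeholder for the real model call; the import guard exists only so the sketch also runs on a machine without PyTorch.

```python
import gc
from typing import Callable, Iterable, List

try:
    import torch
except ImportError:  # allows the sketch to run without PyTorch installed
    torch = None


def release_gpu_memory() -> None:
    """Drop unreachable Python objects, then return cached CUDA blocks to the driver."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()


def process_sequentially(
    texts: Iterable[str],
    run_inference: Callable[[str], str],
) -> List[str]:
    """Run inference on one text at a time, releasing cached memory between items."""
    results = []
    for text in texts:
        results.append(run_inference(text))
        release_gpu_memory()
    return results


# Stand-in for a model call so the sketch runs as-is.
outputs = process_sequentially(["first text", "second text"], lambda t: t.upper())
print(outputs)
```

Note that `torch.cuda.empty_cache()` only releases memory that PyTorch has cached but no longer uses. It reduces fragmentation between requests, but it does not lower the peak memory needed by a single long input.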

HongyuJiang commented 7 months ago

If the GPU memory can handle inference on a text of length 25000, that is indeed a good approach. Learned something new~