THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[Help] P-tuning: model fails to converge after increasing target_length #614

Open qdchenxiaoyan opened 10 months ago

qdchenxiaoyan commented 10 months ago

Is there an existing issue for this?

Current Behavior

Last time I ran P-tuning for summarization with source_length=5000 and target_length=120; training proceeded normally and inference results were also normal. This time, with source_length=50 and target_length=2000, the training loss oscillates and does not converge, and the inference output is disordered and highly repetitive.
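
For context, here is a minimal sketch of how `max_source_length` / `max_target_length` typically shape a training example in this style of P-tuning preprocessing. It is illustrative only, not the repo's actual code: `build_example` and its signature are hypothetical. The point is that prompt and padding positions are masked with -100, so only target tokens carry loss; raising target_length from 120 to 2000 multiplies the loss-bearing tokens per example by roughly 17x, which often calls for re-tuning the learning rate and batch size.

```python
# Hypothetical sketch of P-tuning-style label construction (not the repo's
# actual preprocessing). Prompt tokens and padding are masked with -100 so
# they are ignored by the loss; only target tokens contribute gradient.

def build_example(prompt_ids, target_ids,
                  max_source_length, max_target_length, pad_id=0):
    prompt_ids = prompt_ids[:max_source_length]
    target_ids = target_ids[:max_target_length]
    input_ids = prompt_ids + target_ids
    labels = [-100] * len(prompt_ids) + target_ids  # loss only on targets
    pad_len = (max_source_length + max_target_length) - len(input_ids)
    input_ids += [pad_id] * pad_len
    labels += [-100] * pad_len  # padding is ignored by the loss as well
    return input_ids, labels

# Old run: long context, short summary target.
old = build_example(list(range(1, 5001)), list(range(1, 121)), 5000, 120)
# New run: short context, very long target -- far more loss-bearing tokens.
new = build_example(list(range(1, 51)), list(range(1, 2001)), 50, 2000)
print(sum(l != -100 for l in old[1]),  # 120 loss-bearing tokens
      sum(l != -100 for l in new[1]))  # 2000 loss-bearing tokens
```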


Expected Behavior

No response

Steps To Reproduce

1

Environment

- OS: linux 3.10.0-1160.76.1.el7.x86_64
- Python: 3.10.12
- Transformers: 4.30.2
- Torch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No response

MIhappen commented 9 months ago

Bro, did you ever solve this?