THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

[Feature] ChatGLM dropout #1168

Open Youggls opened 1 year ago

Youggls commented 1 year ago

Is your feature request related to a problem? Please describe.

Hello, I am working on full-parameter fine-tuning of ChatGLM, but I noticed that the model code contains no dropout (except in the p-tuning case where pre_seq_len is not null). Out of curiosity I looked at the code of the GLM pre-trained model released by THUDM and found that it does include a dropout mechanism. In my experiments, full-parameter fine-tuning overfits quite badly, while p-tuning cannot fit to a sufficiently good result. So I would like to ask: what was the reasoning behind leaving dropout out of the ChatGLM model code, and would adding dropout significantly alleviate the overfitting?

Solutions

Add a dropout mechanism (see the sketch after this issue body).

Additional context

No response
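For reference, here is a minimal sketch of where dropout is conventionally inserted in a pre-LayerNorm transformer block: on the residual branch after the attention output and after the MLP output. All names here (`GLMBlockWithDropout`, `dropout_p`, the sub-module layout) are hypothetical illustrations, not the actual layers in `modeling_chatglm.py`; patching the real model would mean adding the equivalent `nn.Dropout` calls to the released modeling code.

```python
import torch
import torch.nn as nn


class GLMBlockWithDropout(nn.Module):
    """Hypothetical transformer block showing conventional dropout placement.
    Names are placeholders and do NOT match modeling_chatglm.py."""

    def __init__(self, hidden_size: int, num_heads: int, dropout_p: float = 0.1):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.self_attention = nn.MultiheadAttention(
            hidden_size, num_heads, dropout=dropout_p, batch_first=True
        )
        self.attention_dropout = nn.Dropout(dropout_p)  # residual dropout after attention
        self.post_attention_layernorm = nn.LayerNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        self.output_dropout = nn.Dropout(dropout_p)  # residual dropout after MLP

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention sub-layer with residual connection
        residual = hidden_states
        x = self.input_layernorm(hidden_states)
        attn_out, _ = self.self_attention(x, x, x, need_weights=False)
        hidden_states = residual + self.attention_dropout(attn_out)

        # MLP sub-layer with residual connection
        residual = hidden_states
        x = self.post_attention_layernorm(hidden_states)
        hidden_states = residual + self.output_dropout(self.mlp(x))
        return hidden_states


if __name__ == "__main__":
    block = GLMBlockWithDropout(hidden_size=64, num_heads=4, dropout_p=0.1)
    block.train()                  # dropout is only active in training mode
    x = torch.randn(2, 8, 64)
    print(block(x).shape)          # torch.Size([2, 8, 64])
```

In `train()` mode the dropout layers randomly zero activations, which is the regularization the full-parameter fine-tuning run is currently missing; in `eval()` mode they are no-ops, so inference behavior is unchanged.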


aftcool commented 1 year ago

I noticed this as well: when using the R-Drop strategy, the predicted logits from the two forward passes are identical.
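As background on why R-Drop degenerates here: R-Drop feeds each batch through the model twice and adds a symmetric KL term between the two predictive distributions, which is only nonzero when dropout (or some other stochasticity) makes the two passes differ. Below is a minimal sketch with a hypothetical toy model, assuming the model returns per-token logits of shape (batch, seq_len, vocab); this is not the ChatGLM-6B training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def r_drop_loss(model: nn.Module, input_ids: torch.Tensor,
                labels: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Hypothetical R-Drop objective: cross-entropy on two stochastic
    forward passes plus a symmetric KL consistency term between them."""
    logits1 = model(input_ids)   # first forward pass
    logits2 = model(input_ids)   # second forward pass

    ce = 0.5 * (F.cross_entropy(logits1.flatten(0, 1), labels.flatten())
                + F.cross_entropy(logits2.flatten(0, 1), labels.flatten()))

    # Symmetric KL between the two predictive distributions.
    # If the model has no active dropout, logits1 == logits2 and kl == 0,
    # so the R-Drop term contributes nothing.
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl


if __name__ == "__main__":
    vocab, hidden = 100, 32
    toy = nn.Sequential(nn.Embedding(vocab, hidden),
                        nn.Dropout(0.1),
                        nn.Linear(hidden, vocab))
    toy.train()
    ids = torch.randint(0, vocab, (2, 8))
    print(r_drop_loss(toy, ids, ids).item())
```

With the toy model in `train()` mode the KL term is nonzero because dropout makes the two passes disagree; remove the `nn.Dropout` layer (or call `eval()`) and it collapses to exactly zero, which matches the identical logits observed above.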