@MayDomine There may be a mistake in the lines "loss = ce_loss + kl_loss; loss = lambda_coeff * loss". You can try the following instead:
loss = ce_loss + lambda_coeff * kl_loss
At the same time, if kl_loss is still abnormal, you can just sum up the last dimension:
p_loss = p_loss.sum(-1).mean()
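Putting the two suggestions together, a minimal PyTorch sketch of the loss computation might look like the code below. The names compute_kl_loss, p_logits, q_logits, and pad_mask are illustrative placeholders, not necessarily the exact code in this repo:

import torch.nn.functional as F

def compute_kl_loss(p_logits, q_logits, pad_mask=None):
    # Bidirectional KL between the two forward passes (R-Drop style).
    p_loss = F.kl_div(F.log_softmax(p_logits, dim=-1),
                      F.softmax(q_logits, dim=-1), reduction='none')
    q_loss = F.kl_div(F.log_softmax(q_logits, dim=-1),
                      F.softmax(p_logits, dim=-1), reduction='none')
    if pad_mask is not None:
        # pad_mask should be broadcastable to the logits shape
        p_loss = p_loss.masked_fill(pad_mask, 0.0)
        q_loss = q_loss.masked_fill(pad_mask, 0.0)
    # Sum over the vocabulary dimension, then average over tokens,
    # as suggested above, to keep the KL term on a sensible scale.
    p_loss = p_loss.sum(-1).mean()
    q_loss = q_loss.sum(-1).mean()
    return (p_loss + q_loss) / 2

# The coefficient should scale only the KL term:
# loss = ce_loss + lambda_coeff * compute_kl_loss(logits1, logits2)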
@dropreg I tried what you recommended. Now my loss decreases normally, but the BLEU score is going down. Any suggestions? Maybe I should add R-Drop during pretraining? Right now I only add it during finetuning.
I also noticed someone said their KL loss decreased very fast, while mine seems to decrease quite slowly.
@MayDomine It's worth noting that both the dropout rate and lambda_coeff are very important hyperparameters, and R-Drop-based methods may take longer to train. Make sure the number of training epochs is larger than the baseline!
Closing since there are no further comments. Reopen if needed.
@MayDomine Hi, did you see an improvement on your task? Do you remember the ratio between the values of the CE and KL losses? Did you try adding R-Drop during pretraining?
You need to train for more epochs; try setting the ratio to 0.25.
@MayDomine Hi, what does the ratio in your reply mean? The KL loss is very small compared to the CE loss. In addition, the loss drops so erratically that I wonder if there is a problem.
I ran into this problem too, and my advice is to set the ratio so that the KL loss is much smaller than the model's cross-entropy loss.
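As a rough way to act on that advice, you could log the relative size of the two terms during training and lower lambda_coeff if the scaled KL term gets too close to the cross-entropy term; this is only a sketch, and 0.25 is just the rough ceiling mentioned above, not a fixed rule:

# Sketch only: monitor the scaled KL relative to the CE loss each step
ratio = (lambda_coeff * kl_loss).item() / max(ce_loss.item(), 1e-8)
if ratio > 0.25:
    print(f"warning: scaled KL is {ratio:.2f}x the CE loss; consider lowering lambda_coeff")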
In my NMT task, I let the encoder and decoder forward twice, but the KL loss is too large. When I instead compute the mean, it is too small to have any effect.
Can someone help me?
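For reference, here is a minimal sketch of the forward-twice setup described above, assuming a standard PyTorch encoder-decoder NMT model with dropout enabled. model, criterion, src, tgt, and labels are placeholders for whatever your own training loop provides, and compute_kl_loss is the helper sketched earlier in this thread:

def rdrop_training_step(model, criterion, src, tgt, labels, lambda_coeff=1.0):
    # Two stochastic forward passes: dropout must be active so they differ.
    model.train()
    logits1 = model(src, tgt)
    logits2 = model(src, tgt)
    # Average the cross-entropy of both passes.
    ce_loss = 0.5 * (criterion(logits1.reshape(-1, logits1.size(-1)), labels.reshape(-1))
                     + criterion(logits2.reshape(-1, logits2.size(-1)), labels.reshape(-1)))
    # Bidirectional KL, reduced with .sum(-1).mean() as in the sketch above.
    kl_loss = compute_kl_loss(logits1, logits2)
    return ce_loss + lambda_coeff * kl_loss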