@MayDomine There may be a mistake in the lines "loss = ce_loss + kl_loss; loss = lambda_coeff * loss". You can try the following instead:
loss = ce_loss + lambda_coeff * kl_loss
At the same time, if kl_loss is still abnormal, you can just sum up the last dimension:
p_loss = p_loss.sum(-1).mean()
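Putting the two suggestions together, a minimal PyTorch sketch of the loss computation might look like the code below. The names compute_kl_loss, p_logits, q_logits, and pad_mask are illustrative placeholders, not necessarily the exact code in this repo:

import torch.nn.functional as F

def compute_kl_loss(p_logits, q_logits, pad_mask=None):
    # Bidirectional KL between the two forward passes (R-Drop style).
    p_loss = F.kl_div(F.log_softmax(p_logits, dim=-1),
                      F.softmax(q_logits, dim=-1), reduction='none')
    q_loss = F.kl_div(F.log_softmax(q_logits, dim=-1),
                      F.softmax(p_logits, dim=-1), reduction='none')
    if pad_mask is not None:
        # pad_mask should be broadcastable to the logits shape
        p_loss = p_loss.masked_fill(pad_mask, 0.0)
        q_loss = q_loss.masked_fill(pad_mask, 0.0)
    # Sum over the vocabulary dimension, then average over tokens,
    # as suggested above, to keep the KL term on a sensible scale.
    p_loss = p_loss.sum(-1).mean()
    q_loss = q_loss.sum(-1).mean()
    return (p_loss + q_loss) / 2

# The coefficient should scale only the KL term:
# loss = ce_loss + lambda_coeff * compute_kl_loss(logits1, logits2)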
@dropreg I tried what you recommended. Now my loss decreases normally, but the BLEU score is going down. Any suggestions? Maybe I should add R-Drop during pretraining? Right now I only add it during finetuning.
I also noticed someone said their KL loss decreased very fast, while mine seems to decrease quite slowly.
@MayDomine It's worth noting that both the dropout rate and lambda_coeff are very important hyperparameters, and R-Drop-based methods may take longer to train. Make sure the number of training epochs is larger than the baseline!
Closing since there are no further comments. Reopen if needed.
@MayDomine Hi, did you see an improvement on your task? Do you remember the ratio between the values of the CE and KL losses? Did you try adding R-Drop during pretraining?
You need to train for more epochs; try setting the ratio to 0.25.
@MayDomine Hi, what does the ratio in your reply mean? The KL loss is very small compared to the CE loss. In addition, the loss drops so erratically that I wonder if there is a problem.
I ran into this problem too, and my advice is to set the ratio so that the KL loss is much smaller than the model's cross-entropy loss.
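As a rough way to act on that advice, you could log the relative size of the two terms during training and lower lambda_coeff if the scaled KL term gets too close to the cross-entropy term; this is only a sketch, and 0.25 is just the rough ceiling mentioned above, not a fixed rule:

# Sketch only: monitor the scaled KL relative to the CE loss each step
ratio = (lambda_coeff * kl_loss).item() / max(ce_loss.item(), 1e-8)
if ratio > 0.25:
    print(f"warning: scaled KL is {ratio:.2f}x the CE loss; consider lowering lambda_coeff")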
In my NMT task, I let the encoder and decoder forward twice, but the KL loss is too large. When I instead compute the mean, it is too small to have any effect.
Can someone help me?
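For reference, here is a minimal sketch of the forward-twice setup described above, assuming a standard PyTorch encoder-decoder NMT model with dropout enabled. model, criterion, src, tgt, and labels are placeholders for whatever your own training loop provides, and compute_kl_loss is the helper sketched earlier in this thread:

def rdrop_training_step(model, criterion, src, tgt, labels, lambda_coeff=1.0):
    # Two stochastic forward passes: dropout must be active so they differ.
    model.train()
    logits1 = model(src, tgt)
    logits2 = model(src, tgt)
    # Average the cross-entropy of both passes.
    ce_loss = 0.5 * (criterion(logits1.reshape(-1, logits1.size(-1)), labels.reshape(-1))
                     + criterion(logits2.reshape(-1, logits2.size(-1)), labels.reshape(-1)))
    # Bidirectional KL, reduced with .sum(-1).mean() as in the sketch above.
    kl_loss = compute_kl_loss(logits1, logits2)
    return ce_loss + lambda_coeff * kl_loss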