dropreg / R-Drop


Can mseloss replace KL divergence? #20

Closed: 18335100284 closed this issue 2 years ago.

18335100284 commented 2 years ago

Great job. R-Drop forces the output distributions of the different sub-models generated by dropout to be consistent with each other. So can MSE loss replace KL divergence? Looking forward to your reply.
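For context, the consistency term the question refers to can be sketched as below. This is a minimal, framework-free sketch, assuming the symmetric (bidirectional) KL form described in the R-Drop paper, a cross-entropy task loss on both dropout passes, and a weight `alpha`. The names `rdrop_kl_loss` and `rdrop_total_loss` are hypothetical, not the repository's API. In real training, `logits1` and `logits2` would come from two forward passes of the same input with dropout active.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: List[float], q: List[float]) -> float:
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rdrop_kl_loss(logits1: List[float], logits2: List[float]) -> float:
    # Hypothetical helper: average of KL in both directions between the
    # predicted distributions of the two dropout passes.
    p1, p2 = softmax(logits1), softmax(logits2)
    return 0.5 * (kl(p1, p2) + kl(p2, p1))


def rdrop_total_loss(logits1: List[float], logits2: List[float],
                     label: int, alpha: float) -> float:
    # Hypothetical helper: cross-entropy of both passes plus the weighted
    # consistency term (alpha is an assumed hyperparameter).
    p1, p2 = softmax(logits1), softmax(logits2)
    nll = -math.log(p1[label]) - math.log(p2[label])
    return nll + alpha * rdrop_kl_loss(logits1, logits2)
```

Because KL divergence compares probability distributions, this form fits classification outputs; the two directions are averaged so neither pass is treated as the fixed target.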

apeterswu commented 2 years ago

Hi,

Yes. As we present in Appendix A.4, the STS-B task in GLUE is a regression task, so MSE regularization is required there. See Appendix A.4 for this simple extension.
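A regression model such as one for STS-B outputs a single score rather than a probability distribution, so there is no distribution for KL divergence to compare and a squared difference is the natural consistency measure. Below is a minimal sketch of that variant, assuming the KL term is simply replaced by the mean squared error between the two dropout passes' predictions and that each pass keeps its own MSE task loss against the target. The exact form and weighting in Appendix A.4 may differ; `rdrop_mse_loss` and `alpha` are hypothetical names.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Mean squared error over a batch of scalar predictions.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rdrop_mse_loss(pred1: Sequence[float], pred2: Sequence[float],
                   targets: Sequence[float], alpha: float) -> float:
    # Task loss (assumed form): MSE of each dropout pass against the targets.
    task = mse(pred1, targets) + mse(pred2, targets)
    # Consistency term (assumed form): MSE between the two passes,
    # standing in for the symmetric KL used in classification.
    consistency = mse(pred1, pred2)
    return task + alpha * consistency
```

For example, `rdrop_mse_loss([1.0], [3.0], [2.0], 1.0)` gives a task loss of 2.0 plus a consistency term of 4.0, for 6.0 in total.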