dropreg / R-Drop

JS divergence in the research paper? #17

Closed: sieu-n closed this issue 2 years ago

sieu-n commented 3 years ago

In your paper uploaded in arxiv, you mentioned that the "R-Drop method tries to regularize on the model predictions by minimizing the bidirectional Kullback-Leibler (KL) divergence between these two output distributions for the same sample, which is:"

$$\mathcal{L}_{KL} = \frac{1}{2}\Big(\mathcal{D}_{KL}\big(\mathcal{P}_1(y \mid x)\,\|\,\mathcal{P}_2(y \mid x)\big) + \mathcal{D}_{KL}\big(\mathcal{P}_2(y \mid x)\,\|\,\mathcal{P}_1(y \mid x)\big)\Big)$$

Is this bidirectional KL divergence different from a standard JS divergence?
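
For concreteness, here is a minimal PyTorch sketch (my own, not from this repo; the function names are illustrative) computing both quantities on random logits:

```python
import torch
import torch.nn.functional as F

def bidirectional_kl(logits1, logits2):
    """Symmetric (bidirectional) KL: 0.5 * (KL(P||Q) + KL(Q||P))."""
    log_p = F.log_softmax(logits1, dim=-1)
    log_q = F.log_softmax(logits2, dim=-1)
    kl_pq = (log_p.exp() * (log_p - log_q)).sum(dim=-1).mean()
    kl_qp = (log_q.exp() * (log_q - log_p)).sum(dim=-1).mean()
    return 0.5 * (kl_pq + kl_qp)

def js_divergence(logits1, logits2):
    """JS: 0.5 * (KL(P||M) + KL(Q||M)), where M = (P + Q) / 2."""
    p = F.softmax(logits1, dim=-1)
    q = F.softmax(logits2, dim=-1)
    m = 0.5 * (p + q)
    kl_pm = (p * (p / m).log()).sum(dim=-1).mean()
    kl_qm = (q * (q / m).log()).sum(dim=-1).mean()
    return 0.5 * (kl_pm + kl_qm)

torch.manual_seed(0)
a, b = torch.randn(4, 5), torch.randn(4, 5)
print(bidirectional_kl(a, b).item(), js_divergence(a, b).item())  # the two values differ
```

They are not the same quantity: JS compares each distribution against their mixture M, while the bidirectional KL compares the two distributions against each other directly (both are zero only when P = Q).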

sieu-n commented 3 years ago

Additionally, your work seems related to the consistency-regularization methods of semi-supervised learning (SSL), and in particular to the Π-model, which uses dropout as the noise source.

For reference, consistency regularization is based on the intuition that if a small perturbation is applied to an unlabeled data point, the prediction should not change significantly. Such models are typically trained to minimize the difference between the inference result on a sample and the result on an augmented version of that sample (a minimal sketch follows the reference below).

Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
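
For comparison, a minimal Π-model-style sketch (the toy network and loss are illustrative, not from either paper): two stochastic forward passes through the same dropout network give two predictions for one input, and the consistency term penalizes their disagreement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier with dropout; in train mode, two forward passes on the
# same input yield two different stochastic predictions.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.3), nn.Linear(32, 5))
model.train()

x = torch.randn(8, 10)            # a batch of (possibly unlabeled) samples
p1 = F.softmax(model(x), dim=-1)  # first stochastic pass
p2 = F.softmax(model(x), dim=-1)  # second stochastic pass

# Pi-model-style consistency term: mean squared error between the two
# predictions. R-Drop instead penalizes the bidirectional KL between them.
consistency_loss = F.mse_loss(p1, p2)
print(consistency_loss.item())
```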