Additionally, your work seems related to the consistency regularization methods of SSL, and in particular to the Pi model, which uses dropout as the noise source.
For reference, consistency regularization is based on the intuition that if a small perturbation is applied to an unlabeled data point, the prediction should not change significantly. Models are typically trained to minimize the difference between the inference result on a sample and the result on a perturbed (augmented) version of the same sample.
Samuli Laine and Timo Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
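To make the comparison concrete, here is a minimal sketch of what I mean by Pi-model-style consistency regularization with dropout as the only noise source. The model, dropout rate, and data below are hypothetical placeholders of my own, not taken from your paper or repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical classifier; dropout is the only stochastic component.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(64, 10)
)
model.train()  # keep dropout active so the two passes use different masks

x_unlabeled = torch.randn(16, 32)   # unlabeled batch (random placeholder data)
logits_1 = model(x_unlabeled)       # first stochastic forward pass
logits_2 = model(x_unlabeled)       # second pass with a different dropout mask

# Pi-model-style consistency term: penalize disagreement between the two predictions.
consistency_loss = F.mse_loss(
    F.softmax(logits_1, dim=-1), F.softmax(logits_2, dim=-1)
)
consistency_loss.backward()
```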
In your paper uploaded to arXiv, you mention that the "R-Drop method tries to regularize on the model predictions by minimizing the bidirectional Kullback-Leibler (KL) divergence between these two output distributions for the same sample, which is:"
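For concreteness, I read the bidirectional KL term described there as the symmetric sum of the two KL directions, where $\mathcal{P}_1$ and $\mathcal{P}_2$ denote the output distributions from the two dropout-perturbed forward passes (please correct me if the exact form in the paper differs):

```latex
\mathcal{L}_{KL}^{i} = \frac{1}{2}\Big(
  \mathcal{D}_{KL}\big(\mathcal{P}_1(y_i \mid x_i)\,\big\|\,\mathcal{P}_2(y_i \mid x_i)\big)
+ \mathcal{D}_{KL}\big(\mathcal{P}_2(y_i \mid x_i)\,\big\|\,\mathcal{P}_1(y_i \mid x_i)\big)\Big)
```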
Is this bidirectional KL divergence different from the standard Jensen-Shannon (JS) divergence?
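For reference, my understanding of the standard JS divergence (with the usual uniform mixture) is:

```latex
\mathrm{JS}(P_1 \,\|\, P_2)
= \frac{1}{2}\,\mathcal{D}_{KL}\big(P_1 \,\big\|\, M\big)
+ \frac{1}{2}\,\mathcal{D}_{KL}\big(P_2 \,\big\|\, M\big),
\qquad M = \tfrac{1}{2}\,(P_1 + P_2)
```

so it seems the distinction is whether each distribution is compared against the mixture $M$ or directly against the other.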