UCSC-REAL / negative-label-smoothing

[ICML2022 Long Talk] Official Pytorch implementation of "To Smooth or Not? When Label Smoothing Meets Noisy Labels"

Question about the negative label #1

Open Z-ZHHH opened 2 years ago

Z-ZHHH commented 2 years ago

Great work! Can the loss value become negative during training if we use negative labels? When the features collapse to the class prototype, the prediction becomes strictly one-hot, and it seems that the loss value goes to -infinity.

weijiaheng commented 2 years ago

The loss can indeed go negative when learning with negative labels under the cross-entropy loss. This is simply because the CE loss multiplies the per-class term -log(p_i) by the corresponding soft label. For the irrelevant classes (those not equal to the training label), the soft label is negative, so these products can drive the total loss negative (see here).
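A minimal numeric sketch of this effect, assuming the generalized label-smoothing (GLS) form of the soft label from the paper, where smooth_rate < 0 gives NLS; the helper `gls_cross_entropy` below is illustrative and not necessarily the repo's exact loss implementation:

```python
import torch
import torch.nn.functional as F

def gls_cross_entropy(logits, targets, smooth_rate):
    """Cross-entropy with generalized label smoothing (illustrative sketch).

    smooth_rate > 0 is ordinary (positive) label smoothing;
    smooth_rate < 0 is negative label smoothing (NLS).
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes).float()
    # Soft label: (1 - r) on the true class plus r / K spread uniformly,
    # so the off-target entries become negative when r < 0.
    soft_labels = (1.0 - smooth_rate) * one_hot + smooth_rate / num_classes
    return -(soft_labels * log_probs).sum(dim=-1).mean()

# A nearly one-hot prediction: the off-target -log(p_i) terms are large,
# and multiplying them by negative soft labels pushes the loss below zero.
logits = torch.tensor([[12.0, 0.0, 0.0]])
target = torch.tensor([0])
print(gls_cross_entropy(logits, target, smooth_rate=0.1))   # positive (LS)
print(gls_cross_entropy(logits, target, smooth_rate=-0.2))  # negative (NLS)
```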

In our paper, we discuss how to address this issue in practice (Appendix D.2). Briefly speaking, negative labels rely on a relatively well-pre-trained model, since the mechanism is to enhance the model's confidence in its own predictions. If you train with negative labels from the very beginning, the model may become overly confident in a bad representation (the learned representation is likely to be poor early in training).
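A minimal training-schedule sketch of this warm-up idea, reusing the hypothetical `gls_cross_entropy` helper above; `model`, `train_loader`, the warm-up length, and the smooth rates are placeholders, and the actual settings should follow Appendix D.2 of the paper:

```python
import torch

def train(model, train_loader, warmup_epochs=60, total_epochs=100, device="cuda"):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for epoch in range(total_epochs):
        # Warm up with vanilla CE (smooth_rate = 0) so the representation is
        # reasonable before NLS starts amplifying the model's confidence.
        smooth_rate = 0.0 if epoch < warmup_epochs else -0.2
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            loss = gls_cross_entropy(model(images), labels, smooth_rate)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```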

Z-ZHHH commented 2 years ago

Thanks a lot! I just tried NLS over the whole training process and it didn't work. Thanks for the details.