Z-ZHHH opened 2 years ago
The loss can go negative when training with negative labels under the cross-entropy loss. This is simply because the CE loss multiplies each per-class term -log(p_i) by the corresponding soft label. For the irrelevant classes (those not equal to the training label), a negative soft label makes that term negative, so the total loss can become negative (see here).
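A minimal numeric sketch of this effect, assuming the standard label-smoothing construction with a negative smoothing rate (the function name, epsilon value, and probabilities below are illustrative, not from the paper):

```python
import numpy as np

def nls_cross_entropy(probs, true_class, eps, num_classes):
    # Smoothed label: true class gets 1 - eps + eps/K, others get eps/K.
    # With eps < 0 (negative label smoothing) the off-class labels
    # are negative, so their terms -y_i * log(p_i) are negative.
    y = np.full(num_classes, eps / num_classes)
    y[true_class] += 1.0 - eps
    return -np.sum(y * np.log(probs))

# A confident, nearly one-hot prediction on the correct class:
probs = np.array([0.98, 0.01, 0.01])
loss = nls_cross_entropy(probs, true_class=0, eps=-0.2, num_classes=3)
print(loss)  # negative: the off-class terms dominate the small true-class term
```

As the prediction approaches exactly one-hot, log(p_i) for the off classes tends to negative infinity, so with negative off-class labels the loss is unbounded below.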
In our paper, we discuss how to address this issue in practice (Appendix D.2). Briefly, negative labels rely on a relatively well-pre-trained model, since the mechanism works by enhancing the model's confidence in its predictions. If you train with negative labels from the very beginning, the model may become overly confident in a bad representation (the learned representation is likely to be poor at the start of training).
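The schedule this implies can be sketched as follows; the function name, warm-up length, and epsilon value are placeholder assumptions, not values from the paper:

```python
def smoothing_rate(epoch, warmup_epochs=20, nls_eps=-0.2):
    # Train with ordinary (unsmoothed) labels during warm-up, while the
    # representation is still poor; switch to negative label smoothing
    # only after the model is reasonably well trained.
    return 0.0 if epoch < warmup_epochs else nls_eps
```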
Thanks a lot! I just tried applying NLS over the whole training process and it didn't work. Thanks for the details.
Great work! Can the loss value become negative during training if we use negative labels? When the features collapse to the class prototype, the predicted distribution becomes strictly one-hot, and it seems the loss value -> -infinity.