DeLightCMU / RSC

This is the official implementation of Self-Challenging Improves Cross-Domain Generalization, ECCV2020
BSD 2-Clause "Simplified" License

Why not use cross entropy for determining which features to mask? #11

Open BrianPugh opened 3 years ago

BrianPugh commented 3 years ago

In equation 1 in the paper, you compute the gradient of the element-wise product of the network output and the ground-truth one-hot label with respect to the input feature vector. This identifies the features that contribute most to the ground-truth class logit. For a softmax output, ideally we want the true-label logit to tend towards positive infinity while the other logits tend towards negative infinity.

So my question is: why not compute the more conventional cross-entropy loss here: https://github.com/DeLightCMU/RSC/blob/63726803bafd66184cac87d0db8de0c0d58889ba/models/resnet.py#L90

instead of just the sum of the true logits?
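
For concreteness, here is a minimal sketch (not the repository's code) of the two candidate signals being compared. The names `features`, `classifier`, and `labels` are hypothetical stand-ins for the pooled feature vector, the final FC layer, and the ground-truth labels inside RSC's forward pass:

```python
# Sketch only: contrasts (a) the gradient of the summed true-class logits,
# as in resnet.py#L90, with (b) the gradient of a cross-entropy loss.
import torch
import torch.nn.functional as F

batch, feat_dim, num_classes = 4, 512, 7
features = torch.randn(batch, feat_dim, requires_grad=True)  # pooled features (hypothetical)
classifier = torch.nn.Linear(feat_dim, num_classes)          # final FC layer (hypothetical)
labels = torch.randint(0, num_classes, (batch,))

logits = classifier(features)
one_hot = F.one_hot(labels, num_classes).float()

# (a) RSC-style signal: gradient of the summed true-class logits
#     (equation 1: d[(one_hot * logits).sum()] / d[features]).
true_logit_sum = (one_hot * logits).sum()
grad_logit = torch.autograd.grad(true_logit_sum, features, retain_graph=True)[0]

# (b) Alternative raised here: gradient of the cross-entropy loss.
ce_loss = F.cross_entropy(logits, labels)
grad_ce = torch.autograd.grad(ce_loss, features)[0]

# In either case, features with the largest gradient magnitude would be the ones masked.
print(grad_logit.abs().mean(), grad_ce.abs().mean())
```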

Justinhzy commented 3 years ago

Hi, logits encode how sensitive the output prediction is with respect to changes in each element of the feature map. Losses encode how difficult it is for the classifier to make a prediction using that element of the feature map.
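
One way to see the distinction (a back-of-the-envelope summary, not from the paper): let $z$ be the logits, $p = \mathrm{softmax}(z)$, and $y$ the one-hot label.

```latex
\[
\frac{\partial\, (y^\top z)}{\partial z} = y
\qquad\text{vs.}\qquad
\frac{\partial\, \mathrm{CE}(z, y)}{\partial z} = p - y .
\]
```

The logit-based signal weights the true class uniformly, while the cross-entropy gradient is scaled by how far the prediction $p$ currently is from $y$, i.e. by how difficult the sample is for the classifier.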