DeLightCMU / RSC

This is the official implementation of *Self-Challenging Improves Cross-Domain Generalization* (ECCV 2020).

Question about the batch part #10

Closed · SirRob1997 closed 3 years ago

SirRob1997 commented 3 years ago

The implementation of the batching part seems quite unintuitive to me; maybe you can clear up some of my understanding:

We calculate before_vector and after_vector, which hold, for each sample inside the batch, the predicted probability of the correct class before and after applying the masking.
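Roughly, what I understand these two vectors to be (a minimal PyTorch sketch with made-up names and shapes, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

batch, num_classes = 8, 7
logits_before = torch.randn(batch, num_classes)  # classifier output, no masking
logits_after = torch.randn(batch, num_classes)   # classifier output, features masked
labels = torch.randint(0, num_classes, (batch,))

cls_prob_before = F.softmax(logits_before, dim=1)
cls_prob_after = F.softmax(logits_after, dim=1)
one_hot = F.one_hot(labels, num_classes).float()

# Per-sample probability assigned to the ground-truth class.
before_vector = torch.sum(one_hot * cls_prob_before, dim=1)  # shape (batch,)
after_vector = torch.sum(one_hot * cls_prob_after, dim=1)    # shape (batch,)
```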

Next, we subtract the after_vector from the before_vector, so each entry of change_vector indicates whether the masking makes our classifier more or less certain about the correct class for that specific sample. This is represented by negative (more certain) and positive (less certain) values inside change_vector.
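Continuing the sketch above, that step would be something like this (I have included the small offset from L.133 that I ask about below):

```python
# Positive entry: masking lowered the ground-truth probability
# (classifier less certain). Negative entry: masking raised it
# (classifier more certain). The constant mirrors the 1e-5
# subtraction at L.133.
change_vector = before_vector - after_vector - 1e-5
```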

We are only interested in the positive values, i.e. the cases where masking decreases confidence, hence we calculate the threshold for Top-p according to only the positive values, as done in L.134 and L.135 (sketched together with L.136 after the next step).

Next, we check which entries are greater than our threshold in L.136; this yields a binary mask.
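Sketched together and continuing the variables above, L.134–L.136 would be roughly (drop_percent is a made-up name for the Top-p hyperparameter):

```python
# Zero out negative entries so only confidence drops compete in the
# Top-p selection (roughly L.134).
positive_changes = torch.where(change_vector > 0,
                               change_vector,
                               torch.zeros_like(change_vector))

# Top-p threshold over the batch (roughly L.135).
drop_percent = 1.0 / 3.0
k = int(round(batch * drop_percent))
threshold = torch.sort(positive_changes, descending=True)[0][k]

# Binary mask: 1 where the confidence drop beats the threshold (L.136).
drop_mask = positive_changes.gt(threshold).long()
```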

This is where my question comes in:

L.137 basically inverts the mask. So instead of reverting the masking for the Top-p percentage of samples where it decreases confidence, we are now reverting it for all samples besides the Top-p?

Am I correct on this? Why was it done this way? For self-challenging, applying the masking to the Top-p percentage of the samples with negative values seems more intuitive.
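For concreteness, the step I am asking about would look something like this (again continuing my made-up names from above):

```python
# L.137: invert the selection, so ignore_mask is 1 exactly for the
# samples that did NOT make the Top-p cut.
ignore_mask = 1 - drop_mask

# L.138 onward: the masking is then reverted for the samples indexed
# by ignore_mask, which is the part that confuses me.
revert_idx = ignore_mask.nonzero()[:, 0]
```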

Also, while you're at it:

What is the purpose of subtracting 1e-5 in L.133? To me, this seems like a "threshold" (epsilon), i.e. the minimum confidence change required to keep the masking. How did the performance change without it? In theory, this would be another hyperparameter.

Justinhzy commented 3 years ago

Hi, thanks for your question. The nonzero function in L.138 inverts the mask again, so the masking is applied to the Top-p percentage of the samples. Did I explain it clearly? About the purpose of subtracting 1e-5, I just try to avoid some corner cases, for example when all of change_vector's elements are zero.
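A toy example of the net effect (made-up names and values, not the exact repo code): inverting the selection and then resetting the inverted rows to 1 means only the Top-p rows keep their challenge mask.

```python
import torch

# Suppose rows 0 and 2 were selected as Top-p (largest confidence drop).
drop_mask = torch.tensor([1, 0, 1, 0])

# mask_all holds the feature-dropping mask; values < 1 mean "masked".
mask_all = torch.full((4, 5), 0.5)

ignore_mask = 1 - drop_mask               # rows NOT in the Top-p set
revert_idx = ignore_mask.nonzero()[:, 0]  # their row indices
mask_all[revert_idx, :] = 1               # revert the masking for them

print(mask_all)
# Rows 0 and 2 keep the 0.5 mask (still self-challenged);
# rows 1 and 3 are reset to 1 (masking reverted).
```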