@nw89
No.
scale[i] is the common factor in both the forward and backward formulas; please refer to the definition of scale[i].
But scale is a function of sigmoid_data[i]. So, in addition to the gradient obtained by treating scale[i] as a constant and differentiating the log/logit terms, there should be a gradient contribution from scale[i] itself, since it also depends on sigmoid_data[i].
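To make that concrete (this is only my reading of the layer, assuming the positive-class forward term has the form $L = -\beta^{(1-p)^{\gamma}} \log p$ with $p = \sigma(x)$), the full derivative with respect to the logit $x$ would be

$$\frac{\partial L}{\partial x} = -\beta^{(1-p)^{\gamma}}\,(1-p) \;+\; \beta^{(1-p)^{\gamma}}\,\gamma \ln(\beta)\, p\,(1-p)^{\gamma} \log p,$$

where the first term is the gradient with scale[i] held constant and the second term comes from differentiating $\beta^{(1-p)^{\gamma}}$ itself.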
I noticed that in src/caffe/layers/class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu the following code: `bottom_diff[i] = scale[i] * (target_value == 1 ? (1 - sigmoid_data[i]) : sigmoid_data[i]) * tmp;` suggests that scale[i] is treated as a constant. It therefore appears that only the log(p) and log(1-p) terms carry a gradient, and not the beta^{(1-p)^gamma} or beta^{p^gamma} factors.
Is this because it otherwise leads to numerical instability?
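As a sanity check on my reading, here is a plain C++ sketch (not the Caffe layer itself); it assumes the positive-class forward term is L = -beta^{(1-p)^gamma} * log(p) with p = sigmoid(x), and the beta, gamma, and x values are arbitrary. A finite-difference gradient of that forward loss should match the full analytic gradient, not the constant-scale one:

```cpp
// Minimal numerical sketch of the gradient question above.
// Assumption: positive-class forward loss L = -beta^{(1-p)^gamma} * log(p), p = sigmoid(x).
#include <cmath>
#include <cstdio>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Assumed forward loss for a positive example.
double loss(double x, double beta, double gamma) {
  double p = sigmoid(x);
  double scale = std::pow(beta, std::pow(1.0 - p, gamma));
  return -scale * std::log(p);
}

int main() {
  const double beta = 4.0, gamma = 0.5, x = 0.3, eps = 1e-6;
  double p = sigmoid(x);
  double scale = std::pow(beta, std::pow(1.0 - p, gamma));

  // Gradient with scale treated as a constant (what the quoted backward pass
  // appears to compute, up to tmp / class weighting and sign conventions).
  double grad_const_scale = -scale * (1.0 - p);

  // Full analytic gradient, including the term from d(scale)/dx.
  double grad_full = -scale * (1.0 - p)
      + scale * gamma * std::log(beta) * p * std::pow(1.0 - p, gamma) * std::log(p);

  // Finite-difference reference on the assumed forward loss.
  double grad_numeric = (loss(x + eps, beta, gamma) - loss(x - eps, beta, gamma)) / (2.0 * eps);

  std::printf("constant-scale grad: %.8f\n", grad_const_scale);
  std::printf("full analytic grad:  %.8f\n", grad_full);
  std::printf("numerical grad:      %.8f\n", grad_numeric);
  return 0;
}
```

Under that assumed loss form, the numerical gradient agrees with the full analytic one and differs from the constant-scale one by exactly the missing d(scale)/dx term.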