@nw89
No.
scale[i] is the common factor in both the forward and backward formulas; please refer to the definition of scale[i].
But scale is a function of sigmoid_data[i]. So, in addition to the gradient obtained by treating scale[i] as a constant and differentiating the log/logit terms, there should be a gradient contribution from scale[i] itself, since it also depends on sigmoid_data[i].
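To make that concrete (this is only my reading of the layer, assuming the positive-class forward term has the form $L = -\beta^{(1-p)^{\gamma}} \log p$ with $p = \sigma(x)$), the full derivative with respect to the logit $x$ would be

$$\frac{\partial L}{\partial x} = -\beta^{(1-p)^{\gamma}}\,(1-p) \;+\; \beta^{(1-p)^{\gamma}}\,\gamma \ln(\beta)\, p\,(1-p)^{\gamma} \log p,$$

where the first term is the gradient with scale[i] held constant and the second term comes from differentiating $\beta^{(1-p)^{\gamma}}$ itself.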
I noticed that in src/caffe/layers/class_balanced_sigmoid_cross_entropy_attention_loss_layer.cu the following code: `bottom_diff[i] = scale[i] * (target_value == 1 ? (1 - sigmoid_data[i]) : sigmoid_data[i]) * tmp;` suggests that scale[i] is treated as a constant. It therefore appears that only the log(p) and log(1-p) terms carry a gradient, and not the beta^{(1-p)^gamma} or beta^{p^gamma} factors.
Is this because it otherwise leads to numerical instability?
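As a sanity check on my reading, here is a plain C++ sketch (not the Caffe layer itself); it assumes the positive-class forward term is L = -beta^{(1-p)^gamma} * log(p) with p = sigmoid(x), and the beta, gamma, and x values are arbitrary. A finite-difference gradient of that forward loss should match the full analytic gradient, not the constant-scale one:

```cpp
// Minimal numerical sketch of the gradient question above.
// Assumption: positive-class forward loss L = -beta^{(1-p)^gamma} * log(p), p = sigmoid(x).
#include <cmath>
#include <cstdio>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Assumed forward loss for a positive example.
double loss(double x, double beta, double gamma) {
  double p = sigmoid(x);
  double scale = std::pow(beta, std::pow(1.0 - p, gamma));
  return -scale * std::log(p);
}

int main() {
  const double beta = 4.0, gamma = 0.5, x = 0.3, eps = 1e-6;
  double p = sigmoid(x);
  double scale = std::pow(beta, std::pow(1.0 - p, gamma));

  // Gradient with scale treated as a constant (what the quoted backward pass
  // appears to compute, up to tmp / class weighting and sign conventions).
  double grad_const_scale = -scale * (1.0 - p);

  // Full analytic gradient, including the term from d(scale)/dx.
  double grad_full = -scale * (1.0 - p)
      + scale * gamma * std::log(beta) * p * std::pow(1.0 - p, gamma) * std::log(p);

  // Finite-difference reference on the assumed forward loss.
  double grad_numeric = (loss(x + eps, beta, gamma) - loss(x - eps, beta, gamma)) / (2.0 * eps);

  std::printf("constant-scale grad: %.8f\n", grad_const_scale);
  std::printf("full analytic grad:  %.8f\n", grad_full);
  std::printf("numerical grad:      %.8f\n", grad_numeric);
  return 0;
}
```

Under that assumed loss form, the numerical gradient agrees with the full analytic one and differs from the constant-scale one by exactly the missing d(scale)/dx term.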