Wrong derivation of negative gradient of sigmoid+BCE

GeraldHan / GGE

Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral)

MIT License


Wrong derivation of negative gradient of sigmoid+BCE #5

Open GeraldHan opened 2 years ago

GeraldHan commented 2 years ago

Sorry for the wrong derivation of the negative gradient of the Sigmoid+BCE loss in the paper. The correct negative gradient is

$$ \nabla \mathcal{H}_i= y_i - \sigma(\mathcal{H}_i) $$
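For reference, this result can be reproduced in a few lines. This is a sketch that assumes the standard binary cross-entropy on the logit $\mathcal{H}_i$; the symbol $\mathcal{L}$ for the per-sample loss is introduced here for illustration and is not taken from the thread. It uses $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$.

$$
\begin{aligned}
\mathcal{L}(\mathcal{H}_i, y_i) &= -\left[\, y_i \log \sigma(\mathcal{H}_i) + (1 - y_i) \log\big(1 - \sigma(\mathcal{H}_i)\big) \right] \\
\frac{\partial \mathcal{L}}{\partial \mathcal{H}_i} &= -\left[\, y_i \big(1 - \sigma(\mathcal{H}_i)\big) - (1 - y_i)\, \sigma(\mathcal{H}_i) \right] = \sigma(\mathcal{H}_i) - y_i \\
-\frac{\partial \mathcal{L}}{\partial \mathcal{H}_i} &= y_i - \sigma(\mathcal{H}_i)
\end{aligned}
$$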

In theory, as long as the pseudo label is negatively correlated with the bias model's prediction, it can be used to mine the hard examples. The wrong gradient in the paper is actually an approximation of $\nabla \mathcal{H}_i$, which is why it still works well.
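The corrected gradient and the negative-correlation argument can be checked numerically. The snippet below is a minimal sketch in plain Python, not code from the GGE repository: all function names are hypothetical, and it assumes a single scalar logit and the standard BCE loss. It compares $y_i - \sigma(\mathcal{H}_i)$ against a finite-difference estimate, then shows that for a positive label the pseudo label shrinks as the bias model's logit grows.

```python
import math


def sigmoid(h):
    """Logistic function applied to a scalar logit."""
    return 1.0 / (1.0 + math.exp(-h))


def bce_with_logits(h, y):
    """Standard binary cross-entropy on a logit h with target y in [0, 1]."""
    p = sigmoid(h)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def negative_gradient(h, y):
    """Closed-form negative gradient of the loss w.r.t. the logit: y - sigmoid(h)."""
    return y - sigmoid(h)


def numeric_negative_gradient(h, y, eps=1e-6):
    """Central finite-difference estimate of the same quantity."""
    return -(bce_with_logits(h + eps, y) - bce_with_logits(h - eps, y)) / (2.0 * eps)


# 1) The closed form agrees with the numerical estimate.
for h, y in [(-2.0, 1.0), (0.5, 0.0), (3.0, 1.0), (0.0, 0.3)]:
    assert abs(negative_gradient(h, y) - numeric_negative_gradient(h, y)) < 1e-6

# 2) For a positive label (y = 1), the pseudo label decreases monotonically as
#    the bias model's logit increases: samples the bias model already gets
#    right receive a small pseudo label, samples it gets wrong a large one.
bias_logits = [-3.0, -1.0, 0.0, 1.0, 3.0]  # illustrative values, not real data
pseudo_labels = [negative_gradient(h, 1.0) for h in bias_logits]
for h, p in zip(bias_logits, pseudo_labels):
    print(f"bias logit {h:+.1f} -> pseudo label {p:.3f}")
```

In this toy setting, the samples the bias model scores low despite a positive label (the hard examples) keep a pseudo label close to 1, while confidently correct ones are pushed toward 0.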

Murphyzc commented 1 year ago

What is the reasoning behind this statement: "In theory, as long as the pseudo label has a negative correlation with the bias model prediction, it is able to mine the hard examples."?