Closed GengDavid closed 3 years ago
Thanks for this good question. Actually, we follow the same initialization method as Focal Loss (see Sec. 4.1), which is also adopted in CornerNet and CenterNet. In particular, the bias is initialized as b = −log((1 − π)/π), where π means that 'every point should be labeled as foreground with confidence of ∼ π'. We set π = 0.1 in our paper, so b = −log((1 − 0.1)/0.1) ≈ −2.19.
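The computation above can be sketched in a couple of lines (the helper name is mine, not from the codebase):

```python
import math

def focal_loss_bias(pi: float) -> float:
    """Bias init from the Focal Loss paper (Sec. 4.1):
    b = -log((1 - pi) / pi), chosen so that sigmoid(b) = pi,
    i.e. every point starts as foreground with confidence ~pi."""
    return -math.log((1.0 - pi) / pi)

b = focal_loss_bias(0.1)
print(b)  # roughly -2.19
```

Note that sigmoid(−2.19) ≈ 0.1, so the head's initial foreground confidence matches π.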
Got it! Thanks. But it makes me wonder: why do you also use Focal loss for stuff position prediction? And why is the bias value not set there?
Actually, Focal loss is adopted to give things and stuff unified supervision. Since the position head for stuff does not contain the so-called 'positive point' (or rather, all stuff points can be viewed as positive), there is no need to specifically set a bias value for stuff positions.
Cool 👍, thanks for your reply.
Hi @yanwei-li, thanks for sharing your great work. I found that you set
POSITION_HEAD.THING.BIAS_VALUE = -2.19
. Is there any specific reason to set -2.19 as the bias value? It looks like a magic number, and when I set it to 0, the training loss becomes NaN after about 28k iterations (which is weird, and I'm not sure whether it comes from randomness or from the different bias value). Thanks.