Open HAOCHENYE opened 4 years ago
The wh_offset_base enables smaller, more easily predicted logits for wh. Wh logits are more similar to hm logits using this method.
The loss values after weighting are balanced for wh and hm. TTFNet uses a different wh loss than CenterNet, requiring a different loss re-weighting to keep the two losses in approximate balance.
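A minimal sketch of the balancing idea. The raw loss values below are hypothetical, chosen only to illustrate how per-branch weights bring the two terms to the same order of magnitude; the CenterNet weights (wh=0.1, hm=1) are the ones quoted later in this thread.

```python
# Hypothetical raw (unweighted) loss magnitudes for illustration.
raw_hm_loss = 1.2      # focal loss on the heatmap (made-up value)
raw_wh_loss_l1 = 12.0  # CenterNet-style L1 loss on wh (made-up value)

# CenterNet's weights (hm=1, wh=0.1) bring both terms to ~1,
# so neither branch dominates the gradient.
total = 1.0 * raw_hm_loss + 0.1 * raw_wh_loss_l1
print(total)  # 2.4 — each weighted term contributes about 1.2
```

TTFNet's wh loss is IoU-based rather than L1, so its raw scale differs and a different weight is needed to restore the same balance.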
Thanks a lot! Now I understand why ttf uses a larger hm_loss weight. But what's the meaning of "wh logits"? I think ttf just uses a gaussian heatmap to choose the positive indices of the ground truth, and "wh" represents the ltrb of the bbox. Do you mean that scaling "wh" by the factor "wh_offset_base" makes ttf converge more easily?
Logits refer to the feature maps fed into the final activation functions.
Wh logits --> relu --> wh_offset_base --> wh prediction
Hm logits --> sigmoid --> hm prediction
The sigmoid curve yields most of its error for values greater than -5 with 0 targets and less than 5 with 1 targets, so -5 to 5 is the typical interval the logits fall into.
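A quick numeric check of that interval: beyond |x| = 5 the sigmoid is nearly saturated, so almost all useful error signal falls inside roughly [-5, 5].

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# At +/-5 the sigmoid is already within ~0.7% of its asymptotes,
# so logits rarely need to go beyond this range.
print(round(sigmoid(5), 4))   # 0.9933
print(round(sigmoid(-5), 4))  # 0.0067
```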
TTFNet predicts the wh offsets directly at the 512 x 512 image scale, using relu as the activation function. E.g. an offset of length 142 would require a logit of 142. Using a wh_offset_base of 16 reduces that 142 logit to 8.9. Keeping the hm and wh logits at similar magnitudes eases convergence.
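The wh branch of the pipeline above can be sketched like this (a simplified illustration, not TTFNet's actual code; `wh_from_logit` is a hypothetical helper):

```python
WH_OFFSET_BASE = 16  # TTFNet's scale factor for the wh branch

def wh_from_logit(logit):
    # Pipeline from above: wh logit -> relu -> * wh_offset_base -> wh prediction
    return max(logit, 0.0) * WH_OFFSET_BASE

# An offset of 142 px at 512 x 512 scale only needs a logit of ~8.9,
# comparable in magnitude to the hm logits (roughly [-5, 5]).
logit_needed = 142 / WH_OFFSET_BASE
print(round(logit_needed, 1))       # 8.9
print(wh_from_logit(logit_needed))  # 142.0
```

With wh_offset_base = 1 the same offset would demand a logit of 142, far outside the range the hm branch operates in, which is why convergence suffers.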
Thanks for your patience!
It seems that the model converges much more slowly if wh_offset_base = 1, especially the wh_loss. Besides, compared with centernet, the loss weight of wh_loss is much larger than that of hm_loss (centernet: wh_loss_weight=0.1, hm_loss_weight=1). Why is that?