ZJULearning / ttfnet

What's meaning of wh_offset_base ? #34

Open HAOCHENYE opened 4 years ago

HAOCHENYE commented 4 years ago

It seems that the model converges much more slowly if wh_offset_base = 1, especially the wh_loss. Besides, compared with CenterNet, the loss weight of wh_loss is much larger than that of hm_loss (CenterNet: wh_loss_weight = 0.1, hm_loss_weight = 1). Why?

PeterVennerstrom commented 3 years ago

The wh_offset_base scaling enables smaller, more easily predicted logits for wh. With it, the wh logits are similar in magnitude to the hm logits.

The loss values after weighting are balanced between wh and hm. TTFNet uses a different wh loss than CenterNet, so a different re-weighting is needed to keep the two losses in approximate balance.
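
To make the balance concrete, here is a minimal sketch of a weighted sum of the two loss terms; the function name and the 1.0 / 5.0 weights are illustrative assumptions, not taken from the repo's config.

```python
import torch

# Minimal sketch of combining the two weighted losses; `hm_weight`/`wh_weight`
# and their default values are illustrative, not the repo's exact settings.
def total_loss(hm_loss: torch.Tensor, wh_loss: torch.Tensor,
               hm_weight: float = 1.0, wh_weight: float = 5.0) -> torch.Tensor:
    # The weights are chosen so that, after multiplication, the heatmap loss and
    # the box-regression loss sit at roughly the same magnitude during training.
    return hm_weight * hm_loss + wh_weight * wh_loss
```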

HAOCHENYE commented 3 years ago

Thanks a lot! Now I understand why TTFNet uses a larger weight for the wh loss. But what is the meaning of "wh logits"? I think TTFNet just uses a Gaussian heatmap to choose the positive indices of the ground truth, and "wh" represents the ltrb offsets of the bbox. Do you mean that scaling "wh" by the factor wh_offset_base makes TTFNet converge more easily?

PeterVennerstrom commented 3 years ago

Logits refer to the feature maps fed into the final activation functions.

Wh logits --> relu --> wh_offset_base --> wh prediction

Hm logits --> sigmoid --> hm prediction

The sigmoid curve yields most of its error for logits greater than -5 on 0 targets and less than 5 on 1 targets, so -5 to 5 is the typical interval the hm logits fall into.
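
A quick numeric check of that interval (my own illustration, not from the thread):

```python
import torch

# sigmoid saturates outside roughly [-5, 5]; beyond that range the gradient is near zero.
print(torch.sigmoid(torch.tensor([-5.0, 0.0, 5.0])))
# tensor([0.0067, 0.5000, 0.9933])
```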

TTFNet predicts the wh offsets at the 512 x 512 image scale directly, using ReLU as the activation function. E.g. a 142-pixel offset would require a logit of 142. Using a wh_offset_base of 16 reduces that 142 logit to about 8.9. Maintaining a similar magnitude for both hm and wh logits eases convergence.
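
Putting the two heads side by side, here is a rough sketch of what the comment above describes; the tensor names and the helper function are my own, and only the wh_offset_base value of 16 comes from the discussion.

```python
import torch
import torch.nn.functional as F

wh_offset_base = 16.0  # value discussed above

def decode_heads(hm_logits: torch.Tensor, wh_logits: torch.Tensor):
    # Heatmap head: sigmoid keeps its useful logits in roughly the [-5, 5] range.
    hm_pred = torch.sigmoid(hm_logits)
    # Box head: ReLU keeps offsets non-negative, then wh_offset_base scales them
    # back up to image pixels, so a 142-pixel offset only needs a logit of
    # 142 / 16 = 8.875 instead of 142.
    wh_pred = F.relu(wh_logits) * wh_offset_base
    return hm_pred, wh_pred
```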

HAOCHENYE commented 3 years ago

Thanks for your patience!