facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

bbox_loss=0 while fl_loss still positive #67

Closed coldgemini closed 6 years ago

coldgemini commented 6 years ago

In some cases the bbox loss on the top layer is 0, which I guess is due to a lack of positive samples, but at the same time the focal loss is still a positive number. Shouldn't it also be zero? I've pasted a training log below.

json_stats: {"eta": "8:15:13", "fl_fpn3": 0.000466, "fl_fpn4": 0.000296, "fl_fpn5": 0.000173, "fl_fpn6": 0.000031, "iter": 71200, "loss": 0.169883, "lr": 0.010000, "mb_qsize": 64, "mem": 5377, "retnet_bg_num": 3916572.500000, "retnet_fg_num": 19.500000, "retnet_loss_bbox_fpn3": 0.077045, "retnet_loss_bbox_fpn4": 0.057692, "retnet_loss_bbox_fpn5": 0.016527, "retnet_loss_bbox_fpn6": 0.000000, "time": 0.273104}
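On the question of whether the focal loss should also be zero: the focal loss is computed over all anchors, background included, whereas box regression only has targets for foreground anchors. A minimal sketch of that difference follows. It is not Detectron's implementation; `focal_loss` is a hypothetical helper, `alpha=0.25` and `gamma=2.0` are the defaults from the RetinaNet paper and are assumed here, and the anchor count and probabilities are made up for illustration.

```python
import math

def focal_loss(p_fg, is_fg, alpha=0.25, gamma=2.0):
    """Binary focal loss for one anchor; p_fg is the predicted foreground probability."""
    p_t = p_fg if is_fg else 1.0 - p_fg          # probability of the true class
    alpha_t = alpha if is_fg else 1.0 - alpha    # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical pyramid level: 1000 anchors, all background, each predicted
# as background with high (but not perfect) confidence.
background_probs = [0.01] * 1000
fl_sum = sum(focal_loss(p, is_fg=False) for p in background_probs)

# With no foreground anchors there are no box targets to regress against.
bbox_loss = 0.0

print(fl_sum > 0.0, bbox_loss)
```

Each background anchor contributes a small but strictly positive term, so a level with no positives can still report a positive `fl_fpn*` value while its bbox loss is exactly zero.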

coldgemini commented 6 years ago

Also, why don't you report the number of fg samples per layer?

nonstop1962 commented 6 years ago

I see the same problem when I train on my own dataset. Is there any clue how to solve this? My terminal output is below.

json_stats: {"eta": "8:18:46", "fl_fpn3": 0.067543, "fl_fpn4": 0.013578, "fl_fpn5": 0.003218, "fl_fpn6": 0.000806, "fl_fpn7": 0.000202, "iter": 80, "loss": 0.093823, "lr": 0.000440, "mb_qsize": 64, "mem": 5862, "retnet_bg_num": 2970840.500000, "retnet_fg_num": 18.500000, "retnet_loss_bbox_fpn3": 0.000000, "retnet_loss_bbox_fpn4": 0.000000, "retnet_loss_bbox_fpn5": 0.000000, "retnet_loss_bbox_fpn6": 0.000000, "retnet_loss_bbox_fpn7": 0.000000, "time": 0.332815}

rbgirshick commented 6 years ago

I haven't looked into this in detail, but my guess is that, since the printed loss values are median filtered, what you're seeing might be an artifact of that smoothing.

rbgirshick commented 6 years ago

Confirmed that this is just an artifact of median filtering. We may change it to average since this seems to confuse people.
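To illustrate how median filtering can produce a printed value of exactly 0, here is a small sketch with made-up numbers (not taken from the logs above). It assumes the logger reports the median of a window of recent per-iteration losses; the window contents and length are hypothetical.

```python
from statistics import mean, median

# Hypothetical per-iteration bbox losses for one pyramid level:
# most minibatches have no positive anchors at this level.
bbox_window = [0.0, 0.0, 0.031, 0.0, 0.0, 0.0, 0.058, 0.0, 0.0, 0.0]

# Hypothetical focal losses for the same level: small but nonzero
# on every iteration.
fl_window = [0.00003, 0.00004, 0.00031, 0.00003, 0.00002,
             0.00003, 0.00045, 0.00004, 0.00003, 0.00003]

print("bbox median:", median(bbox_window))  # what a median filter would print
print("bbox mean:  ", mean(bbox_window))    # what an averaging filter would print
print("fl median:  ", median(fl_window))
```

When more than half of the values in the window are zero, the median is zero even though some iterations had a real bbox loss, while the mean of the same window stays positive. The focal loss never hits zero in this sketch, so its median stays positive too.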

soumenms2015 commented 6 years ago

I have the same issue: the bbox loss is 0 all the time.

soumenms2015 commented 6 years ago

@rbgirshick Hello, I have also tried the average instead of the median, and the error is still there. Could you please let me know what the reason could be?

nonstop1962 commented 6 years ago

@soumenms2015 As I read the code, the displayed value is only for display and does not influence the training result, so I don't think you need to worry about it.