Closed coldgemini closed 6 years ago
Also, why don't you report the fg sample number per layer?
I hit the same problem when I try to train on my own dataset. Is there any clue to solving this? Below is my terminal output:
json_stats: {"eta": "8:18:46", "fl_fpn3": 0.067543, "fl_fpn4": 0.013578, "fl_fpn5": 0.003218, "fl_fpn6": 0.000806, "fl_fpn7": 0.000202, "iter": 80, "loss": 0.093823, "lr": 0.000440, "mb_qsize": 64, "mem": 5862, "retnet_bg_num": 2970840.500000, "retnet_fg_num": 18.500000, "retnet_loss_bbox_fpn3": 0.000000, "retnet_loss_bbox_fpn4": 0.000000, "retnet_loss_bbox_fpn5": 0.000000, "retnet_loss_bbox_fpn6": 0.000000, "retnet_loss_bbox_fpn7": 0.000000, "time": 0.332815}
I haven't looked into this in detail, but my guess is that since the printed loss values are median filtered what you're seeing might be an artifact of that smoothing.
Confirmed that this is just an artifact of median filtering. We may change it to average since this seems to confuse people.
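To illustrate the median-filtering artifact: on a high FPN level, most minibatches contain no positive anchors, so the raw per-iteration bbox loss is 0 more than half the time, and the median of the logging window collapses to exactly 0 even though training is progressing. A minimal sketch (the window size and loss values below are made up for illustration, not taken from Detectron's code):

```python
import statistics
from collections import deque

WINDOW = 20  # hypothetical smoothing window size

# Per-iteration bbox losses on a high FPN level: most minibatches have no
# positive anchors there, so the raw loss is 0 more than half the time.
raw_losses = [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.2] * 2

window = deque(maxlen=WINDOW)
for loss in raw_losses:
    window.append(loss)

# More than half the samples are 0, so the median is 0 regardless of
# how large the occasional nonzero losses are; the mean would show them.
print(statistics.median(window))  # -> 0.0
print(statistics.mean(window))    # -> 0.1
```

This is why switching the displayed statistic from median to mean makes the logged `retnet_loss_bbox_fpn*` values look nonzero again, even though the underlying per-iteration losses are unchanged.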
I have the same issue: the bbox loss is 0 all the time.
@rbgirshick Hello, I have also tried with average instead of median, but the issue is still there. Could you please let me know what the reason could be?
@soumenms2015 As I read the code, the printed value is only for display and does not influence the training result. So I think you don't need to worry about it.
In some cases the bbox loss on the top layer is 0, which I guess is due to a lack of positive samples, but the focal loss at the same iteration is still a positive number. Shouldn't it also be zero? I pasted some training log below.
json_stats: {"eta": "8:15:13", "fl_fpn3": 0.000466, "fl_fpn4": 0.000296, "fl_fpn5": 0.000173, "fl_fpn6": 0.000031, "iter": 71200, "loss": 0.169883, "lr": 0.010000, "mb_qsize": 64, "mem": 5377, "retnet_bg_num": 3916572.500000, "retnet_fg_num": 19.500000, "retnet_loss_bbox_fpn3": 0.077045, "retnet_loss_bbox_fpn4": 0.057692, "retnet_loss_bbox_fpn5": 0.016527, "retnet_loss_bbox_fpn6": 0.000000, "time": 0.273104}
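A likely explanation, based on the RetinaNet formulation rather than this repo's exact code: the focal (classification) loss is computed over every anchor, background included, while the smooth-L1 bbox loss is only defined for foreground anchors. So a level with zero positives still accrues a small focal loss from its background anchors, but its bbox loss is exactly 0. A toy sketch with the standard binary focal loss (the probabilities below are invented for illustration):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one anchor; p is the predicted fg probability."""
    if y == 1:
        return -alpha * (1 - p) ** gamma * math.log(p)
    return -(1 - alpha) * p ** gamma * math.log(1 - p)

# A level whose anchors are all background (no positive matches).
bg_probs = [0.01, 0.02, 0.005]  # well-classified negatives, loss is tiny but > 0
cls_loss = sum(focal_loss(p, 0) for p in bg_probs)
bbox_loss = 0.0                 # no fg anchors -> no regression targets at all

print(cls_loss > 0, bbox_loss)  # -> True 0.0
```

This matches the log above: `fl_fpn6` is small but positive while `retnet_loss_bbox_fpn6` is exactly 0.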