Closed soumenms2015 closed 6 years ago
This is gonna be hard to give any educated comment on unless you give more information on what you are trying to achieve.
I am trying to run RetinaNet / Mask R-CNN on my own dataset. As you can see, the bbox_loss is 0 all the time. I tried changing the loss smoothing to use the average instead of the median filtering, as @rbgirshick suggested in issue #67, but I don't know where it goes wrong.
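For context, the difference between the two smoothing strategies can be sketched like this (a toy illustration only; the class and method names below are invented for this sketch and are not Detectron's actual API):

```python
# Toy illustration of median vs. mean loss smoothing over a sliding
# window. Names are invented for this sketch, not Detectron's API.
from collections import deque
import statistics

class SmoothedLoss:
    def __init__(self, window=20, use_median=True):
        self.values = deque(maxlen=window)  # keep only the last `window` losses
        self.use_median = use_median

    def add(self, value):
        self.values.append(value)

    def smoothed(self):
        vals = list(self.values)
        # Median is robust to a single huge spike; mean is not, so the
        # mean makes instability visible that the median would hide.
        return statistics.median(vals) if self.use_median else statistics.fmean(vals)
```

This is why switching from median to average smoothing can reveal a loss blowing up on individual iterations even when the smoothed curve looked flat.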
Try training on COCO for reference first. If everything works out fine there, then you'll need to revisit your dataset generation. It looks very much like something with the boxes/masks in your dataset is off.
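As a starting point for revisiting the dataset generation, here is a minimal sanity check over the boxes, assuming COCO-style JSON annotations with `[x, y, width, height]` boxes (the field names below are the standard COCO ones; adapt them if your format differs):

```python
# Minimal sanity check for COCO-style annotations: flags boxes with
# non-positive width/height or coordinates outside the image bounds.
import json

def check_boxes(annotation_file):
    with open(annotation_file) as f:
        data = json.load(f)
    sizes = {img["id"]: (img["width"], img["height"]) for img in data["images"]}
    bad = []
    for ann in data["annotations"]:
        x, y, w, h = ann["bbox"]
        img_w, img_h = sizes[ann["image_id"]]
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > img_w or y + h > img_h:
            bad.append(ann["id"])
    return bad  # ids of annotations with degenerate or out-of-bounds boxes
```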
Thanks! Let me try with COCO then and see if it works. I cross-checked my dataset generation and it seems okay, but I am not sure whether the dataset has any issue. Will update. Thanks a lot for your suggestion!
I have checked with the COCO dataset as well, and it gives the same bbox loss of 0.
When you are getting losses all over the place with an unmodified config file and the unmodified COCO dataset, then clearly something is wrong with either your Detectron or Caffe2 repos. In that case I would suggest starting from scratch and checking some baseline models.
@kampelmuehler I tried with a fresh clone of the repository and checked again. Unfortunately, no improvement.
The loss is NaN. Earlier this problem was fixed by reducing the learning rate, but now, even after reducing it to 0.000001, training gets stuck at the first iteration and bbox_loss is 0, the same problem as with the earlier repository.
Here is the log:

```
INFO net.py: 240: retnet_bbox_conv_n3_fpn6 : (2, 256, 22, 10) => retnet_bbox_pred_fpn6 : (2, 36, 22, 10) ------- (op: Conv)
INFO net.py: 240: retnet_bbox_conv_n3_fpn7 : (2, 256, 11, 5) => retnet_bbox_pred_fpn7 : (2, 36, 11, 5) ------- (op: Conv)
INFO net.py: 240: retnet_bbox_pred_fpn3 : (2, 36, 176, 80) => retnet_loss_bbox_fpn3 : () ------- (op: SelectSmoothL1Loss)
INFO net.py: 240: retnet_roi_bbox_targets_fpn3: (46, 4) => retnet_loss_bbox_fpn3 : () ------|
INFO net.py: 240: retnet_roi_fg_bbox_locs_fpn3: (46, 4) => retnet_loss_bbox_fpn3 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => retnet_loss_bbox_fpn3 : () ------|
INFO net.py: 240: retnet_bbox_pred_fpn4 : (2, 36, 88, 40) => retnet_loss_bbox_fpn4 : () ------- (op: SelectSmoothL1Loss)
INFO net.py: 240: retnet_roi_bbox_targets_fpn4: (122, 4) => retnet_loss_bbox_fpn4 : () ------|
INFO net.py: 240: retnet_roi_fg_bbox_locs_fpn4: (122, 4) => retnet_loss_bbox_fpn4 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => retnet_loss_bbox_fpn4 : () ------|
INFO net.py: 240: retnet_bbox_pred_fpn5 : (2, 36, 44, 20) => retnet_loss_bbox_fpn5 : () ------- (op: SelectSmoothL1Loss)
INFO net.py: 240: retnet_roi_bbox_targets_fpn5: (107, 4) => retnet_loss_bbox_fpn5 : () ------|
INFO net.py: 240: retnet_roi_fg_bbox_locs_fpn5: (107, 4) => retnet_loss_bbox_fpn5 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => retnet_loss_bbox_fpn5 : () ------|
INFO net.py: 240: retnet_bbox_pred_fpn6 : (2, 36, 22, 10) => retnet_loss_bbox_fpn6 : () ------- (op: SelectSmoothL1Loss)
INFO net.py: 240: retnet_roi_bbox_targets_fpn6: (74, 4) => retnet_loss_bbox_fpn6 : () ------|
INFO net.py: 240: retnet_roi_fg_bbox_locs_fpn6: (74, 4) => retnet_loss_bbox_fpn6 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => retnet_loss_bbox_fpn6 : () ------|
INFO net.py: 240: retnet_bbox_pred_fpn7 : (2, 36, 11, 5) => retnet_loss_bbox_fpn7 : () ------- (op: SelectSmoothL1Loss)
INFO net.py: 240: retnet_roi_bbox_targets_fpn7: (75, 4) => retnet_loss_bbox_fpn7 : () ------|
INFO net.py: 240: retnet_roi_fg_bbox_locs_fpn7: (75, 4) => retnet_loss_bbox_fpn7 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => retnet_loss_bbox_fpn7 : () ------|
INFO net.py: 240: retnet_cls_pred_fpn3 : (2, 81, 176, 80) => fl_fpn3 : () ------- (op: SigmoidFocalLoss)
INFO net.py: 240: retnet_cls_labels_fpn3 : (2, 9, 176, 80) => fl_fpn3 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => fl_fpn3 : () ------|
INFO net.py: 240: retnet_cls_pred_fpn4 : (2, 81, 88, 40) => fl_fpn4 : () ------- (op: SigmoidFocalLoss)
INFO net.py: 240: retnet_cls_labels_fpn4 : (2, 9, 88, 40) => fl_fpn4 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => fl_fpn4 : () ------|
INFO net.py: 240: retnet_cls_pred_fpn5 : (2, 81, 44, 20) => fl_fpn5 : () ------- (op: SigmoidFocalLoss)
INFO net.py: 240: retnet_cls_labels_fpn5 : (2, 9, 44, 20) => fl_fpn5 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => fl_fpn5 : () ------|
INFO net.py: 240: retnet_cls_pred_fpn6 : (2, 81, 22, 10) => fl_fpn6 : () ------- (op: SigmoidFocalLoss)
INFO net.py: 240: retnet_cls_labels_fpn6 : (2, 9, 22, 10) => fl_fpn6 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => fl_fpn6 : () ------|
INFO net.py: 240: retnet_cls_pred_fpn7 : (2, 81, 11, 5) => fl_fpn7 : () ------- (op: SigmoidFocalLoss)
INFO net.py: 240: retnet_cls_labels_fpn7 : (2, 9, 11, 5) => fl_fpn7 : () ------|
INFO net.py: 240: retnet_fg_num : (1,) => fl_fpn7 : () ------|
INFO net.py: 244: End of model: retinanet
/home/.../anaconda2/lib/python2.7/site-packages/numpy/lib/function_base.py:4033: RuntimeWarning: Invalid value encountered in median
  r = func(a, **kwargs)
json_stats: {"eta": "17 days, 4:49:12", "fl_fpn3": 0.000000, "fl_fpn4": 0.000000, "fl_fpn5": 0.000000, "fl_fpn6": NaN, "fl_fpn7": 0.000000, "iter": 0, "loss": NaN, "lr": 0.000000, "mb_qsize": 64, "mem": 9069, "retnet_bg_num": 6678042.000000, "retnet_fg_num": 426.000000, "retnet_loss_bbox_fpn3": 0.000000, "retnet_loss_bbox_fpn4": 0.000000, "retnet_loss_bbox_fpn5": 0.000000, "retnet_loss_bbox_fpn6": 0.000000, "retnet_loss_bbox_fpn7": 0.000000, "time": 16.512804}
CRITICAL train_net.py: 159: Loss is NaN, exiting...
INFO loader.py: 126: Stopping enqueue thread
INFO loader.py: 113: Stopping mini-batch loading thread
INFO loader.py: 113: Stopping mini-batch loading thread
INFO loader.py: 113: Stopping mini-batch loading thread
INFO loader.py: 113: Stopping mini-batch loading thread
```
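A note on why the whole run dies even though only fl_fpn6 is NaN in the stats above: the per-level losses are summed into the total, and a single NaN term poisons the sum. A rough plain-Python illustration (not Detectron code):

```python
# One NaN loss term makes the summed total NaN, which mirrors why
# train_net.py aborts with "Loss is NaN" when a single FPN level
# produces a bad value.
import math

def total_loss(loss_dict):
    total = sum(loss_dict.values())
    nan_terms = [name for name, v in loss_dict.items() if math.isnan(v)]
    return total, nan_terms
```

So the level to debug is the one whose individual loss is NaN, not the total.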
I am using the ImageNet-pretrained ResNeXt-101 model with the config file retinanet_X-101-64x4d-FPN_1x.yaml, on my own dataset.
Earlier, with the old repository, I tried both the COCO dataset and my own dataset, and bbox_loss was 0 at every iteration.
Any suggestion/idea would be much appreciated. Thank you very much !
Okay, here is what you are going to do: check the settings.
On Mon, Mar 5, 2018 at 2:46 PM, Moritz Kampelmühler <notifications@github.com> wrote:
> This is gonna be hard to give any educated comment on unless you give more information on what you are trying to achieve.
Hello,

I am getting bbox loss 0 at every iteration. Here are the stats for one iteration:

```
json_stats: {"eta": "3 days, 6:51:17", "fl_fpn3": 0.000000, "fl_fpn4": 0.000000, "fl_fpn5": 0.000000, "fl_fpn6": 0.000000, "fl_fpn7": -58069.230469, "iter": 20, "loss": -58069.230469, "lr": 0.000360, "mb_qsize": 64, "mem": 12239, "retnet_bg_num": 6678044.000000, "retnet_fg_num": 395.500000, "retnet_loss_bbox_fpn3": 0.000000, "retnet_loss_bbox_fpn4": 0.000000, "retnet_loss_bbox_fpn5": 0.000000, "retnet_loss_bbox_fpn6": 0.000000, "retnet_loss_bbox_fpn7": 0.000000, "time": 1.577270}
```

Any idea how to resolve this issue?
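A focal loss should never be negative by construction, so a value like fl_fpn7 = -58069 suggests bad inputs (e.g. labels or a numerical issue) rather than ordinary divergence. A small, illustrative helper (not part of Detectron) for scanning the json_stats lines of a training log for NaN or negative loss terms:

```python
# Scan json_stats log lines for pathological loss values: NaN, or
# negative (a focal loss or smooth-L1 loss should be >= 0).
# Illustrative helper only, not part of Detectron.
import json
import math

def find_bad_losses(log_lines):
    bad = []
    for line in log_lines:
        if "json_stats:" not in line:
            continue
        stats = json.loads(line.split("json_stats:", 1)[1])
        for key, val in stats.items():
            if not isinstance(val, float):
                continue  # skip "eta" strings, integer counters, etc.
            if ("loss" in key or key.startswith("fl_")) and (math.isnan(val) or val < 0):
                bad.append((stats.get("iter"), key, val))
    return bad
```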