Open ginobilinie opened 4 years ago
Please give a more detailed description of your problem, and include your terminal output.
@Zzh-tju Thanks.
More description: I am trying to use the CIoU/DIoU loss (I simply call the compute_ciou or compute_diou function in place of the original smooth L1 loss) in the bbox regression branch of Mask R-CNN. I do not use it in the RPN bbox regression.
Here is some example output. The bbox loss keeps growing, even after more than 1500 iterations.
2020-05-16 13:58:04,031 | callback.py | line 40 : Batch [1120] Speed: 2.09 samples/sec Train-rpn_cls_loss=0.113447, rpn_bbox_loss=0.103411, rcnn_accuracy=0.965366, cls_loss=0.145091, bbox_loss=0.013169, mask_loss=0.421295
2020-05-16 13:58:32,343 | callback.py | line 40 : Batch [1140] Speed: 3.53 samples/sec Train-rpn_cls_loss=0.112570, rpn_bbox_loss=0.103330, rcnn_accuracy=0.965201, cls_loss=0.144857, bbox_loss=0.013289, mask_loss=0.420059
2020-05-16 13:59:00,609 | callback.py | line 40 : Batch [1160] Speed: 3.54 samples/sec Train-rpn_cls_loss=0.111402, rpn_bbox_loss=0.103031, rcnn_accuracy=0.965109, cls_loss=0.144442, bbox_loss=0.013391, mask_loss=0.418765
2020-05-16 13:59:28,599 | callback.py | line 40 : Batch [1180] Speed: 3.57 samples/sec Train-rpn_cls_loss=0.110362, rpn_bbox_loss=0.102688, rcnn_accuracy=0.965005, cls_loss=0.144045, bbox_loss=0.013488, mask_loss=0.417557
2020-05-16 13:59:57,988 | callback.py | line 40 : Batch [1200] Speed: 3.40 samples/sec Train-rpn_cls_loss=0.109447, rpn_bbox_loss=0.102511, rcnn_accuracy=0.964971, cls_loss=0.143690, bbox_loss=0.013563, mask_loss=0.416479, fcn_loss=2.902343,
2020-05-16 14:00:26,571 | callback.py | line 40 : Batch [1220] Speed: 3.50 samples/sec Train-rpn_cls_loss=0.108640, rpn_bbox_loss=0.102478, rcnn_accuracy=0.964896, cls_loss=0.143314, bbox_loss=0.013652, mask_loss=0.415290
2020-05-16 14:00:54,899 | callback.py | line 40 : Batch [1240] Speed: 3.53 samples/sec Train-rpn_cls_loss=0.108040, rpn_bbox_loss=0.102286, rcnn_accuracy=0.964789, cls_loss=0.143183, bbox_loss=0.013735, mask_loss=0.414629
2020-05-16 14:01:25,687 | callback.py | line 40 : Batch [1260] Speed: 3.25 samples/sec Train-rpn_cls_loss=0.107158, rpn_bbox_loss=0.101806, rcnn_accuracy=0.964729, cls_loss=0.142844, bbox_loss=0.013789, mask_loss=0.413583
2020-05-16 14:01:56,916 | callback.py | line 40 : Batch [1280] Speed: 3.20 samples/sec Train-rpn_cls_loss=0.106302, rpn_bbox_loss=0.101344, rcnn_accuracy=0.964675, cls_loss=0.142398, bbox_loss=0.013846, mask_loss=0.412258
2020-05-16 14:02:29,997 | callback.py | line 40 : Batch [1300] Speed: 3.02 samples/sec Train-rpn_cls_loss=0.105540, rpn_bbox_loss=0.101310, rcnn_accuracy=0.964535, cls_loss=0.142259, bbox_loss=0.013934, mask_loss=0.410907
2020-05-16 14:03:17,346 | callback.py | line 40 : Batch [1320] Speed: 2.11 samples/sec Train-rpn_cls_loss=0.104824, rpn_bbox_loss=0.101343, rcnn_accuracy=0.964492, cls_loss=0.141957, bbox_loss=0.013984, mask_loss=0.410117
2020-05-16 14:04:30,898 | callback.py | line 40 : Batch [1340] Speed: 1.36 samples/sec Train-rpn_cls_loss=0.104065, rpn_bbox_loss=0.100915, rcnn_accuracy=0.964418, cls_loss=0.141760, bbox_loss=0.014041, mask_loss=0.409114
2020-05-16 14:05:51,355 | callback.py | line 40 : Batch [1360] Speed: 1.24 samples/sec Train-rpn_cls_loss=0.103361, rpn_bbox_loss=0.100984, rcnn_accuracy=0.964272, cls_loss=0.141721, bbox_loss=0.014127, mask_loss=0.407994
2020-05-16 14:07:10,705 | callback.py | line 40 : Batch [1380] Speed: 1.26 samples/sec Train-rpn_cls_loss=0.102657, rpn_bbox_loss=0.100683, rcnn_accuracy=0.964276, cls_loss=0.141271, bbox_loss=0.014151, mask_loss=0.406887
2020-05-16 14:08:38,692 | callback.py | line 40 : Batch [1400] Speed: 1.14 samples/sec Train-rpn_cls_loss=0.101894, rpn_bbox_loss=0.100366, rcnn_accuracy=0.964249, cls_loss=0.140927, bbox_loss=0.014209, mask_loss=0.405810
2020-05-16 14:10:07,195 | callback.py | line 40 : Batch [1420] Speed: 1.13 samples/sec Train-rpn_cls_loss=0.101241, rpn_bbox_loss=0.100023, rcnn_accuracy=0.964270, cls_loss=0.140451, bbox_loss=0.014244, mask_loss=0.404589
2020-05-16 14:11:41,699 | callback.py | line 40 : Batch [1440] Speed: 1.06 samples/sec Train-rpn_cls_loss=0.100725, rpn_bbox_loss=0.100094, rcnn_accuracy=0.964232, cls_loss=0.140070, bbox_loss=0.014290, mask_loss=0.403361
2020-05-16 14:13:04,015 | callback.py | line 40 : Batch [1460] Speed: 1.21 samples/sec Train-rpn_cls_loss=0.100241, rpn_bbox_loss=0.100062, rcnn_accuracy=0.964128, cls_loss=0.139946, bbox_loss=0.014349, mask_loss=0.402280
2020-05-16 14:14:29,447 | callback.py | line 40 : Batch [1480] Speed: 1.17 samples/sec Train-rpn_cls_loss=0.099680, rpn_bbox_loss=0.100023, rcnn_accuracy=0.963948, cls_loss=0.140032, bbox_loss=0.014419, mask_loss=0.401260
2020-05-16 14:16:02,162 | callback.py | line 40 : Batch [1500] Speed: 1.08 samples/sec Train-rpn_cls_loss=0.099082, rpn_bbox_loss=0.100100, rcnn_accuracy=0.963881, cls_loss=0.139839, bbox_loss=0.014442, mask_loss=0.400566
....
2020-05-16 14:48:40,384 | callback.py | line 40 : Batch [1940] Speed: 1.09 samples/sec Train-rpn_cls_loss=0.089056, rpn_bbox_loss=0.097449, rcnn_accuracy=0.962810, cls_loss=0.136322, bbox_loss=0.015455, mask_loss=0.383530
2020-05-16 14:50:11,303 | callback.py | line 40 : Batch [1960] Speed: 1.10 samples/sec Train-rpn_cls_loss=0.088642, rpn_bbox_loss=0.097169, rcnn_accuracy=0.962809, cls_loss=0.136030, bbox_loss=0.015470, mask_loss=0.382931
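To check the trend without reading every line, the bbox_loss values can be pulled out of a log like the one above. This is only a minimal sketch: the function name `bbox_loss_trend` is hypothetical, and it assumes the log keeps the `Batch [N] ... bbox_loss=X` layout shown here. The word boundary in the pattern keeps `rpn_bbox_loss` from being matched by mistake.

```python
import re

# "\b" before "bbox_loss" skips "rpn_bbox_loss", because "_" counts as a word character.
ENTRY_RE = re.compile(r"Batch \[(\d+)\].*?\bbbox_loss=([0-9.]+)")

def bbox_loss_trend(log_text):
    """Return (batch, bbox_loss) pairs in the order they appear in the log text."""
    return [(int(batch), float(value)) for batch, value in ENTRY_RE.findall(log_text)]

def is_increasing(pairs):
    """True if every bbox_loss value is at least as large as the previous one."""
    values = [value for _, value in pairs]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))
```

For example, passing the pasted log as a string to `bbox_loss_trend` and then to `is_increasing` reports whether the logged bbox_loss rose at every step.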
It seems that you are using a different detection repository. Train for more iterations to see what happens. Did you just replace the loss function without any other modification? If so, what is your regression loss weight?
@Zzh-tju
Thanks.
Yes, I am using a Mask R-CNN repository. After training for more iterations (currently 30k), the loss no longer increases and stays at about 0.019, but it does not decrease either.
The loss weight for the regression is set to 1.
In our experiments, the regression loss weight is set to 12 to balance it against the classification loss. Judging from the terminal output above, your classification and regression losses are clearly imbalanced.
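As a rough illustration of the balance point, here is a minimal sketch of applying a regression weight when summing the two losses. The function name `weighted_total` is hypothetical and not part of either repository. The sample values are taken from the Batch [1120] line of the log above.

```python
def weighted_total(cls_loss, bbox_loss, bbox_weight=12.0):
    """Sum classification and regression losses, scaling the regression term."""
    return cls_loss + bbox_weight * bbox_loss

cls_loss, bbox_loss = 0.145091, 0.013169  # values from the Batch [1120] log line

# With weight 1 the regression term is about 11 times smaller than cls_loss.
print(f"ratio at weight 1:  {cls_loss / bbox_loss:.1f}")
# With weight 12 the two terms have a similar magnitude.
print(f"weighted bbox term: {12.0 * bbox_loss:.6f}")
print(f"total loss:         {weighted_total(cls_loss, bbox_loss):.6f}")
```

With these numbers the weighted regression term is roughly 0.158, which is close to the classification loss of 0.145.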
@Zzh-tju Thanks. I'll try to settle the balance issue.
Hi, I have tried different weights for the CIoU loss, but the performance decreased in every case. Are there other hyper-parameters I need to pay attention to? Thanks.
More details would be good.
Thanks for your great work.
I call the compute_ciou function to generate the bbox loss:
self.bbox_loss = compute_ciou
_, bbox_loss = self.bbox_loss(bbox_pred, bbox_target, bbox_inside_weight, bbox_outside_weight, transform_weights=config.network.bbox_reg_weights)
However, I found that the bbox_loss increases during training. I have checked compute_ciou, and I think it returns the loss rather than the CIoU value. Can you please provide some comments?
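For reference, the difference between the CIoU value and the CIoU loss can be sketched for a single pair of boxes. This is a plain-Python illustration of the published formula, loss = 1 - IoU + rho^2/c^2 + alpha*v, and not the repository's compute_ciou. The function name `ciou_loss`, the (x1, y1, x2, y2) box format, and the `eps` parameter are assumptions made for this example.

```python
import math

def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2); lower means a better match."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers (rho^2).
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both (c^2).
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Under this definition, identical boxes give a loss near 0 and non-overlapping boxes give a loss above 1, so a value that rises during training means the predicted boxes are matching their targets less well.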