Zzh-tju / DIoU-pytorch-detectron

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)
GNU General Public License v3.0

bbox loss increases when using compute_ciou #7

Open ginobilinie opened 4 years ago

ginobilinie commented 4 years ago

Thanks for your great work.

I call the compute_ciou function to compute the bbox loss:

self.bbox_loss = compute_ciou

_, bbox_loss = self.bbox_loss(bbox_pred, bbox_target, bbox_inside_weight, bbox_outside_weight, transform_weights=config.network.bbox_reg_weights)

However, I found that bbox_loss increases as training proceeds. I have checked compute_ciou, and I believe it returns the loss rather than the CIoU value itself. Could you please comment?
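For reference, the CIoU loss from the paper is L = 1 - IoU + rho^2/c^2 + alpha*v, so a lower value means a better match and the loss is zero for identical boxes. The following is a minimal pure-Python sketch of that formula for a single pair of boxes in (x1, y1, x2, y2) form. The function name `ciou_loss` and the `eps` parameter are illustrative assumptions; this is not the repository's compute_ciou, which operates on batched tensors and regression deltas.

```python
import math

def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2). Illustrative only."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Intersection over union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared distance between box centres (rho^2).
    rho2 = (((ax1 + ax2) - (bx1 + bx2)) ** 2 + ((ay1 + ay2) - (by1 + by2)) ** 2) / 4.0

    # Squared diagonal of the smallest enclosing box (c^2).
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
near = ciou_loss((0, 0, 10, 10), (1, 1, 11, 11))
far = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
print(same, near, far)
```

The three calls show the expected ordering: an identical pair gives a loss near zero, a slightly shifted pair gives a small loss, and a disjoint pair gives a loss above 1 because of the centre-distance penalty.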

Zzh-tju commented 4 years ago

Please give a more detailed description of your problem, along with your terminal output.

ginobilinie commented 4 years ago

@Zzh-tju Thanks.

More description: I am trying to use the CIoU/DIoU loss (I simply call the compute_ciou or compute_diou function in place of the original smooth_L1 loss) in the bbox regression branch of MaskRCNN. I do not use it in the RPN bbox regression.

Here are some output examples. The bbox loss keeps growing, even after more than 1500 iterations.
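One thing to keep in mind when swapping smooth L1 for an IoU-based loss: smooth L1 is applied directly to regression deltas (dx, dy, dw, dh), whereas an IoU-based loss needs corner coordinates. The sketch below shows the standard Faster R-CNN style decoding of deltas into a box. The function name `decode_deltas` and the default weights (10, 10, 5, 5) are illustrative assumptions; the thread does not say what config.network.bbox_reg_weights holds, nor exactly how compute_ciou applies transform_weights internally.

```python
import math

def decode_deltas(proposal, deltas, weights=(10.0, 10.0, 5.0, 5.0)):
    """Decode (dx, dy, dw, dh) relative to a proposal box (x1, y1, x2, y2).

    The weights divide the raw deltas, mirroring the usual
    bbox_reg_weights convention. Values here are placeholders.
    """
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    wx, wy, ww, wh = weights
    dx, dy = deltas[0] / wx, deltas[1] / wy
    dw, dh = deltas[2] / ww, deltas[3] / wh

    # Shift the centre and rescale width and height.
    pcx, pcy = dx * w + cx, dy * h + cy
    pw, ph = math.exp(dw) * w, math.exp(dh) * h

    return (pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph)


print(decode_deltas((0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 0.0, 0.0)))
print(decode_deltas((0.0, 0.0, 10.0, 10.0), (10.0, 0.0, 0.0, 0.0)))
```

If the weights passed to the loss differ from the ones used when the regression targets were encoded, the decoded predicted and target boxes would be on different scales, which is one thing worth checking when porting the loss to another repository.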

020-05-16 13:58:04,031 | callback.py | line 40 : Batch [1120] Speed: 2.09 samples/sec Train-rpn_cls_loss=0.113447, rpn_bbox_loss=0.103411, rcnn_accuracy=0.965366, cls_loss=0.145091, bbox_loss=0.013169, mask_loss=0.421295
2020-05-16 13:58:32,343 | callback.py | line 40 : Batch [1140] Speed: 3.53 samples/sec Train-rpn_cls_loss=0.112570, rpn_bbox_loss=0.103330, rcnn_accuracy=0.965201, cls_loss=0.144857, bbox_loss=0.013289, mask_loss=0.420059
2020-05-16 13:59:00,609 | callback.py | line 40 : Batch [1160] Speed: 3.54 samples/sec Train-rpn_cls_loss=0.111402, rpn_bbox_loss=0.103031, rcnn_accuracy=0.965109, cls_loss=0.144442, bbox_loss=0.013391, mask_loss=0.418765
2020-05-16 13:59:28,599 | callback.py | line 40 : Batch [1180] Speed: 3.57 samples/sec Train-rpn_cls_loss=0.110362, rpn_bbox_loss=0.102688, rcnn_accuracy=0.965005, cls_loss=0.144045, bbox_loss=0.013488, mask_loss=0.417557
2020-05-16 13:59:57,988 | callback.py | line 40 : Batch [1200] Speed: 3.40 samples/sec Train-rpn_cls_loss=0.109447, rpn_bbox_loss=0.102511, rcnn_accuracy=0.964971, cls_loss=0.143690, bbox_loss=0.013563, mask_loss=0.416479, fcn_loss=2.902343,
2020-05-16 14:00:26,571 | callback.py | line 40 : Batch [1220] Speed: 3.50 samples/sec Train-rpn_cls_loss=0.108640, rpn_bbox_loss=0.102478, rcnn_accuracy=0.964896, cls_loss=0.143314, bbox_loss=0.013652, mask_loss=0.415290
2020-05-16 14:00:54,899 | callback.py | line 40 : Batch [1240] Speed: 3.53 samples/sec Train-rpn_cls_loss=0.108040, rpn_bbox_loss=0.102286, rcnn_accuracy=0.964789, cls_loss=0.143183, bbox_loss=0.013735, mask_loss=0.414629
2020-05-16 14:01:25,687 | callback.py | line 40 : Batch [1260] Speed: 3.25 samples/sec Train-rpn_cls_loss=0.107158, rpn_bbox_loss=0.101806, rcnn_accuracy=0.964729, cls_loss=0.142844, bbox_loss=0.013789, mask_loss=0.413583
2020-05-16 14:01:56,916 | callback.py | line 40 : Batch [1280] Speed: 3.20 samples/sec Train-rpn_cls_loss=0.106302, rpn_bbox_loss=0.101344, rcnn_accuracy=0.964675, cls_loss=0.142398, bbox_loss=0.013846, mask_loss=0.412258
2020-05-16 14:02:29,997 | callback.py | line 40 : Batch [1300] Speed: 3.02 samples/sec Train-rpn_cls_loss=0.105540, rpn_bbox_loss=0.101310, rcnn_accuracy=0.964535, cls_loss=0.142259, bbox_loss=0.013934, mask_loss=0.410907
2020-05-16 14:03:17,346 | callback.py | line 40 : Batch [1320] Speed: 2.11 samples/sec Train-rpn_cls_loss=0.104824, rpn_bbox_loss=0.101343, rcnn_accuracy=0.964492, cls_loss=0.141957, bbox_loss=0.013984, mask_loss=0.410117
2020-05-16 14:04:30,898 | callback.py | line 40 : Batch [1340] Speed: 1.36 samples/sec Train-rpn_cls_loss=0.104065, rpn_bbox_loss=0.100915, rcnn_accuracy=0.964418, cls_loss=0.141760, bbox_loss=0.014041, mask_loss=0.409114
2020-05-16 14:05:51,355 | callback.py | line 40 : Batch [1360] Speed: 1.24 samples/sec Train-rpn_cls_loss=0.103361, rpn_bbox_loss=0.100984, rcnn_accuracy=0.964272, cls_loss=0.141721, bbox_loss=0.014127, mask_loss=0.407994
2020-05-16 14:07:10,705 | callback.py | line 40 : Batch [1380] Speed: 1.26 samples/sec Train-rpn_cls_loss=0.102657, rpn_bbox_loss=0.100683, rcnn_accuracy=0.964276, cls_loss=0.141271, bbox_loss=0.014151, mask_loss=0.406887
2020-05-16 14:08:38,692 | callback.py | line 40 : Batch [1400] Speed: 1.14 samples/sec Train-rpn_cls_loss=0.101894, rpn_bbox_loss=0.100366, rcnn_accuracy=0.964249, cls_loss=0.140927, bbox_loss=0.014209, mask_loss=0.405810
2020-05-16 14:10:07,195 | callback.py | line 40 : Batch [1420] Speed: 1.13 samples/sec Train-rpn_cls_loss=0.101241, rpn_bbox_loss=0.100023, rcnn_accuracy=0.964270, cls_loss=0.140451, bbox_loss=0.014244, mask_loss=0.404589
2020-05-16 14:11:41,699 | callback.py | line 40 : Batch [1440] Speed: 1.06 samples/sec Train-rpn_cls_loss=0.100725, rpn_bbox_loss=0.100094, rcnn_accuracy=0.964232, cls_loss=0.140070, bbox_loss=0.014290, mask_loss=0.403361
2020-05-16 14:13:04,015 | callback.py | line 40 : Batch [1460] Speed: 1.21 samples/sec Train-rpn_cls_loss=0.100241, rpn_bbox_loss=0.100062, rcnn_accuracy=0.964128, cls_loss=0.139946, bbox_loss=0.014349, mask_loss=0.402280
2020-05-16 14:14:29,447 | callback.py | line 40 : Batch [1480] Speed: 1.17 samples/sec Train-rpn_cls_loss=0.099680, rpn_bbox_loss=0.100023, rcnn_accuracy=0.963948, cls_loss=0.140032, bbox_loss=0.014419, mask_loss=0.401260
2020-05-16 14:16:02,162 | callback.py | line 40 : Batch [1500] Speed: 1.08 samples/sec Train-rpn_cls_loss=0.099082, rpn_bbox_loss=0.100100, rcnn_accuracy=0.963881, cls_loss=0.139839, bbox_loss=0.014442, mask_loss=0.400566
....
2020-05-16 14:48:40,384 | callback.py | line 40 : Batch [1940] Speed: 1.09 samples/sec Train-rpn_cls_loss=0.089056, rpn_bbox_loss=0.097449, rcnn_accuracy=0.962810, cls_loss=0.136322, bbox_loss=0.015455, mask_loss=0.383530
2020-05-16 14:50:11,303 | callback.py | line 40 : Batch [1960] Speed: 1.10 samples/sec Train-rpn_cls_loss=0.088642, rpn_bbox_loss=0.097169, rcnn_accuracy=0.962809, cls_loss=0.136030, bbox_loss=0.015470, mask_loss=0.382931
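To inspect the trend in a log like the one above without reading every entry, the batch index and bbox_loss can be pulled out with a regular expression. This is a small sketch, assuming the log format shown in this thread; the names `BBOX_PATTERN` and `extract_bbox_loss` are hypothetical. The negative lookbehind keeps the pattern from matching rpn_bbox_loss.

```python
import re

# Match "Batch [N]" and the following bbox_loss value, skipping rpn_bbox_loss.
BBOX_PATTERN = re.compile(r"Batch \[(\d+)\].*?(?<!_)bbox_loss=([0-9.]+)", re.DOTALL)

def extract_bbox_loss(log_text):
    """Return a list of (batch, bbox_loss) pairs found in the log text."""
    return [(int(b), float(v)) for b, v in BBOX_PATTERN.findall(log_text)]


# Two entries copied from the log in this thread.
sample = (
    "020-05-16 13:58:04,031 | callback.py | line 40 : Batch [1120] Speed: 2.09 samples/sec "
    "Train-rpn_cls_loss=0.113447, rpn_bbox_loss=0.103411, rcnn_accuracy=0.965366, "
    "cls_loss=0.145091, bbox_loss=0.013169, mask_loss=0.421295 "
    "2020-05-16 14:50:11,303 | callback.py | line 40 : Batch [1960] Speed: 1.10 samples/sec "
    "Train-rpn_cls_loss=0.088642, rpn_bbox_loss=0.097169, rcnn_accuracy=0.962809, "
    "cls_loss=0.136030, bbox_loss=0.015470, mask_loss=0.382931"
)

pairs = extract_bbox_loss(sample)
print(pairs)
```

On these two entries the extracted values are 0.013169 at batch 1120 and 0.015470 at batch 1960, which matches the slow upward drift described in the comment.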

Zzh-tju commented 4 years ago

It seems that you are using a different detection repository. Train for more iterations to see what happens. Did you just replace the loss function without any other modification? If so, what is your regression loss weight?

ginobilinie commented 4 years ago

@Zzh-tju

Thanks.

Yes, I am using a MaskRCNN repository. After training for more iterations (currently 30k), the loss no longer increases but stays fixed at 0.019; it does not decrease either.

The loss weight for the regression is set to 1.

Zzh-tju commented 4 years ago

In our experiments, the regression loss weight is set to 12 to balance it against the classification loss. Judging from the terminal output above, your classification and regression losses are clearly imbalanced.
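To make the imbalance concrete, here is a small sketch of a weighted sum of the two losses, using the cls_loss and bbox_loss values logged at batch 1120. The function name `total_loss` and its signature are illustrative assumptions. The weight of 12 is the value reported for the authors' experiments; whether it suits another repository depends on how that repository normalizes its losses, which the thread does not state.

```python
def total_loss(cls_loss, bbox_loss, bbox_weight=1.0):
    """Weighted sum of classification and box-regression losses (illustrative)."""
    return cls_loss + bbox_weight * bbox_loss


# Values logged at batch 1120 in this thread.
cls_loss, bbox_loss = 0.145091, 0.013169

for weight in (1.0, 12.0):
    weighted_bbox = weight * bbox_loss
    share = weighted_bbox / total_loss(cls_loss, bbox_loss, weight)
    print(f"weight={weight}: weighted bbox term={weighted_bbox:.6f}, share of total={share:.2%}")
```

With a weight of 1 the regression term is roughly a tenth the size of the classification term, so it contributes little to the gradient; with a weight of 12 the two terms are of comparable size.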

ginobilinie commented 4 years ago

@Zzh-tju Thanks. I'll try to settle the balance issue.

ginobilinie commented 4 years ago

Hi, I have tried different weights for the CIoU loss, but performance decreased in every case. Are there other hyperparameters I need to pay attention to? Thanks.

Zzh-tju commented 4 years ago

More details would help.