Zzh-tju / DIoU-pytorch-detectron

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)
GNU General Public License v3.0

Questions about reg prediction and targets #10

Closed tangh closed 4 years ago

tangh commented 4 years ago

Hello, I have two questions about the reg prediction and targets:

I found that the loss calculation is done here: https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/modeling/model_builder.py#L199-L201

and that rpn_ret['bbox_targets'] appears to be prepared here: https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L129-L191


  1. In which situation do two boxes with no overlap need a reg loss to be computed?

That is, we first match every predicted box to the GT boxes using the regular IoU metric. If the maximum IoU for a predicted box is greater than a threshold (for example 0.7), the box is considered positive and therefore has a reg loss. Negative boxes should have no reg loss.
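To make the matching step described above concrete, here is a minimal sketch in plain Python. It is not taken from the repository: the function names `iou` and `label_by_threshold`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.7 are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the intersection are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_by_threshold(boxes, gt_boxes, pos_thresh=0.7):
    """For each box, return (index of best-matching GT, is_positive).

    A box is positive only if its maximum IoU exceeds pos_thresh
    (hypothetical default, following the 0.7 example in the question).
    """
    labels = []
    for box in boxes:
        ious = [iou(box, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__)
        labels.append((best, ious[best] > pos_thresh))
    return labels
```

Under this scheme a box that does not overlap any GT box always has a maximum IoU of 0, so it can never be labeled positive and never reaches the regression loss.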

So, DIoU/GIoU try to solve a problem that arises when regular IoU is used as the loss function: a box that has no overlap with its corresponding GT box gets no gradient. But shouldn't such boxes be considered negative, and thus have no reg loss at all?

https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L138

  2. Is the network output bbox_pred, from cls_score, bbox_pred = self.Box_Outs(box_feat), a set of regression values or box coordinates?

https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/modeling/model_builder.py#L176

I ask because rpn_ret['bbox_targets'] holds the regression values from the predicted boxes to the GT boxes:

https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L170-L171 https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L206-L214

The smooth L1 loss in the original code is computed between the predicted reg values and the true reg values, so I think bbox_pred and bbox_targets should also be reg values here. But when computing GIoU/DIoU, absolute box coordinates should be used.

https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/modeling/fast_rcnn_heads.py#L50-L63

Lines 60 and 62 use the same bbox_pred and bbox_targets to compute the loss. So, am I missing some code that transforms the regression values into box coordinates?
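For reference, the distinction between regression values and absolute coordinates can be sketched as follows. This uses the standard Faster R-CNN box parameterization in a simplified form (no per-coordinate weights, no +1 pixel convention). It is not a copy of the repository's code: `encode_deltas` and `decode_deltas` are hypothetical names, and the default clip value of log(1000/16) is an assumption modeled on the BBOX_XFORM_CLIP setting.

```python
import math


def encode_deltas(proposal, gt):
    """Regression targets (dx, dy, dw, dh) that map `proposal` onto `gt`.

    Both boxes are (x1, y1, x2, y2) in absolute coordinates.
    """
    px1, py1, px2, py2 = proposal
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    dx = ((gx1 + 0.5 * gw) - (px1 + 0.5 * pw)) / pw
    dy = ((gy1 + 0.5 * gh) - (py1 + 0.5 * ph)) / ph
    return (dx, dy, math.log(gw / pw), math.log(gh / ph))


def decode_deltas(proposal, deltas, clip=math.log(1000.0 / 16.0)):
    """Turn regression values back into an absolute (x1, y1, x2, y2) box.

    `clip` bounds dw/dh before exponentiation (assumed default).
    """
    px1, py1, px2, py2 = proposal
    pw, ph = px2 - px1, py2 - py1
    dx, dy, dw, dh = deltas
    dw, dh = min(dw, clip), min(dh, clip)
    cx = dx * pw + (px1 + 0.5 * pw)   # offset is scaled by the proposal width
    cy = dy * ph + (py1 + 0.5 * ph)
    w = math.exp(dw) * pw             # log-scale factor applied to the proposal size
    h = math.exp(dh) * ph
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
```

With this parameterization, smooth L1 compares the output of `encode_deltas` with the network output directly, while an IoU-based loss would first pass both through something like `decode_deltas` to obtain boxes.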

Thank you so much!

Zzh-tju commented 4 years ago
  1. (1) You're right. For Faster R-CNN (IoU > 0.7), SSD (IoU > 0.5), etc., anchor boxes without overlap have no loss. But YOLO chooses the anchor with the maximum IoU as the positive sample, so this situation may happen there.

The above is just one of the reasons why we chose the G/D/CIoU losses: stability.
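As a rough illustration of the non-overlap case, the sketch below compares the plain IoU loss with the DIoU loss from the paper, L_DIoU = 1 - IoU + rho^2 / c^2, where rho is the distance between box centers and c is the diagonal of the smallest enclosing box. The function name `iou_and_diou_loss` and the `(x1, y1, x2, y2)` layout are assumptions for this example, and it is written in plain Python rather than with the repository's tensors.

```python
def iou_and_diou_loss(pred, gt):
    """Return (iou_loss, diou_loss) for two (x1, y1, x2, y2) boxes."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2

    iou_loss = 1.0 - iou
    diou_loss = 1.0 - iou + (rho2 / c2 if c2 > 0 else 0.0)
    return iou_loss, diou_loss


gt = (0.0, 0.0, 10.0, 10.0)
near = (12.0, 0.0, 22.0, 10.0)   # disjoint from gt, close by
far = (30.0, 0.0, 40.0, 10.0)    # disjoint from gt, farther away
near_losses = iou_and_diou_loss(near, gt)
far_losses = iou_and_diou_loss(far, gt)
```

Both disjoint boxes get the same IoU loss of 1, so the IoU loss cannot tell them apart, while the DIoU loss is larger for the farther box and therefore still provides a direction to move in.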

(2) The convergence speed is different. From Fig. 4 in our paper https://arxiv.org/pdf/1911.08287.pdf, you can see that DIoU converges well everywhere, whereas IoU and GIoU do not converge well in some orientations.

(3) Regression usually takes place in the basin of Fig. 4, where all of these IoU-based losses converge well. This means that for Faster R-CNN, all of these IoU-based losses will perform similarly.

(4) However, as you can see, when IoU --> 1 the normalized central point distance loss and the aspect ratio loss tend to 0 as well. We decouple these geometric factors from IoU, which is equivalent to adding extra constraints and reduces the difficulty of model learning. It effectively tells the model that it is better to follow these constraints.
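For readers without the paper at hand, the decoupled terms mentioned in (4) can be written out as below. This is a summary of the loss definitions as given in the linked paper, with b and b^gt the predicted and GT boxes, rho the distance between their centers, c the diagonal of the smallest enclosing box, and (w, h) the box width and height.

```latex
\mathcal{L}_{DIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2}

\mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v

v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2,
\qquad
\alpha = \frac{v}{(1 - IoU) + v}
```

When the predicted box coincides with the GT box, rho = 0 and v = 0, so both extra terms vanish together with 1 - IoU. Away from that point they act as separate penalties on center distance and aspect ratio.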

  2. See the function compute_ciou in lib/utils/net.py
tangh commented 4 years ago

Thanks for your detailed answer. I forgot that YOLO has a unique strategy that might cause this non-overlap case.

But I'm still confused about the second question. In compute_ciou -> def bbox_transform -> dw = torch.clamp(dw, max=cfg.BBOX_XFORM_CLIP), the comment on the BBOX_XFORM_CLIP config (config.py, line 955) implies that pred holds reg values. But the next few lines, such as pred_ctr_x = dx (not pred_offset_x = dx) and pred_w = torch.exp(dw) (not pred_w = original_w * torch.exp(dw)), imply that pred holds the four box xywh values. Besides, I checked bbox_target and it contains regression values rather than box xywh. So I still can't figure it out.


Here's another question, about Fig. 4 in https://arxiv.org/pdf/1911.08287.pdf. In this figure, is every coordinate n the position of a set of anchors before the regression process (t=0) or after it (t=T)?

Thank you!

Zzh-tju commented 4 years ago

https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/utils/net.py#L85-L86

t=0