Closed tangh closed 4 years ago
(1) You're right. For Faster R-CNN (IoU > 0.7), SSD (IoU > 0.5), etc., anchor boxes that do not overlap the ground truth incur no regression loss. YOLO, however, chooses the anchor with the maximum IoU as the positive sample, so a non-overlapping positive may still occur.
The above is just one of the reasons why we choose the G/D/CIoU losses: stability.
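A minimal sketch of the two assignment strategies mentioned in (1), assuming axis-aligned `(x1, y1, x2, y2)` boxes with positive area. The helper names `positives_by_threshold` and `positive_by_max_iou` are hypothetical and do not come from any of the detectors named above; the sketch only illustrates why a max-IoU rule can pick an anchor that does not overlap the ground truth.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes with positive area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def positives_by_threshold(anchors, gt, thresh):
    """Threshold rule: only anchors with IoU above `thresh` are positive."""
    return [i for i, a in enumerate(anchors) if iou(a, gt) > thresh]


def positive_by_max_iou(anchors, gt):
    """Max-IoU rule: the best anchor is positive, even if its IoU is 0."""
    return max(range(len(anchors)), key=lambda i: iou(anchors[i], gt))


# Neither anchor overlaps the ground-truth box.
anchors = [(0, 0, 1, 1), (4, 4, 5, 5)]
gt = (2, 2, 3, 3)

print(positives_by_threshold(anchors, gt, 0.7))  # no positives at all
print(positive_by_max_iou(anchors, gt))          # still returns an index
```

Under the threshold rule a non-overlapping anchor never receives a regression loss, while under the max-IoU rule it can, which is the case where a plain IoU loss provides no useful signal.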
(2) The convergence speed is different. From Fig. 4 in our paper https://arxiv.org/pdf/1911.08287.pdf, you can see that DIoU converges well everywhere, whereas IoU and GIoU do not converge well in some orientations.
(3) Regression usually takes place in the basin of Fig. 4, where all these IoU-based losses converge well, which means that for Faster R-CNN all of them will perform similarly.
(4) However, as you can see, as IoU --> 1 the normalized central point distance loss and the aspect ratio loss tend to 0 as well. We nevertheless decouple these geometric factors from IoU, which is equivalent to adding extra constraints and reduces the difficulty of model learning. It tells the model that it is better to follow these constraints.
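The terms discussed in (1) to (4) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's `compute_ciou`: boxes are `(x1, y1, x2, y2)` with positive width and height, and the function name `iou_family_losses` is hypothetical. The formulas are the standard ones: GIoU subtracts the empty fraction of the enclosing box, DIoU adds the squared center distance normalized by the squared enclosing-box diagonal, and CIoU further adds the weighted aspect ratio term.

```python
import math


def iou_family_losses(pred, gt):
    """Return IoU, GIoU, DIoU and CIoU losses for two (x1, y1, x2, y2) boxes."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection and union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union

    # Smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    giou = iou - (cw * ch - union) / (cw * ch)

    # Squared center distance, normalized by the squared enclosing diagonal.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    dist = rho2 / (cw ** 2 + ch ** 2)

    # Aspect ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return {
        "iou": 1 - iou,
        "giou": 1 - giou,
        "diou": 1 - iou + dist,
        "ciou": 1 - iou + dist + alpha * v,
    }


# Two non-overlapping cases: the IoU loss is identical, the others are not.
near = iou_family_losses((0, 0, 1, 1), (2, 0, 3, 1))
far = iou_family_losses((0, 0, 1, 1), (5, 0, 6, 1))
print(near["iou"], far["iou"])
print(near["diou"], far["diou"])
```

For non-overlapping boxes the IoU loss stays at 1 no matter how far apart the boxes are, so it carries no gradient, while the distance term still changes with the separation. For identical boxes every loss in the dictionary is 0, matching the remark that the extra terms vanish as IoU --> 1.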
`compute_ciou` in `lib/utils/net.py`
Thanks for your detailed answer. I forgot that YOLO has a unique strategy that can produce this non-overlapping case.
But I am still confused about the second question. In `compute_ciou` -> `def bbox_transform` -> `dw = torch.clamp(dw, max=cfg.BBOX_XFORM_CLIP)`, the comment on the `BBOX_XFORM_CLIP` config (config.py, line 955) implies that `pred` holds regression values. But the next few lines, such as `pred_ctr_x = dx` (not `pred_offset_x = dx`) and `pred_w = torch.exp(dw)` (not `pred_w = original_w * torch.exp(dw)`), imply that `pred` holds the four box xywh values.
Besides, I checked `bbox_target`, and it contains regression values rather than bbox xywh values.
So I can't figure it out.
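One possible reading of the lines quoted above, offered as an assumption rather than a confirmed description of the repository: decoding deltas against a unit reference box centered at the origin reduces the usual decoding to exactly `pred_ctr_x = dx` and `pred_w = exp(dw)`. The sketch below uses a hypothetical `decode_deltas` helper; the clip value `log(1000 / 16)` is assumed to match the Detectron default for `BBOX_XFORM_CLIP`, and the `+1` pixel convention that Detectron applies to widths is omitted.

```python
import math

# Assumed to match Detectron's default BBOX_XFORM_CLIP; not read from the repo.
BBOX_XFORM_CLIP = math.log(1000.0 / 16.0)


def decode_deltas(deltas, ref_box=(-0.5, -0.5, 0.5, 0.5)):
    """Turn (dx, dy, dw, dh) into an (x1, y1, x2, y2) box relative to ref_box."""
    dx, dy, dw, dh = deltas
    # Clip the log-size deltas so that exp() cannot overflow.
    dw = min(dw, BBOX_XFORM_CLIP)
    dh = min(dh, BBOX_XFORM_CLIP)

    rx1, ry1, rx2, ry2 = ref_box
    rw, rh = rx2 - rx1, ry2 - ry1
    rcx, rcy = rx1 + 0.5 * rw, ry1 + 0.5 * rh

    # Standard decoding; with the default unit box this is cx = dx, w = exp(dw).
    cx = dx * rw + rcx
    cy = dy * rh + rcy
    w = math.exp(dw) * rw
    h = math.exp(dh) * rh
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Predicted and target deltas decoded against the same unit reference box.
pred_box = decode_deltas((0.2, 0.0, 0.1, 0.0))
target_box = decode_deltas((0.0, 0.0, 0.0, 0.0))
print(pred_box, target_box)
```

If both the predicted and the target deltas are decoded against the same reference box, an IoU-based loss can be evaluated on the two resulting boxes without the original proposal coordinates. Whether that is the intent of the code in question is for the authors to confirm.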
Here's another question, about Fig. 4 in https://arxiv.org/pdf/1911.08287.pdf. In this figure, is every coordinate n the position of a set of anchors before the regression process (t=0) or after it (t=T)?
Thank you!
Hello, I have two questions about the `reg` prediction and targets:
I found that the loss calculation is done by https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/modeling/model_builder.py#L199-L201 and that `rpn_ret['bbox_targets']` should be prepared by https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L129-L191
That is to say, we first match all predicted boxes to GT boxes using the regular IoU metric. If the max IoU for a predicted box is greater than a threshold (for example `0.7`), the box is considered positive and thus has a reg loss. Negative boxes should have no reg loss.
So DIoU/GIoU try to solve a problem that arises when regular IoU is used as the loss function: a box that has no overlap with its corresponding GT gets no gradient. But shouldn't such boxes be considered negative, and thus have no reg loss at all?
https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L138
Is `bbox_pred`, produced by `cls_score, bbox_pred = self.Box_Outs(box_feat)`, the regression values or the box coordinates? https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/modeling/model_builder.py#L176
`rpn_ret['bbox_targets']` holds the regression values from the predicted boxes to the GT boxes: https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L170-L171 https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/roi_data/fast_rcnn.py#L206-L214
The `smooth L1` loss in the original code is also computed between predicted reg values and true reg values, so I think `bbox_pred` and `bbox_targets` should be reg values here as well. But when computing GIoU/DIoU, absolute box coordinates should be used. https://github.com/Zzh-tju/DIoU-pytorch-detectron/blob/6e18f2c9f80c995e8730605de5aaabfa346e88d0/lib/modeling/fast_rcnn_heads.py#L50-L63
`line60` and `line62` use the same `bbox_pred, bbox_targets` to compute the loss. So, am I missing any code that transforms the regression values to box coordinates? Thank you so much!
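For reference, the regression-value side of the question can be sketched as follows. This is a minimal illustration of the usual R-CNN style encoding and of a smooth L1 loss on the resulting deltas, not a copy of the linked code: `encode_deltas` and `smooth_l1` are hypothetical names, the weights default to 1 here, and the `+1` pixel convention is omitted.

```python
import math


def encode_deltas(proposal, gt, weights=(1.0, 1.0, 1.0, 1.0)):
    """Regression targets (dx, dy, dw, dh) that map `proposal` onto `gt`."""
    px1, py1, px2, py2 = proposal
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    pcx, pcy = px1 + 0.5 * pw, py1 + 0.5 * ph
    gcx, gcy = gx1 + 0.5 * gw, gy1 + 0.5 * gh
    wx, wy, ww, wh = weights
    return (
        wx * (gcx - pcx) / pw,     # center shift in units of proposal width
        wy * (gcy - pcy) / ph,     # center shift in units of proposal height
        ww * math.log(gw / pw),    # log width ratio
        wh * math.log(gh / ph),    # log height ratio
    )


def smooth_l1(x, beta=1.0):
    """Smooth L1 on one residual: quadratic below beta, linear above."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


targets = encode_deltas((0, 0, 10, 10), (5, 0, 15, 10))
predicted = (0.0, 0.0, 0.0, 0.0)
loss = sum(smooth_l1(p - t) for p, t in zip(predicted, targets))
print(targets, loss)
```

Smooth L1 compares deltas with deltas and never needs absolute coordinates, whereas an IoU-based loss needs both sides expressed as boxes, which is why a decoding step has to appear somewhere before GIoU/DIoU is evaluated.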