PeizeSun / SparseR-CNN

[CVPR2021, PAMI2023] End-to-End Object Detection with Learnable Proposal
MIT License

Why are the optimization targets for refined bboxes assigned using the refined bbox instead of the unrefined bbox? #100

Closed (Mingfeng-Wang closed this issue 2 years ago)

Mingfeng-Wang commented 2 years ago

I found that the loss target is the difference between the refined bboxes and the gt. But the predicted offsets are used to refine the previous-stage bbox, so the optimization target for the offsets should be the difference between the previous-stage bbox and the gt. This is how it works in Faster R-CNN.

Thanks for your reply in advance.

PeizeSun commented 2 years ago

Hi~ The loss target is the difference between the refined bboxes and the gt, where refined bboxes = previous-stage bbox + offsets, so the offsets are still optimized through this loss.
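
To make the point above concrete, here is a minimal sketch that assumes the purely additive refinement stated in this reply (refined = previous + offsets) and a plain L1 loss. The helper names (`refine`, `loss_on_refined`, `loss_on_offsets`) are hypothetical and are not taken from the repository, whose actual box parameterization and losses may differ. Under these assumptions, the loss on the refined box and the loss on the offsets against the residual `gt - prev` are the same number, so both formulations push the offsets toward the same target.

```python
def l1(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def refine(prev_box, offsets):
    """Additive refinement (assumption): refined = previous-stage box + offsets."""
    return [p + o for p, o in zip(prev_box, offsets)]

def loss_on_refined(prev_box, offsets, gt):
    """Loss computed between the refined box and the gt box."""
    return l1(refine(prev_box, offsets), gt)

def loss_on_offsets(prev_box, offsets, gt):
    """Loss computed between the offsets and the residual gt - prev."""
    residual = [g - p for g, p in zip(gt, prev_box)]
    return l1(offsets, residual)

# Illustrative boxes in (x1, y1, x2, y2) form; the values are made up.
prev_box = [10.0, 20.0, 50.0, 80.0]
gt = [12.0, 18.0, 55.0, 82.0]
offsets = [1.0, -1.0, 3.0, 0.5]

a = loss_on_refined(prev_box, offsets, gt)
b = loss_on_offsets(prev_box, offsets, gt)
print(a, b)  # both are 5.5
```

The equality holds only for additive offsets with a coordinate-wise loss. An IoU-style loss is defined on boxes rather than on offsets, so it has to be evaluated on the refined box.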

Mingfeng-Wang commented 2 years ago

@PeizeSun

The offsets are added to the previous-stage bbox, so the loss target for the current stage should be the difference between the previous-stage bbox and the gt. As you said, we want to refine the previous-stage bbox with the offsets. The closer the offsets are to the distance from the previous-stage bbox to the gt, the more accurate the bbox predictions will be.

In addition, your gt assignment is based on the refined bbox instead of the previous-stage bbox. But if the assigned gt were the object closest to the unrefined bbox, the refinement should be easier for the model.

I think the reason your assignment is based on the refined bbox is that you assign the gt according to the model's interest? (If the offsets naturally guide the unrefined bbox toward a farther gt, I call that the model's interest. Learning is easier for the model when it follows its own interest.)
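
The difference between the two assignment choices discussed above can be sketched with a toy example. This assumes a simplified per-box nearest-gt rule with an L1 cost, which is not the repository's matcher; `assign_gt` and all box values are hypothetical. It only shows that matching on the unrefined box and matching on the refined box can select different gts when the offsets move the box toward a farther object.

```python
def l1(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign_gt(box, gts):
    """Return the index of the gt box with the lowest L1 cost (toy rule)."""
    costs = [l1(box, gt) for gt in gts]
    return costs.index(min(costs))

# Two gt boxes side by side, in (x1, y1, x2, y2) form.
gts = [[0.0, 0.0, 10.0, 10.0], [20.0, 0.0, 30.0, 10.0]]

# The unrefined box sits closer to gts[0], but the predicted offsets
# shift it to the right, toward gts[1].
prev_box = [8.0, 0.0, 18.0, 10.0]
offsets = [6.0, 0.0, 6.0, 0.0]
refined = [p + o for p, o in zip(prev_box, offsets)]

idx_unrefined = assign_gt(prev_box, gts)  # matches gts[0]
idx_refined = assign_gt(refined, gts)     # matches gts[1]
print(idx_unrefined, idx_refined)
```

In this toy case, matching on the unrefined box would ask the model to undo a shift it has already predicted, while matching on the refined box follows the direction the offsets already point in.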