akhilpm / DroneDetectron2

PyTorch code for our CVPRW 2023 paper "Cascaded Zoom-in Detector for High Resolution Aerial Images"
MIT License

enquiry about stronger detector #11

Closed (twangnh closed this issue 1 year ago)

twangnh commented 1 year ago

@akhilpm Hi! Thanks for sharing this great work! Have you tried using Cascade R-CNN as the detector to get stronger performance? I also noticed some prior works, such as FOCUS-AND-DETECT and AdaZoom, that perform very well on VisDrone. Have you compared against them, and could you share some comments?

akhilpm commented 1 year ago

Hello @twangnh, no, I used Faster R-CNN from the two-stage family and FCOS from the one-stage family, and code for both is published. If you want to use Cascade R-CNN, I think it will be easy to adapt my code, in the same way that detectron2 extends "GeneralizedRCNN" to get its Cascade R-CNN.
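As a rough illustration of the adaptation mentioned above (not code from this repository): in stock detectron2, Cascade R-CNN is selected through the ROI heads config rather than a separate meta-architecture, using the "CascadeROIHeads" name with class-agnostic box regression. The helper name cascade_overrides is hypothetical, and whether this repository's custom model accepts these overrides unchanged is an assumption.

```python
# Sketch only: config overrides that switch a detectron2 Faster R-CNN setup
# to Cascade R-CNN heads. The keys exist in detectron2's default config;
# compatibility with this repository's custom model is an assumption.

def cascade_overrides(ious=(0.5, 0.6, 0.7)):
    """Return a flat [key, value, ...] list in the form cfg.merge_from_list accepts."""
    return [
        "MODEL.ROI_HEADS.NAME", "CascadeROIHeads",
        # Cascade heads in detectron2 are used with class-agnostic box regression.
        "MODEL.ROI_BOX_HEAD.CLS_AGNOSTIC_BBOX_REG", True,
        # One IoU threshold per cascade stage.
        "MODEL.ROI_BOX_CASCADE_HEAD.IOUS", tuple(ious),
    ]


if __name__ == "__main__":
    opts = cascade_overrides()
    # With detectron2 installed, these could be applied as:
    #   from detectron2.config import get_cfg
    #   cfg = get_cfg()
    #   cfg.merge_from_list(opts)
    for key, value in zip(opts[0::2], opts[1::2]):
        print(f"{key} = {value}")
```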

I remember coming across FOCUS-AND-DETECT, but it uses a ResNeXt101 backbone, and no code is available to reproduce it with a comparable backbone. It also uses deformable convolutions, whereas we used standard ones. AdaZoom is new to me. It is around 1.8 points better than my recent FCOS run, in which I obtained the crops by running the crop labeling algorithm on the detection predictions instead of directly taking the crop predictions. This is slightly slower than the approach described in the paper, but I got an mAP of around 34.4. I would expect a bit more improvement if you use the high-resolution "P2" features of the FPN.
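The idea of deriving crops from detection predictions can be sketched roughly as follows. This is a simplified stand-in, not the crop labeling algorithm from the paper or this repository: it assumes boxes are (x1, y1, x2, y2) tuples, enlarges each one, merges overlapping regions until none overlap, and drops regions that are too large. The function names and the scale and max_frac parameters are hypothetical.

```python
def expand(box, scale, img_w, img_h):
    """Enlarge a box around its center by `scale`, clipped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(img_w), cx + hw), min(float(img_h), cy + hh))


def overlaps(a, b):
    """True if two boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a, b):
    """Smallest box enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def crops_from_detections(boxes, img_w, img_h, scale=3.0, max_frac=0.5):
    """Group detections into crop regions (illustrative heuristic)."""
    crops = [expand(b, scale, img_w, img_h) for b in boxes]
    merged = True
    while merged:  # repeat until a full pass performs no merge
        merged = False
        out = []
        for c in crops:
            for i, o in enumerate(out):
                if overlaps(c, o):
                    out[i] = union(c, o)
                    merged = True
                    break
            else:
                out.append(c)
        crops = out
    # Discard crops that cover too much of the image to be worth zooming into.
    max_area = max_frac * img_w * img_h
    return [c for c in crops if (c[2] - c[0]) * (c[3] - c[1]) <= max_area]


if __name__ == "__main__":
    dets = [(100, 100, 120, 120), (130, 110, 150, 130), (800, 800, 820, 820)]
    for crop in crops_from_detections(dets, 1000, 1000):
        print(crop)
```

Each resulting crop would then be rescaled and passed through the detector a second time, which is where the extra inference cost comes from.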

In general, though, the fundamental problem we cited in the paper also applies to the two methods you listed: they add external networks, complicated training procedures, and so on. We find that practitioners are still using uniform cropping or variants of it such as SAHI. We assume this is because these methods are plug-and-play and require no significant change to standard detector training. So we aimed to bring the same simplicity with our proposed approach.
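For reference, the uniform cropping baseline mentioned above amounts to sliding a fixed-size window over the image with some overlap. The sketch below is a generic illustration and does not use SAHI's actual API; the function names, the tile size, and the overlap ratio are assumptions chosen for the example.

```python
def tile_starts(length, tile, overlap):
    """Start offsets along one axis so that tiles cover [0, length)."""
    if length <= tile:
        return [0]
    stride = max(1, int(tile * (1 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile is flush with the image border
    return starts


def uniform_tiles(img_w, img_h, tile=800, overlap=0.25):
    """Return (x1, y1, x2, y2) tiles in row-major order."""
    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in tile_starts(img_h, tile, overlap)
        for x in tile_starts(img_w, tile, overlap)
    ]


if __name__ == "__main__":
    for t in uniform_tiles(2000, 1500):
        print(t)
```

Every tile is processed regardless of content, which keeps the pipeline simple but spends inference time on empty regions.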