
[CVPR 2024] Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
MIT License

Hybrid Proposal Refiner

This is the official implementation of the paper "Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective".

Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective

Jinjing Zhao, Fangyun Wei, Chang Xu

The University of Sydney

TODO

Introduction

With the transformative impact of the Transformer, DETR pioneered the application of the encoder-decoder architecture to object detection. A collection of follow-up research, e.g., Deformable DETR, aims to enhance DETR while adhering to the encoder-decoder design. In this work, we revisit the DETR series through the lens of Faster R-CNN. We find that DETR resonates with the underlying principles of Faster R-CNN's RPN-refiner design but benefits from end-to-end detection owing to the incorporation of Hungarian matching. We systematically adapt Faster R-CNN towards Deformable DETR by integrating or repurposing each component of Deformable DETR, and note that Deformable DETR's improved performance over Faster R-CNN is attributed to the adoption of advanced modules such as a superior proposal refiner (e.g., deformable attention rather than RoI Align). Viewing DETR through the RPN-refiner paradigm, we delve into various proposal refinement techniques such as deformable attention, cross attention, and dynamic convolution. These proposal refiners cooperate well with each other; thus, we synergistically combine them to establish a Hybrid Proposal Refiner (HPR). Our HPR is versatile and can be incorporated into various DETR detectors. For instance, by integrating HPR into a strong DETR detector, we achieve an AP of 54.9 on the COCO benchmark with a ResNet-50 backbone and a 36-epoch training schedule.

Main Results

Results on COCO with ResNet-50

| Base Model | Epoch | w/ LSJ | AP | Configs | Checkpoints |
|:---|:---:|:---:|:---:|:---:|:---:|
| Deformable DETR | 12 | | 50.6 | config | OneDrive \| quark |
| Deformable DETR | 24 | | 51.9 | config | OneDrive \| quark |
| DINO | 12 | | 51.1 | config | OneDrive \| quark |
| DINO | 24 | | 51.9 | config | OneDrive \| quark |
| Align DETR | 12 | | 52.1 | config | - |
| Align DETR | 24 | | 52.7 | config | - |
| Align DETR | 12 | | 52.7\* | config | OneDrive \| quark |
| Align DETR | 24 | | 54.6\* | config | OneDrive \| quark |
| Align DETR | 36 | | **55.2\*** | config | OneDrive \| quark |
| DDQ | 12 | | 52.6\* | config | OneDrive \| quark |
| DDQ | 24 | | 53.3\* | config | OneDrive \| quark |
| DDQ | 12 | | 53.0 | config | OneDrive \| quark |
| DDQ | 24 | | 54.8\* | config | OneDrive \| quark |
| DDQ | 36 | | **55.1\*** | config | OneDrive \| quark |

Results on COCO with Swin-Large

| Base Model | Epoch | w/ LSJ | AP | Configs | Checkpoints |
|:---|:---:|:---:|:---:|:---:|:---:|
| DDQ | 12 | | 58.7 | config | OneDrive \| quark |
| DDQ | 12 | | 58.8\* | config | OneDrive \| quark |
| DDQ | 24 | | 59.7\* | config | OneDrive \| quark |
| Align DETR | 12 | | 58.6 | config | OneDrive \| quark |
| Align DETR | 24 | | 59.3 | config | OneDrive \| quark |
| Align DETR | 12 | | 58.8 | config | OneDrive \| quark |
| Align DETR | 24 | | 59.6 | config | OneDrive \| quark |
| Align DETR | 36 | | 60.0 | config | OneDrive \| quark |

\* Retrained configuration; the result is slightly higher than the one reported in the paper.

Installation

We tested our models with python=3.10.10, pytorch=1.12.0, and cuda=11.6. Other versions may work as well.

1. Install PyTorch and torchvision

   Follow the instructions at https://pytorch.org/get-started/locally/.

   ```shell
   # an example:
   conda install -c pytorch pytorch torchvision
   ```

2. Install other required packages

   ```shell
   pip install -r requirements.txt
   ```

Data

Please download the COCO 2017 dataset and organize it as follows:

```
coco2017/
  ├── train2017/
  ├── val2017/
  └── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
```
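As a quick sanity check before training, a small helper like the following (a hypothetical script, not part of this repo) can confirm the layout above is in place:

```python
from pathlib import Path


def check_coco_layout(root):
    """Return a list of entries missing from a COCO 2017 directory."""
    root = Path(root)
    expected = [
        root / "train2017",
        root / "val2017",
        root / "annotations" / "instances_train2017.json",
        root / "annotations" / "instances_val2017.json",
    ]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = check_coco_layout("coco2017")
    if missing:
        print("Missing entries:", *missing, sep="\n  ")
    else:
        print("COCO 2017 layout looks complete.")
```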

Run

Modify the COCO path in the config files

Before training or evaluation, you need to modify the dataset path in the following config files:

```
project/configs/_base_/datasets/data_re_aug_coco_detection.py
project/configs/_base_/datasets/lsj_data_re_aug_coco_detection.py
```
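The edit is typically a one-line change. In mmdetection-style dataset configs the location is usually given by a `data_root` variable; the variable name here is an assumption based on that convention, so check the actual files:

```python
# Hypothetical excerpt of an edited dataset config
# (e.g., project/configs/_base_/datasets/data_re_aug_coco_detection.py):
# point data_root at your local COCO 2017 directory.
data_root = '/path/to/coco2017/'
```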

To train a model on a single node

To accelerate convergence, we apply SoCo pre-training to the ResNet-50 backbone (`./backbone_pth/backbone.pth`).

```shell
./dist_train.sh <Config Path> <GPU Number> <Work Dir>
```

To evaluate a model on a single node

```shell
./dist_test.sh <Config Path> <Checkpoint Path> <GPU Number>
```

Multi-node training

You can refer to Deformable-DETR to enable training on multiple nodes.

Citation

If you use HPR in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.

```bibtex
@inproceedings{zhao2024hybrid,
  title={Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective},
  author={Zhao, Jinjing and Wei, Fangyun and Xu, Chang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={17416--17426},
  year={2024}
}
```