This is the official implementation of the paper "DETRs with Hybrid Matching".
Authors: Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, Weihong Lin, Lei Sun, Chao Zhang, Han Hu
⛽ ⛽ ⛽ Contact: yuhui.yuan@microsoft.com
2023.04.14 Expedit-SAM significantly boosts the inference speed of the ViT-H SAM model by almost 1.5 times. 🍺credits to Weicong Liang🍺
2023.04.11 Swin-L + H-Deformable-DETR + SAM achieves strong COCO instance segmentation results: mask AP=46.8 by simply prompting SAM with our HDETR box predictions (vs. mask AP=46.5 based on ViTDet boxes); a rough sketch of this box-prompting setup follows the news items below. 🍺credits to Zhanhao Liang🍺
2023.03.29 HDETR + R50 based on Detrex achieves better performance: AP=49.1 under 12 training epochs. 🍺credits to Tianhe Ren🍺
2023.03.22 Expedit-LargeScale-Vision-Transformer (NeurIPS2022) has been open-sourced.
2023.02.28 HDETR has been accepted by CVPR 2023 😉😉😉
2022.11.25 An optimized implementation of hybrid matching is released as a pull request; it parallelizes the matching/loss computations of the one2one and one2many branches. 🍺credits to Ding Jia🍺
2022.11.17 Code for H-Detic-LVIS is released. 🍺credits to Haodi He🍺
2022.11.10 Code for H-TransTrack is released. 🍺credits to Haojun Yu🍺
2022.10.20 🎉🎉🎉Detrex has supported our H-Deformable-DETR 🍺credits to Ding Jia and Tianhe Ren🍺
2022.09.14 We have supported H-Deformable-DETR w/ ViT-L (MAE), which achieves 56.5 AP on COCO val with 4-scale feature maps, without using the LSJ (large-scale jittering) augmentation adopted by the original ViTDet. We will include the results of H-Deformable-DETR w/ ViT-L (MAE) equipped with LSJ soon. 🍺credits to Weicong Liang🍺
2022.09.12 Our H-Deformable-DETR w/ Swin-L achieves 58.2 AP on COCO val with 4-scale feature maps, which is comparable to (slightly better than) the very recent DINO-DETR w/ Swin-L equipped with 4-scale feature maps.
2022.08.31 Code for H-Deformable-DETR-mmdet (which supports mmdetection 2D, 🍺credits to Yiduo Hao🍺) is released. We will also release the code for H-Mask-Deformable-DETR (strong results on both instance segmentation and panoptic segmentation) soon.
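As referenced in the 2023.04.11 item above, here is a rough, hedged sketch of prompting SAM with detector boxes using the public `segment_anything` API. It is not part of this repo: the checkpoint name is the public SAM ViT-H weight file, and the example box is a placeholder standing in for an H-Deformable-DETR prediction.

```python
# Minimal sketch (not from this repo): feed detector boxes to SAM as prompts.
import numpy as np
import torch
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Public SAM ViT-H checkpoint, downloaded from the segment-anything repo.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))  # HxWx3 uint8 RGB
predictor.set_image(image)

# Placeholder boxes in (x1, y1, x2, y2) pixel coordinates; in practice these
# would be the H-Deformable-DETR box predictions for this image.
boxes = torch.tensor([[100.0, 150.0, 400.0, 500.0]])
boxes = predictor.transform.apply_boxes_torch(boxes, image.shape[:2])

masks, scores, _ = predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=boxes,
    multimask_output=False,
)  # masks: (N, 1, H, W) boolean instance masks, one per prompted box
```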
We provide a set of baseline results and trained models available for download:
Name | Backbone | query | epochs | AP | download |
---|---|---|---|---|---|
Deformable-DETR | R50 | 300 | 12 | 43.7 | model |
Deformable-DETR | R50 | 300 | 36 | 46.8 | model |
Deformable-DETR + tricks | R50 | 300 | 12 | 47.0 | model |
Deformable-DETR + tricks | R50 | 300 | 36 | 49.0 | model |
H-Deformable-DETR + tricks | R50 | 300 | 12 | 48.7 | model |
H-Deformable-DETR + tricks | R50 | 300 | 36 | 50.0 | model |
Results with Swin Transformer backbones:
Name | Backbone | query | epochs | AP | download |
---|---|---|---|---|---|
Deformable-DETR | Swin Tiny | 300 | 12 | 45.3, 46.0 | model, model |
Deformable-DETR | Swin Tiny | 300 | 36 | 49.0, 49.6 | model, model |
Deformable-DETR + tricks | Swin Tiny | 300 | 12 | 49.3 | model |
Deformable-DETR + tricks | Swin Tiny | 300 | 36 | 51.8 | model |
H-Deformable-DETR + tricks | Swin Tiny | 300 | 12 | 50.6 | model |
H-Deformable-DETR + tricks | Swin Tiny | 300 | 36 | 53.2 | model |
Deformable-DETR | Swin Large | 300 | 12 | 51.0 | model |
Deformable-DETR | Swin Large | 300 | 36 | 53.7 | model |
Deformable-DETR + tricks | Swin Large | 300 | 12 | 54.5 | model |
Deformable-DETR + tricks | Swin Large | 300 | 36 | 56.3 | model |
H-Deformable-DETR + tricks | Swin Large | 300 | 12 | 55.9 | model |
H-Deformable-DETR + tricks | Swin Large | 300 | 36 | 57.1 | model |
H-Deformable-DETR + tricks | Swin Large | 900 | 12 | 56.1 | model |
H-Deformable-DETR + tricks | Swin Large | 900 | 36 | 57.4 | model |
H-Deformable-DETR + tricks [topk=300] | Swin Large | 900 | 36 | 57.6 | model |
Improved results with weight-decay=0.05 (compared to the weight-decay=0.0001 used above):
Name | Backbone | query | epochs | AP (weight-decay=0.0001) | AP (weight-decay=0.05) | download |
---|---|---|---|---|---|---|
H-Deformable-DETR + tricks | Swin Tiny | 300 | 12 | 50.6 | 51.2 | model |
H-Deformable-DETR + tricks | Swin Tiny | 300 | 36 | 53.2 | 53.7 | model |
H-Deformable-DETR + tricks | Swin Large | 900 | 36 | 57.4 | 57.8 | model |
H-Deformable-DETR + tricks [topk=300] | Swin Large | 900 | 36 | 57.6 | 57.9 | model |
H-Deformable-DETR (deep encoder) + tricks [topk=300] | Swin Large | 900 | 36 | NA | 58.2 | model |
We test our models under python=3.7.10, pytorch=1.10.1, cuda=10.2. Other versions may work as well.
Clone this repo:
```bash
git clone https://github.com/HDETR/H-Deformable-DETR.git
cd H-Deformable-DETR
```
Install PyTorch and torchvision
Follow the instructions at https://pytorch.org/get-started/locally/.
```bash
# an example:
conda install -c pytorch pytorch torchvision
```
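If you want to match the tested versions exactly, a pinned install would look roughly like the following. This is an assumption based on the versions listed above and a CUDA 10.2 machine (torchvision 0.11.2 is the release paired with PyTorch 1.10.1); check pytorch.org for the command that fits your setup.

```bash
# Pinned to the tested versions (pytorch=1.10.1, cuda=10.2); adjust for your CUDA version.
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=10.2 -c pytorch
```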
Install other needed packages
```bash
pip install -r requirements.txt
pip install openmim
mim install mmcv-full
pip install mmdet
```
Compile CUDA operators
```bash
cd models/ops
python setup.py build install
# unit test (should see all checking is True)
python test.py
cd ../..
```
Please download the COCO 2017 dataset and organize it as follows:
```
coco_path/
├── train2017/
├── val2017/
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
```
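After downloading, a small sanity check like the following (not part of this repo; it assumes pycocotools is installed via the requirements) can confirm the annotations are in place:

```python
# Verify that the COCO val annotations load and report the number of images.
from pycocotools.coco import COCO

coco = COCO("coco_path/annotations/instances_val2017.json")
print(len(coco.getImgIds()), "validation images")  # expect 5000 for COCO 2017 val
```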
To train a model on a single node with 8 GPUs:
```bash
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 <config path> \
    --coco_path <coco path>
```
To train/eval a model with a Swin Transformer backbone, you need to download the backbone checkpoint from the official repo first and specify the argument --pretrained_backbone_path, as our configs do (see the example below).
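For example, a launch command with a Swin backbone could look like the following sketch; the checkpoint path is a placeholder for the file downloaded from the official Swin repo, and the flag can equally be set inside the config, as our provided configs do.

```bash
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 <config path> \
    --coco_path <coco path> \
    --pretrained_backbone_path <path to the downloaded Swin checkpoint>
```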
To evaluate a model with a given checkpoint:
```bash
GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 <config path> \
    --coco_path <coco path> --eval --resume <checkpoint path>
```
You can refer to Deformable-DETR to enable training on multiple nodes.
If you find H-Deformable-DETR useful in your research, please consider citing:
```
@article{jia2022detrs,
  title={DETRs with Hybrid Matching},
  author={Jia, Ding and Yuan, Yuhui and He, Haodi and Wu, Xiaopei and Yu, Haojun and Lin, Weihong and Sun, Lei and Zhang, Chao and Hu, Han},
  journal={arXiv preprint arXiv:2207.13080},
  year={2022}
}

@article{zhu2020deformable,
  title={Deformable DETR: Deformable Transformers for End-to-End Object Detection},
  author={Zhu, Xizhou and Su, Weijie and Lu, Lewei and Li, Bin and Wang, Xiaogang and Dai, Jifeng},
  journal={arXiv preprint arXiv:2010.04159},
  year={2020}
}
```