Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

This is the official PyTorch implementation of our ICLR 2023 paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation".

⭐ ED-Pose

We present ED-Pose, an end-to-end framework with Explicit box Detection for multi-person Pose estimation. ED-Pose reconsiders this task as two explicit box detection processes with a unified representation and regression supervision. ED-Pose is conceptually simple: it needs neither post-processing nor dense heatmap supervision.

For the first time, ED-Pose, as a fully end-to-end framework with an L1 regression loss, surpasses heatmap-based top-down methods under the same backbone by 1.2 AP on COCO.
ED-Pose achieves state-of-the-art performance with 76.6 AP on CrowdPose without test-time augmentation.
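
To make the phrase "L1 regression loss" concrete, the sketch below computes a mean absolute error between predicted and ground-truth keypoint coordinates, skipping unlabeled joints. It is a generic illustration in plain Python, not the loss code from this repository; the function name `l1_keypoint_loss`, the visibility mask, and the toy coordinates are all assumptions made for the example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l1_keypoint_loss(
    pred: Sequence[Point],
    target: Sequence[Point],
    visible: Sequence[bool],
) -> float:
    """Mean absolute coordinate error over labeled keypoints (hypothetical helper)."""
    if not (len(pred) == len(target) == len(visible)):
        raise ValueError("pred, target and visible must have the same length")
    total = 0.0
    count = 0
    for (px, py), (tx, ty), vis in zip(pred, target, visible):
        if not vis:
            continue  # unlabeled joints contribute no supervision
        total += abs(px - tx) + abs(py - ty)
        count += 2  # two coordinates per keypoint
    return total / count if count else 0.0


# Toy example with normalized coordinates; the third joint is unlabeled.
pred = [(0.50, 0.20), (0.40, 0.60), (0.90, 0.90)]
target = [(0.50, 0.30), (0.30, 0.60), (0.10, 0.10)]
loss = l1_keypoint_loss(pred, target, [True, True, False])
print(round(loss, 4))
```

Because the loss is computed directly on coordinates, no heatmap has to be rendered for supervision or decoded at inference time.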

🔥 News

2023/08/08:
1. ED-Pose now supports the Human-Art dataset.
2. We uploaded the inference script for faster visualization.

🐟 Todo

This repo contains further modifications including:

[ ] Integrated into detrex.
[ ] Integrated into Huggingface Spaces 🤗 using Gradio.

🚀 Model Zoo

We have put our model checkpoints here.

Results on COCO val2017 dataset

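| Model | Backbone | LR schedule | mAP | AP⁵⁰ | AP⁷⁵ | AP^M | AP^L | Time (ms) | Download |
|---|---|---|---|---|---|---|---|---|---|
| ED-Pose | R-50 | 60e | 71.7 | 89.7 | 78.8 | 66.2 | 79.7 | 51 | Google Drive |
| ED-Pose | Swin-L | 60e | 74.3 | 91.5 | 81.7 | 68.5 | 82.7 | 88 | Google Drive |
| ED-Pose | Swin-L-5scale | 60e | 75.8 | 92.3 | 82.9 | 70.4 | 83.5 | 142 | Google Drive |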

Results on CrowdPose test dataset

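| Model | Backbone | LR schedule | mAP | AP⁵⁰ | AP⁷⁵ | AP^E | AP^M | AP^H | Download |
|---|---|---|---|---|---|---|---|---|---|
| ED-Pose | R-50 | 80e | 69.9 | 88.6 | 75.8 | 77.7 | 70.6 | 60.9 | Google Drive |
| ED-Pose | Swin-L | 80e | 73.1 | 90.5 | 79.8 | 80.5 | 73.8 | 63.8 | Google Drive |
| ED-Pose | Swin-L-5scale | 80e | 76.6 | 92.4 | 83.3 | 83.0 | 77.3 | 68.3 | Google Drive |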

Results on COCO test-dev dataset

| Model | Backbone | Loss | mAP | AP⁵⁰ | AP⁷⁵ | AP^M | AP^L |
|---|---|---|---|---|---|---|---|
| DirectPose | R-50 | Reg | 62.2 | 86.4 | 68.2 | 56.7 | 69.8 |
| DirectPose | R-101 | Reg | 63.3 | 86.7 | 69.4 | 57.8 | 71.2 |
| FCPose | R-50 | Reg+HM | 64.3 | 87.3 | 71.0 | 61.6 | 70.5 |
| FCPose | R-101 | Reg+HM | 65.6 | 87.9 | 72.6 | 62.1 | 72.3 |
| InsPose | R-50 | Reg+HM | 65.4 | 88.9 | 71.7 | 60.2 | 72.7 |
| InsPose | R-101 | Reg+HM | 66.3 | 89.2 | 73.0 | 61.2 | 73.9 |
| PETR | R-50 | Reg+HM | 67.6 | 89.8 | 75.3 | 61.6 | 76.0 |
| PETR | Swin-L | Reg+HM | 70.5 | 91.5 | 78.7 | 65.2 | 78.0 |
| ED-Pose | R-50 | Reg | 69.8 | 90.2 | 77.2 | 64.3 | 77.4 |
| ED-Pose | Swin-L | Reg | 72.7 | 92.3 | 80.9 | 67.6 | 80.0 |

Results when joint-training using Human-Art and COCO datasets

🥂 Note that training ED-Pose with Human-Art can also boost performance on MSCOCO!

Results on Human-Art validation set

| Arch | Backbone | mAP | AP⁵⁰ | AP⁷⁵ | AR | AR⁵⁰ | Download |
|---|---|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 0.723 | 0.861 | 0.774 | 0.808 | 0.921 | Google Drive |

Results on COCO val2017

| Arch | Backbone | AP | AP⁵⁰ | AP⁷⁵ | AR | AR⁵⁰ | Download |
|---|---|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 0.724 | 0.898 | 0.794 | 0.799 | 0.946 | Google Drive |

Note:

No test-time augmentation is used for ED-Pose.
We use the Objects365 dataset to pretrain the human detection of ED-Pose under the Swin-L-5scale setting.

🚢 Environment Setup

Installation

We use [DN-Deformable-DETR](https://arxiv.org/abs/2203.01305) as our codebase. We test our models under `python=3.7.3, pytorch=1.9.0, cuda=11.1`. Other versions may work as well.

1. Clone this repo

```sh
git clone https://github.com/IDEA-Research/ED-Pose.git
cd ED-Pose
```

2. Install PyTorch and torchvision

Follow the instructions at https://pytorch.org/get-started/locally/.

```sh
# an example:
conda install -c pytorch pytorch torchvision
```

3. Install other needed packages

```sh
pip install -r requirements.txt
```

4. Compile the CUDA operators

```sh
cd models/edpose/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..
```
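
Before compiling the CUDA operators, it can help to confirm which versions are actually active in the current environment and compare them with the tested combination listed above. The following is a small sketch for that purpose; the helper name `describe_environment` is hypothetical and is not part of this repository.

```python
import importlib.util
import platform
from typing import Dict, Optional


def describe_environment() -> Dict[str, Optional[str]]:
    """Collect the interpreter version and, if installed, the torch and CUDA versions."""
    info: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the script also runs without PyTorch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version or 'not found'}")
```

A `cuda` entry reported as not found usually means a CPU-only PyTorch build is installed, in which case the operator build in step 4 is unlikely to succeed.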

Data Preparation

**For COCO data**, please download from [COCO download](http://cocodataset.org/#download). The coco_dir should look like this:

```
|-- ED-Pose
`-- |-- coco_dir
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```

**For CrowdPose data**, please download from [CrowdPose download](https://github.com/Jeff-sjtu/CrowdPose#dataset). The crowdpose_dir should look like this:

```
|-- ED-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ...
```
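
A quick way to catch a misplaced folder before launching a long run is to check the layout programmatically. The sketch below lists which of the entries from the two trees above are missing under a given root. The helper `missing_entries` and the two constants are hypothetical names chosen for this example; the repository may perform its own checks.

```python
import os
from pathlib import Path
from typing import List

# Relative paths taken from the COCO tree above.
COCO_REQUIRED = [
    "annotations/person_keypoints_train2017.json",
    "annotations/person_keypoints_val2017.json",
    "images/train2017",
    "images/val2017",
]

# Relative paths taken from the CrowdPose tree above.
CROWDPOSE_REQUIRED = [
    "json/crowdpose_train.json",
    "json/crowdpose_val.json",
    "json/crowdpose_trainval.json",
    "json/crowdpose_test.json",
    "images",
]


def missing_entries(root: str, required: List[str]) -> List[str]:
    """Return the required relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]


if __name__ == "__main__":
    # EDPOSE_COCO_PATH is the variable exported in the run commands of this README.
    coco_root = os.environ.get("EDPOSE_COCO_PATH")
    if coco_root:
        print("missing under COCO root:", missing_entries(coco_root, COCO_REQUIRED))
```

An empty list means every expected file and folder was found; the check does not validate the contents of the JSON files.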

🥳 Run

Training on COCO:

Single GPU

```
# For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
python main.py \
  --output_dir "logs/coco_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
  --dataset_file="coco"
```

```
# For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
  --output_dir "logs/coco_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  --dataset_file="coco"
```

Distributed Run

```
# For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/coco_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
  --dataset_file="coco"
```

```
# For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/coco_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  --dataset_file="coco"
```

Training on CrowdPose:

Single GPU

```
# For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
python main.py \
  --output_dir "logs/crowdpose_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
  --dataset_file="crowdpose"
```

```
# For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
  --output_dir "logs/crowdpose_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
  --dataset_file="crowdpose"
```

Distributed Run

```
# For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/crowdpose_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
  --dataset_file="crowdpose"
```

```
# For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/crowdpose_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
  --dataset_file="crowdpose"
```

We have put the Swin-L model pretrained on ImageNet-22k here.
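
The training commands above differ only in a few values: the dataset (which fixes `epochs`, `lr_drop` and `num_body_points`), the backbone, and whether `torch.distributed.launch` is used. The sketch below assembles the same argument lists from those choices, which can be handy when scripting several runs. `build_train_command` and the two lookup tables are hypothetical helpers written for this example; all flag names and values are copied from the commands above.

```python
from typing import List

# Per-dataset settings copied from the commands above.
DATASETS = {
    "coco": {"epochs": 60, "lr_drop": 55, "num_body_points": 17},
    "crowdpose": {"epochs": 80, "lr_drop": 75, "num_body_points": 14},
}
# Backbone option -> short tag used in the log directory names above.
BACKBONE_TAGS = {"resnet50": "r50", "swin_L_384_22k": "swinl"}


def build_train_command(dataset: str, backbone: str, gpus: int = 1) -> List[str]:
    """Assemble the argument list for one training run (hypothetical helper)."""
    cfg = DATASETS[dataset]
    tag = BACKBONE_TAGS[backbone]
    launcher = ["python"]
    if gpus > 1:
        launcher += ["-m", "torch.distributed.launch", f"--nproc_per_node={gpus}"]
    return launcher + [
        "main.py",
        "--output_dir", f"logs/{dataset}_{tag}",
        "-c", "config/edpose.cfg.py",
        "--options",
        "batch_size=4",
        f"epochs={cfg['epochs']}",
        f"lr_drop={cfg['lr_drop']}",
        f"num_body_points={cfg['num_body_points']}",
        f"backbone={backbone}",
        f"--dataset_file={dataset}",
    ]


print(" ".join(build_train_command("crowdpose", "swin_L_384_22k", gpus=4)))
```

The returned list can be passed to `subprocess.run`. The environment variables from the commands above (`EDPOSE_COCO_PATH` or `EDPOSE_CrowdPose_PATH`, plus `pretrain_model_path` for Swin-L) still need to be set in the calling environment.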

Evaluation on COCO:

ResNet-50

```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/coco_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
  --dataset_file="coco" \
  --pretrain_model_path "./models/edpose_r50_coco.pth" \
  --eval
```

Swin-L

```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/coco_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  --dataset_file="coco" \
  --pretrain_model_path "./models/edpose_swinl_coco.pth" \
  --eval
```

Swin-L-5scale

```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/coco_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
  --dataset_file="coco" \
  --pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
  --eval
```

Evaluation on CrowdPose:

ResNet-50

```
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
python main.py \
  --output_dir "logs/crowdpose_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
  --dataset_file="crowdpose" \
  --pretrain_model_path "./models/edpose_r50_crowdpose.pth" \
  --eval
```

Swin-L

```
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
  --output_dir "logs/crowdpose_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
  --dataset_file="crowdpose" \
  --pretrain_model_path "./models/edpose_swinl_crowdpose.pth" \
  --eval
```

Swin-L-5scale

```
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
  --output_dir "logs/crowdpose_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
  --dataset_file="crowdpose" \
  --pretrain_model_path "./models/edpose_swinl_5scale_crowdpose.pth" \
  --eval
```

Visualization via COCO Keypoints Format:

ResNet-50

```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
python -m torch.distributed.launch --nproc_per_node=1 main.py \
  --output_dir "logs/coco_r50" \
  -c config/edpose.cfg.py \
  --options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
  --dataset_file="coco" \
  --pretrain_model_path "./models/edpose_r50_coco.pth" \
  --eval
```

Swin-L

```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
python -m torch.distributed.launch --nproc_per_node=1 main.py \
  --output_dir "logs/coco_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  --dataset_file="coco" \
  --pretrain_model_path "./models/edpose_swinl_coco.pth" \
  --eval
```

Swin-L-5scale

```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
python -m torch.distributed.launch --nproc_per_node=1 main.py \
  --output_dir "logs/coco_swinl" \
  -c config/edpose.cfg.py \
  --options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
  --dataset_file="coco" \
  --pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
  --eval
```
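
The commands above refer to the COCO keypoints format, in which each person annotation stores its 17 keypoints as one flat list `[x1, y1, v1, x2, y2, v2, ...]`, where `v` is 0 (not labeled), 1 (labeled but not visible) or 2 (labeled and visible). The sketch below unpacks such a list into named joints. It only illustrates the data format; `unpack_keypoints` is a hypothetical helper, and how the inference script reads or writes these values is not shown here.

```python
from typing import Dict, List, Tuple

# The 17 COCO person keypoints, in annotation order.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def unpack_keypoints(flat: List[float]) -> Dict[str, Tuple[float, float, int]]:
    """Turn a flat [x1, y1, v1, x2, y2, v2, ...] list into a name -> (x, y, v) map."""
    expected = 3 * len(COCO_KEYPOINT_NAMES)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return {
        name: (flat[3 * i], flat[3 * i + 1], int(flat[3 * i + 2]))
        for i, name in enumerate(COCO_KEYPOINT_NAMES)
    }


# Toy annotation: only the nose and the left shoulder are labeled (v > 0).
flat = [0.0] * 51
flat[0:3] = [120.0, 80.0, 2]
flat[15:18] = [100.0, 140.0, 1]
points = unpack_keypoints(flat)
labeled = [name for name, (_, _, v) in points.items() if v > 0]
print(labeled)
```

The count of 17 joints matches the `num_body_points=17` option used in the COCO commands.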

💃🏻 Cite ED-Pose

@inproceedings{
yang2023explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Jie Yang and Ailing Zeng and Shilong Liu and Feng Li and Ruimao Zhang and Lei Zhang},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=s4WVupnJjmX}
}
