Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation
This is the official PyTorch implementation of our ICLR 2023 paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation".
⭐ ED-Pose
We present ED-Pose, an end-to-end framework with Explicit box Detection for multi-person Pose estimation. ED-Pose re-considers this task as two explicit box detection processes with a unified representation and regression supervision.
In general, ED-Pose is conceptually simple without post-processing and dense heatmap supervision.
- For the first time, ED-Pose, as a fully end-to-end framework with an L1 regression loss, surpasses heatmap-based top-down methods under the same backbone by 1.2 AP on COCO.
- ED-Pose achieves state-of-the-art performance with 76.6 AP on CrowdPose without test-time augmentation.
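To make the phrase "L1 regression loss" concrete, here is a minimal sketch of an L1 loss over keypoint coordinates, written in plain Python. It is only an illustration under stated assumptions: the function name `l1_keypoint_loss`, the use of normalized `(x, y)` pairs, and the visibility mask are choices made for this example and are not taken from the ED-Pose code, whose full training objective may include further terms.

```python
def l1_keypoint_loss(pred, target, visible):
    """Mean L1 distance over labeled keypoints.

    pred, target: sequences of (x, y) coordinates (assumed normalized to [0, 1]).
    visible: sequence of flags; keypoints with a falsy flag are ignored.
    This is an illustrative helper, not the loss used in the repository.
    """
    total = 0.0
    count = 0
    for (px, py), (tx, ty), flag in zip(pred, target, visible):
        if flag:
            # L1 distance: sum of absolute coordinate differences.
            total += abs(px - tx) + abs(py - ty)
            count += 1
    # Avoid dividing by zero when no keypoint is labeled.
    return total / count if count else 0.0


if __name__ == "__main__":
    pred = [(0.5, 0.5), (0.2, 0.2)]
    target = [(0.4, 0.5), (0.9, 0.9)]
    print(l1_keypoint_loss(pred, target, visible=[1, 0]))
```

Unlike heatmap supervision, a loss of this kind compares coordinates directly, so no dense per-pixel target has to be rendered.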
🔥 News
2023/08/08: 1. We support ED-Pose on the Human-Art dataset. 2. We uploaded the inference script for faster visualization.
🐟 Todo
This repo contains further modifications including:
🚀 Model Zoo
We have put our model checkpoints here.
Results on COCO val2017 dataset

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APM | APL | Time (ms) | Download |
|---|---|---|---|---|---|---|---|---|---|
| ED-Pose | R-50 | 60e | 71.7 | 89.7 | 78.8 | 66.2 | 79.7 | 51 | Google Drive |
| ED-Pose | Swin-L | 60e | 74.3 | 91.5 | 81.7 | 68.5 | 82.7 | 88 | Google Drive |
| ED-Pose | Swin-L-5scale | 60e | 75.8 | 92.3 | 82.9 | 70.4 | 83.5 | 142 | Google Drive |

Results on CrowdPose test dataset

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APE | APM | APH | Download |
|---|---|---|---|---|---|---|---|---|---|
| ED-Pose | R-50 | 80e | 69.9 | 88.6 | 75.8 | 77.7 | 70.6 | 60.9 | Google Drive |
| ED-Pose | Swin-L | 80e | 73.1 | 90.5 | 79.8 | 80.5 | 73.8 | 63.8 | Google Drive |
| ED-Pose | Swin-L-5scale | 80e | 76.6 | 92.4 | 83.3 | 83.0 | 77.3 | 68.3 | Google Drive |

Results on COCO test-dev dataset

| Model | Backbone | Loss | mAP | AP50 | AP75 | APM | APL |
|---|---|---|---|---|---|---|---|
| DirectPose | R-50 | Reg | 62.2 | 86.4 | 68.2 | 56.7 | 69.8 |
| DirectPose | R-101 | Reg | 63.3 | 86.7 | 69.4 | 57.8 | 71.2 |
| FCPose | R-50 | Reg+HM | 64.3 | 87.3 | 71.0 | 61.6 | 70.5 |
| FCPose | R-101 | Reg+HM | 65.6 | 87.9 | 72.6 | 62.1 | 72.3 |
| InsPose | R-50 | Reg+HM | 65.4 | 88.9 | 71.7 | 60.2 | 72.7 |
| InsPose | R-101 | Reg+HM | 66.3 | 89.2 | 73.0 | 61.2 | 73.9 |
| PETR | R-50 | Reg+HM | 67.6 | 89.8 | 75.3 | 61.6 | 76.0 |
| PETR | Swin-L | Reg+HM | 70.5 | 91.5 | 78.7 | 65.2 | 78.0 |
| ED-Pose | R-50 | Reg | 69.8 | 90.2 | 77.2 | 64.3 | 77.4 |
| ED-Pose | Swin-L | Reg | 72.7 | 92.3 | 80.9 | 67.6 | 80.0 |

Results when jointly training on the Human-Art and COCO datasets
🥂 Note that training ED-Pose with Human-Art can lead to a performance boost on MSCOCO!
Results on Human-Art validation set

| Arch | Backbone | mAP | AP50 | AP75 | AR | AR50 | Download |
|---|---|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 0.723 | 0.861 | 0.774 | 0.808 | 0.921 | Google Drive |

Results on COCO val2017

| Arch | Backbone | AP | AP50 | AP75 | AR | AR50 | Download |
|---|---|---|---|---|---|---|---|
| ED-Pose | ResNet-50 | 0.724 | 0.898 | 0.794 | 0.799 | 0.946 | Google Drive |

Note:
- No test-time augmentation is used for ED-Pose.
- We use the Objects365 dataset to pretrain the human detection of ED-Pose under the Swin-L-5scale setting.
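The AP columns in the tables above are COCO-style keypoint metrics, which are built on Object Keypoint Similarity (OKS): AP50 and AP75 count a prediction as correct when its OKS with the ground truth reaches 0.50 or 0.75, and mAP averages over thresholds from 0.50 to 0.95. The sketch below shows how OKS is computed for one person, following the formula used by the COCO keypoint evaluation. The function name `oks` and the toy values in the usage part are assumptions for illustration; it is not the evaluation code of this repository, and it simplifies the case where no keypoint is labeled by returning 0.

```python
import math
import sys


def oks(pred, gt, visibility, sigmas, area):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt: sequences of (x, y) pixel coordinates, one per keypoint.
    visibility: ground-truth flags; keypoints with flag 0 are unlabeled and skipped.
    sigmas: per-keypoint falloff constants (dataset specific).
    area: ground-truth object area in squared pixels.
    """
    eps = sys.float_info.epsilon
    total = 0.0
    labeled = 0
    for (px, py), (gx, gy), flag, sigma in zip(pred, gt, visibility, sigmas):
        if flag > 0:
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            k2 = (2.0 * sigma) ** 2
            # Gaussian-shaped similarity, normalized by object scale.
            total += math.exp(-d2 / (2.0 * (area + eps) * k2))
            labeled += 1
    # Simplification: return 0 when nothing is labeled.
    return total / labeled if labeled else 0.0


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    print(oks([(11, 10)], [(10, 10)], [2], [0.05], area=100.0))
```

Because the distance is divided by the object area, a one-pixel error costs more on a small person than on a large one, which is why the tables also report size-specific columns such as APM and APL.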
🚢 Environment Setup
Installation
We use [DN-Deformable-DETR](https://arxiv.org/abs/2203.01305) as our codebase. We test our models under ```python=3.7.3,pytorch=1.9.0,cuda=11.1```. Other versions may work as well.
1. Clone this repo
```sh
git clone https://github.com/IDEA-Research/ED-Pose.git
cd ED-Pose
```
2. Install PyTorch and torchvision
Follow the instructions at https://pytorch.org/get-started/locally/.
```sh
# an example:
conda install -c pytorch pytorch torchvision
```
3. Install other needed packages
```sh
pip install -r requirements.txt
```
4. Compile CUDA operators
```sh
cd models/edpose/ops
python setup.py build install
# unit test (every check should report True)
python test.py
cd ../../..
```
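After the installation steps, it can help to confirm which versions are actually active before training. The following is a minimal sketch of such a check; the helper name `environment_report` is made up for this example and is not part of the repository. It only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torchvision.__version__`, and it degrades gracefully when a package is not installed.

```python
import importlib.util
import sys


def environment_report():
    """Collect Python, PyTorch, torchvision and CUDA information.

    Missing packages are reported as None instead of raising an error.
    """
    report = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "torch_cuda_build": None,
        "cuda_available": None,
        "torchvision": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    if importlib.util.find_spec("torchvision") is not None:
        import torchvision

        report["torchvision"] = torchvision.__version__
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False`, the compiled CUDA operators will most likely not be usable, so it is worth resolving that before running the unit test in `models/edpose/ops`.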
Data Preparation
**For COCO data**, please download from [COCO download](http://cocodataset.org/#download).
The coco_dir should look like this:
```
|-- ED-Pose
`-- |-- coco_dir
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ...
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...
```
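A wrong directory layout usually only shows up once training starts, so a quick check beforehand can save time. The sketch below verifies that the entries shown in the tree above exist under a given root. The names `COCO_REQUIRED` and `missing_entries` are hypothetical helpers written for this example, not functions from the repository.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
COCO_REQUIRED = (
    "annotations/person_keypoints_train2017.json",
    "annotations/person_keypoints_val2017.json",
    "images/train2017",
    "images/val2017",
)


def missing_entries(root, required=COCO_REQUIRED):
    """Return the required relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    import os

    # EDPOSE_COCO_PATH is the variable exported in the run commands of this README.
    coco_dir = os.environ.get("EDPOSE_COCO_PATH", "coco_dir")
    missing = missing_entries(coco_dir)
    print("layout OK" if not missing else f"missing: {missing}")
```

The same function can be reused for other datasets by passing a different `required` tuple.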
**For CrowdPose data**, please download from [CrowdPose download](https://github.com/Jeff-sjtu/CrowdPose#dataset).
The crowdpose_dir should look like this:
```
|-- ED-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ...
```
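The run commands in this README pass `num_body_points=14` for CrowdPose and `num_body_points=17` for COCO, so the annotation file and that option have to agree. The sketch below checks this, assuming the JSON files follow the COCO-style schema in which each annotation stores a flat `keypoints` list of `(x, y, v)` triplets. That schema assumption and the helper name `keypoint_count_mismatches` are made for this example and should be verified against your downloaded files.

```python
import json


def keypoint_count_mismatches(annotation_file, num_body_points):
    """Return ids of annotations whose keypoint list has an unexpected length.

    Assumes a COCO-style file: {"annotations": [{"id": ..., "keypoints": [...]}, ...]}
    with three values (x, y, visibility) per keypoint.
    """
    with open(annotation_file, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    expected = 3 * num_body_points
    return [
        ann.get("id")
        for ann in data.get("annotations", [])
        if len(ann.get("keypoints", [])) != expected
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_keypoints.py /path/to/crowdpose_dir/json/crowdpose_train.json 14
    path, points = sys.argv[1], int(sys.argv[2])
    bad = keypoint_count_mismatches(path, points)
    print(f"{len(bad)} annotation(s) do not have {points} keypoints")
```

An empty result means every annotation matches the configured number of body points.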
🥳 Run
Training on COCO:
Single GPU
```
#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
python main.py \
--output_dir "logs/coco_r50" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
--dataset_file="coco"
```
```
#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
--output_dir "logs/coco_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
--dataset_file="coco"
```
Distributed Run
```
#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/coco_r50" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
--dataset_file="coco"
```
```
#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/coco_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
--dataset_file="coco"
```
Training on CrowdPose:
Single GPU
```
#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
python main.py \
--output_dir "logs/crowdpose_r50" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
--dataset_file="crowdpose"
```
```
#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
--dataset_file="crowdpose"
```
Distributed Run
```
#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/crowdpose_r50" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
--dataset_file="crowdpose"
```
```
#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
--dataset_file="crowdpose"
```
We have put the Swin-L model pretrained on ImageNet-22k here.
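The training commands above differ only in a few values: the dataset, the backbone, the output directory, the number of processes, and the schedule (60 epochs with `lr_drop=55` and 17 body points for COCO, 80 epochs with `lr_drop=75` and 14 body points for CrowdPose). The sketch below assembles the same argument list from those values, which can be convenient when scripting several runs. `PRESETS` and `build_train_command` are hypothetical helpers for this example; the flags themselves are copied from the commands in this README, and the environment variables still have to be exported separately.

```python
import shlex

# Schedule and keypoint settings copied from the commands in this README.
PRESETS = {
    "coco": {"epochs": 60, "lr_drop": 55, "num_body_points": 17},
    "crowdpose": {"epochs": 80, "lr_drop": 75, "num_body_points": 14},
}


def build_train_command(dataset, backbone, output_dir, nproc_per_node=1, batch_size=4):
    """Return the training command as a list of arguments."""
    if dataset not in PRESETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    preset = PRESETS[dataset]
    cmd = ["python"]
    if nproc_per_node > 1:
        # Distributed run, as in the "Distributed Run" examples.
        cmd += ["-m", "torch.distributed.launch", f"--nproc_per_node={nproc_per_node}"]
    cmd += [
        "main.py",
        "--output_dir", output_dir,
        "-c", "config/edpose.cfg.py",
        "--options",
        f"batch_size={batch_size}",
        f"epochs={preset['epochs']}",
        f"lr_drop={preset['lr_drop']}",
        f"num_body_points={preset['num_body_points']}",
        f"backbone={backbone}",
        f"--dataset_file={dataset}",
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("crowdpose", "swin_L_384_22k", "logs/crowdpose_swinl", nproc_per_node=4)
    # Print a shell-safe version of the command.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting issues.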
Evaluation on COCO:
ResNet-50
```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/coco_r50" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
--dataset_file="coco" \
--pretrain_model_path "./models/edpose_r50_coco.pth" \
--eval
```
Swin-L
```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/coco_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
--dataset_file="coco" \
--pretrain_model_path "./models/edpose_swinl_coco.pth" \
--eval
```
Swin-L-5scale
```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/coco_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
return_interm_indices=0,1,2,3 num_feature_levels=5 \
--dataset_file="coco" \
--pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
--eval
```
Evaluation on CrowdPose:
ResNet-50
```
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
python main.py \
--output_dir "logs/crowdpose_r50" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
--dataset_file="crowdpose" \
--pretrain_model_path "./models/edpose_r50_crowdpose.pth" \
--eval
```
Swin-L
```
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
--dataset_file="crowdpose" \
--pretrain_model_path "./models/edpose_swinl_crowdpose.pth" \
--eval
```
Swin-L-5scale
```
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
python -m torch.distributed.launch --nproc_per_node=4 main.py \
--output_dir "logs/crowdpose_swinl" \
-c config/edpose.cfg.py \
--options batch_size=4 epochs=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
return_interm_indices=0,1,2,3 num_feature_levels=5 \
--dataset_file="crowdpose" \
--pretrain_model_path "./models/edpose_swinl_5scale_crowdpose.pth" \
--eval
```
Visualization via COCO Keypoints Format:
ResNet-50
```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
python -m torch.distributed.launch --nproc_per_node=1 main.py \
--output_dir "logs/coco_r50" \
-c config/edpose.cfg.py \
--options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
--dataset_file="coco" \
--pretrain_model_path "./models/edpose_r50_coco.pth" \
--eval
```
Swin-L
```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
python -m torch.distributed.launch --nproc_per_node=1 main.py \
--output_dir "logs/coco_swinl" \
-c config/edpose.cfg.py \
--options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
--dataset_file="coco" \
--pretrain_model_path "./models/edpose_swinl_coco.pth" \
--eval
```
Swin-L-5scale
```
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export Inference_Path=/path/to/your/inference_dir
python -m torch.distributed.launch --nproc_per_node=1 main.py \
--output_dir "logs/coco_swinl" \
-c config/edpose.cfg.py \
--options batch_size=1 epochs=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
return_interm_indices=0,1,2,3 num_feature_levels=5 \
--dataset_file="coco" \
--pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
--eval
```
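For readers unfamiliar with the COCO keypoints format mentioned above: each person is described by a flat list of 17 `(x, y, v)` triplets in a fixed joint order, where `v` is 0 for unlabeled, 1 for labeled but not visible, and 2 for labeled and visible. The sketch below decodes such a list into named joints. The helper `decode_keypoints` is written for this example and is not part of the repository; whether the visualization output uses exactly this layout should be checked against the files it produces.

```python
# Joint order defined by the COCO keypoint annotations.
COCO_KEYPOINT_NAMES = (
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
)


def decode_keypoints(flat, names=COCO_KEYPOINT_NAMES):
    """Map a flat [x1, y1, v1, x2, y2, v2, ...] list to {joint_name: (x, y, v)}."""
    if len(flat) != 3 * len(names):
        raise ValueError(f"expected {3 * len(names)} values, got {len(flat)}")
    return {
        name: (flat[3 * i], flat[3 * i + 1], flat[3 * i + 2])
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    flat = [0] * 51
    flat[0:3] = [120, 80, 2]  # toy nose position, labeled and visible
    joints = decode_keypoints(flat)
    drawable = [name for name, (_, _, v) in joints.items() if v > 0]
    print(drawable)
```

When drawing a skeleton, joints with `v == 0` are normally skipped, since their coordinates carry no information.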
💃🏻 Cite ED-Pose
@inproceedings{yang2023explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Jie Yang and Ailing Zeng and Shilong Liu and Feng Li and Ruimao Zhang and Lei Zhang},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=s4WVupnJjmX}
}