This is the official implementation of our paper:
Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection,
Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, and Yizhou Wang
ECCV 2022
The overall framework of Faster-VoxelPose is presented below.
This project is developed using Python 3.8, PyTorch 1.12.0, and CUDA 11.3 (this exact CUDA version is not required) on Ubuntu 16.04.
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
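After installation, a quick sanity check of the environment can save debugging time later. The snippet below is a minimal sketch, not part of this repository: the function name `check_environment` is hypothetical, and it only reports the Python version, whether PyTorch can be imported, and whether CUDA is visible to PyTorch.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Return a small report about the local Python/PyTorch setup."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if report["torch_installed"]:
        # Imported lazily so the script still runs when PyTorch is missing.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch_installed` is reported as `False`, rerun the pip commands above before continuing.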
Following VoxelPose, we use the CMU Panoptic, Shelf and Campus datasets in our experiments.
bash scripts/download_panoptic.sh
bash scripts/download_shelf.sh
bash scripts/download_campus.sh
Because the annotations of the Shelf and Campus datasets are incomplete, we synthesize extra data to provide training supervision for our 3D pose estimator on these two datasets. The pose sequences come from the Panoptic dataset. You need to download it (Google drive) and put it under the data/ directory.
Download the pretrained backbone model (ResNet-50 pretrained on the COCO dataset and finetuned jointly on the Panoptic and MPII datasets) for 2D heatmap estimation and place it under the backbone/ directory.
Note: For the Shelf and Campus datasets, we directly test our model using 2D pose predictions from a Mask R-CNN pre-trained on the COCO dataset. The annotations are already included in the data/Campus and data/Shelf directories.
To generate 2D heatmap predictions, you need to resize the RGB images in the pre-processing step. Run the following command to preprocess a dataset. The supported values of the [DATASET_NAME] argument are Panoptic, Shelf and Campus.
python preprocess.py --dataset [DATASET_NAME]
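To preprocess all three datasets in one go, the command above can be generated per dataset. The following is a minimal sketch: the helper `preprocess_command` and the constant `SUPPORTED_DATASETS` are hypothetical names introduced here, and only the `--dataset` flag and the three dataset names come from this README.

```python
import shlex
import sys

# Dataset names accepted by preprocess.py according to this README.
SUPPORTED_DATASETS = ("Panoptic", "Shelf", "Campus")


def preprocess_command(dataset_name, python_executable=sys.executable):
    """Build the argument list for preprocessing one dataset."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(
            f"Unsupported dataset {dataset_name!r}; expected one of {SUPPORTED_DATASETS}"
        )
    return [python_executable, "preprocess.py", "--dataset", dataset_name]


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    for name in SUPPORTED_DATASETS:
        print(shlex.join(preprocess_command(name)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the project root to actually run the step.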
After downloading and pre-processing the data, your directory tree should look like this:
${Project}
|-- data
    |-- Panoptic
        |-- 160224_haggling1
        |   |-- hdImgs
        |   |-- hdvideos
        |   |-- hdPose3d_stage1_coco19
        |   |-- calibration_160224_haggling1.json
        |-- 160226_haggling1
        |-- ...
    |-- Shelf
    |   |-- Camera0
    |   |-- ...
    |   |-- Camera4
    |   |-- actorsGT.mat
    |   |-- calibration_shelf.json
    |   |-- pred_shelf_maskrcnn_hrnet_coco.pkl
    |-- Campus
    |   |-- Camera0
    |   |-- Camera1
    |   |-- Camera2
    |   |-- actorsGT.mat
    |   |-- calibration_campus.json
    |   |-- pred_campus_maskrcnn_hrnet_coco.pkl
    |-- panoptic_training_pose.pkl
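Before training, it can help to confirm that the files listed in the tree above are in place. The snippet below is a minimal sketch under the assumption that the data root is the data/ directory of the project; `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the check covers only the fixed entries from the tree, not individual Panoptic sequences or camera folders.

```python
from pathlib import Path

# Fixed entries taken from the directory tree in this README,
# relative to the data/ directory.
EXPECTED_PATHS = (
    "Panoptic",
    "Shelf/actorsGT.mat",
    "Shelf/calibration_shelf.json",
    "Shelf/pred_shelf_maskrcnn_hrnet_coco.pkl",
    "Campus/actorsGT.mat",
    "Campus/calibration_campus.json",
    "Campus/pred_campus_maskrcnn_hrnet_coco.pkl",
    "panoptic_training_pose.pkl",
)


def missing_paths(data_root):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print(f"  data/{rel}")
    else:
        print("All expected entries were found.")
```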
Every experiment is defined by a config file. Specify the path of the config file (e.g. configs/panoptic/jln64.yaml) and run the following command to start training the model. Note that only single-GPU training is currently supported.
python run/train.py --cfg [CONFIG_FILE]
To train the Faster-VoxelPose model on your own data, follow the steps below:
1. Implement the code to process your own dataset under the lib/dataset/ directory. You can refer to lib/dataset/shelf.py and rewrite the _get_db and _get_cam functions to take RGB images and camera parameters as input.
2. Modify the config file based on configs/shelf/jln64.yaml. Remember to set the TEST_HEATMAP_SRC attribute to image if no 2D predictions are given.
3. Start training the model and visualize the evaluation results.
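The config change in the second step can be illustrated on a plain nested dictionary. This is a minimal sketch only: the helper `set_nested_key` is hypothetical, and the example config layout (a `DATASET` section holding the attribute, with an initial value of `pred`) is an assumption made for illustration. Check configs/shelf/jln64.yaml for where TEST_HEATMAP_SRC actually lives.

```python
def set_nested_key(config, key, value):
    """Set every occurrence of `key` in a nested dict.

    Returns the number of entries that were updated, so the caller
    can detect a misspelled or missing key.
    """
    updated = 0
    for name, child in config.items():
        if name == key:
            config[name] = value
            updated += 1
        elif isinstance(child, dict):
            updated += set_nested_key(child, key, value)
    return updated


if __name__ == "__main__":
    # Hypothetical config layout, for illustration only.
    example_config = {
        "DATASET": {"TEST_DATASET": "my_dataset", "TEST_HEATMAP_SRC": "pred"},
    }
    count = set_nested_key(example_config, "TEST_HEATMAP_SRC", "image")
    print(f"updated {count} entry: {example_config}")
```

A return value of zero signals that the key was not found, which is easier to notice than a silently ignored setting.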
To evaluate the model, specify the path of the config file. By default, the model_best.pth.tar checkpoint under the corresponding working directory is selected for evaluation, and the results are printed to the screen.
python run/validate.py --cfg [CONFIG_FILE]
You can download our pre-trained checkpoint from Google Drive.
Dataset | MPJPE | AP25 | AP50 | AP100 | AP150 | Model weight | Config |
---|---|---|---|---|---|---|---|
Panoptic | 18.41 | 86.66 | 98.08 | 99.26 | 99.53 | Google drive | cfg |
Dataset | PCP3D | Model weight | Config |
---|---|---|---|
Shelf | 97.6 | Google drive | cfg |
Campus | 96.9 | Google drive | cfg |
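For readers unfamiliar with the MPJPE column, mean per-joint position error is commonly defined as the average Euclidean distance between predicted and ground-truth joint positions. The snippet below is a minimal sketch of that common definition for a single pose; the function name `mpjpe` is hypothetical, and the repository's evaluation code may differ, for example in how predicted poses are matched to ground-truth people.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between corresponding 3D joints of one pose.

    Both arguments are sequences of (x, y, z) coordinates in the same unit
    and the same joint order.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("Both poses must be non-empty and have the same number of joints")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    # Per-joint distances are 0 and 5, so the mean is 2.5.
    print(mpjpe(pred, gt))
```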
Important Note: Our implementation differs slightly from the one proposed in the original paper. After extensive experiments on the speed-performance tradeoff, we removed the offset branch in HDN and retrained the models. We will revise the paper and upload the final version to arXiv.
We also provide a demo showing how to visualize results on your own sequences. Please refer to the ipynb file.
If you use our code or models in your research, please cite with:
@inproceedings{fastervoxelpose,
author={Ye, Hang and Zhu, Wentao and Wang, Chunyu and Wu, Rujie and Wang, Yizhou},
title={Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2022}
}
This repo is built on the excellent work VoxelPose. We thank the authors for releasing their code.