This repository contains the code and models for the following paper.
Dual networks based 3D Multi-Person Pose Estimation from Monocular Video
Yu Cheng, Bo Wang, Robby T. Tan
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
PyTorch >= 1.5
Python >= 3.6
Create an environment:
conda create -n 3dmpp python=3.6
conda activate 3dmpp
Install PyTorch (tested with versions 1.5 to 1.7) for your OS and installed GPU driver by following the install pytorch instructions. For example, on Linux with CUDA 11.0 the command is:
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
Install the dependencies:
pip install -r requirements.txt
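Before building the extension, it can help to confirm that the interpreter and PyTorch meet the version requirements above. The following is a minimal sketch; the helper names `parse_version` and `version_at_least` are our own, and the check only compares the leading numeric parts of each version string (a local build tag such as `+cu110` is ignored).

```python
import sys


def parse_version(version):
    """Turn a version string such as '1.7.1+cu110' into (1, 7, 1).

    The local build tag after '+' is dropped, and only the leading digits
    of each dot-separated component are kept ('0a0' -> 0).
    """
    core = version.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(version, minimum):
    """True if `version` is >= `minimum`, comparing numeric components."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    py = ".".join(str(v) for v in sys.version_info[:3])
    print("Python {}: {}".format(py, "OK" if version_at_least(py, "3.6") else "too old (need >= 3.6)"))
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        ok = version_at_least(torch.__version__, "1.5")
        print("PyTorch {}: {}".format(torch.__version__, "OK" if ok else "too old (need >= 1.5)"))
        print("CUDA available: {}".format(torch.cuda.is_available()))
```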
Build the Fast Gaussian Map tool:
cd lib/fastgaus
python setup.py build_ext --inplace
cd ../..
Download the pre-trained models and the processed human keypoint files here, and unzip the downloaded zip file into the project's root directory. Two folders should then appear: ./ckpts and ./mupots.
The MuPoTS-3D evaluation set is required to reproduce the results reported in Table 3 of the paper and is available on the MuPoTS dataset website. Download mupots-3d-eval.zip, unzip it, and run get_mupots-3d.sh to download the dataset.
If you encounter an error such as /bin/bash: bad interpreter ..., the script most likely has Windows-style (CRLF) line endings. Remove them by running:
sed -i 's/\r$//' get_mupots-3d.sh
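The `sed` command above strips the carriage return at the end of every line. On macOS, BSD `sed -i` expects a backup-suffix argument, so the command may not work as written there. A portable alternative in Python is sketched below; `strip_crlf` is a hypothetical helper, not part of this repository. Save it as a script and pass the download script's file name as an argument.

```python
import sys
from pathlib import Path


def strip_crlf(path):
    r"""Remove the carriage return at the end of every line of `path`, in place.

    Same effect as `sed -i 's/\r$//' FILE`: Windows (CRLF) line endings
    become Unix (LF) endings. Returns the number of bytes removed.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed.endswith(b"\r"):  # last line has no trailing newline
        fixed = fixed[:-1]
    p.write_bytes(fixed)
    return len(data) - len(fixed)


if __name__ == "__main__":
    # Usage: python strip_crlf.py FILE [FILE ...]
    for name in sys.argv[1:]:
        print("{}: removed {} carriage return(s)".format(name, strip_crlf(name)))
```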
After the download completes, MultiPersonTestSet.zip (about 5.6 GB) is available. Unzip it and move the MultiPersonTestSet folder to the project's root directory to enable evaluation on the MuPoTS test set. You should now see the following directory structure:
${3D-Multi-Person-Pose_ROOT}
|-- ckpts <-- the downloaded pre-trained Models
|-- lib
|-- MultiPersonTestSet <-- the newly added MuPoTS eval set
|-- mupots <-- the downloaded processed human keypoint files
|-- util
|-- 3DMPP_framework.png
|-- calculate_mupots_btmup.py
|-- other python code, LICENSE, and README files
...
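A quick way to check that everything is in place is a small script run from the project root. This is a sketch that only tests for the folders shown in the tree above; `missing_dirs` is a hypothetical helper, and it does not verify the contents of the folders.

```python
from pathlib import Path

# Folders listed in the directory tree above, with what each should contain.
EXPECTED = {
    "ckpts": "pre-trained models",
    "mupots": "processed human keypoint files",
    "MultiPersonTestSet": "MuPoTS-3D evaluation set",
    "lib": "library code",
    "util": "utilities",
}


def missing_dirs(root="."):
    """Return the expected folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    for name in missing:
        print("missing ./{}  ({})".format(name, EXPECTED[name]))
    if not missing:
        print("directory layout looks complete")
```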
The following table, similar to Table 3 in the paper, reports quantitative results on the MuPoTS-3D dataset (best performance in bold). Instructions for reproducing our PCK and PCK_abs results are given in the next section.
Group | Methods | PCK (%) | PCK_abs (%) |
---|---|---|---|
Person-centric (relative 3D pose) | Mehta et al., 3DV'18 | 65.0 | N/A |
Person-centric (relative 3D pose) | Rogez et al., IEEE TPAMI'19 | 70.6 | N/A |
Person-centric (relative 3D pose) | Mehta et al., ACM TOG'20 | 70.4 | N/A |
Person-centric (relative 3D pose) | Cheng et al., ICCV'19 | 74.6 | N/A |
Person-centric (relative 3D pose) | Cheng et al., AAAI'20 | 80.5 | N/A |
Camera-centric (absolute 3D pose) | Moon et al., ICCV'19 | 82.5 | 31.8 |
Camera-centric (absolute 3D pose) | Lin et al., ECCV'20 | 83.7 | 35.2 |
Camera-centric (absolute 3D pose) | Zhen et al., ECCV'20 | 80.5 | 38.7 |
Camera-centric (absolute 3D pose) | Li et al., ECCV'20 | 82.0 | 43.8 |
Camera-centric (absolute 3D pose) | Cheng et al., AAAI'21 | 87.5 | 45.7 |
Camera-centric (absolute 3D pose) | Our method | **89.6** | **48.0** |
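For reference, PCK counts the fraction of joints whose 3D position error falls below a threshold; on MuPoTS-3D the standard threshold is 150 mm. PCK is computed after translating each person so that the root joint (the pelvis) is at the origin, while PCK_abs compares absolute camera coordinates directly. The NumPy sketch below illustrates only the metric itself. It is a simplification of eval_mupots_pck.py and eval_mupots_pck_abs.py, which also have to match predicted and ground-truth people; the array layout and the `root_index` parameter are our assumptions. Multiply the result by 100 to compare with the table.

```python
import numpy as np


def pck(pred, gt, threshold=150.0, root_index=None):
    """Fraction of joints whose 3D error is below `threshold` (same unit as inputs).

    pred, gt: arrays of shape (num_people, num_joints, 3).
    root_index: if given, both poses are shifted so that this joint (e.g. the
    pelvis) is at the origin (person-centric PCK); if None, absolute
    coordinates are compared as-is (camera-centric PCK_abs).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if root_index is not None:
        pred = pred - pred[:, root_index:root_index + 1]
        gt = gt - gt[:, root_index:root_index + 1]
    errors = np.linalg.norm(pred - gt, axis=-1)  # shape (num_people, num_joints)
    return float(np.mean(errors < threshold))
```

Shifting a whole person by 1 m leaves PCK unchanged but drives PCK_abs to zero, which is why the camera-centric numbers in the table are much lower.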
We split the pipeline into several separate steps for clarity. Run the following scripts in order:
python calculate_mupots_topdown_pts.py
python calculate_mupots_topdown_depth.py
python calculate_mupots_btmup.py
python calculate_mupots_integrate.py
Note that python calculate_mupots_btmup.py takes a while (30 to 40 minutes, depending on your machine).
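If you prefer a single entry point, the four steps can be chained from Python. This is a hypothetical convenience wrapper, not part of the repository; it assumes the scripts are run from the project root in the order listed, and it stops at the first step that exits with an error. Calling `run_pipeline()` executes the steps, while `run_pipeline(dry_run=True)` only prints the commands.

```python
import subprocess
import sys

# The four MuPoTS processing steps, in the order listed above.
STEPS = [
    "calculate_mupots_topdown_pts.py",
    "calculate_mupots_topdown_depth.py",
    "calculate_mupots_btmup.py",  # the slow step (30 to 40 minutes)
    "calculate_mupots_integrate.py",
]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    return [[python, script] for script in STEPS]


def run_pipeline(dry_run=False):
    """Run every step in order; check=True aborts on the first failing step."""
    for cmd in build_commands():
        print("running:", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```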
To evaluate person-centric 3D multi-person pose estimation:
python eval_mupots_pck.py
After running the script, you should obtain the following PCK value (person-centric, with the pelvis as the origin), which matches the PCK of about 89% reported in Table 3 of the paper.
...
Seq: 18
Seq: 19
Seq: 20
PCK_MEAN: 0.8923134794267524
Note: if Procrustes analysis is enabled in eval_mupots_pck.py, the resulting value is slightly different (PCK_MEAN: 0.8994453169938017).
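Procrustes analysis aligns a predicted pose to the ground truth with the best-fitting scale, rotation, and translation before the error is measured. This removes global misalignment, which is why the aligned PCK is typically a little higher. The NumPy function below is a standard similarity-Procrustes sketch for one person's joints (shape `(num_joints, 3)`); it is not necessarily the exact routine used in the evaluation scripts.

```python
import numpy as np


def procrustes_align(pred, gt):
    """Align `pred` to `gt` with the least-squares similarity transform.

    pred, gt: arrays of shape (num_joints, 3). Returns the aligned prediction.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g  # center both point sets
    # The SVD of the 3x3 cross-covariance yields the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.diag([1.0, 1.0, d])
    rot = u @ flip @ vt  # applied to row vectors as p @ rot
    scale = np.trace(np.diag(s) @ flip) / np.sum(p ** 2)
    return scale * p @ rot + mu_g
```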
To evaluate camera-centric (i.e., camera coordinates) 3D multi-person pose estimation:
python eval_mupots_pck_abs.py
After running the script, you should obtain the following PCK_abs value (camera-centric), which matches the PCK_abs of about 48% reported in Table 3 of the paper.
...
Seq: 18
Seq: 19
Seq: 20
PCK_MEAN: 0.48030635566606195
Note: if Procrustes analysis is enabled in eval_mupots_pck_abs.py, the resulting value is slightly different (PCK_MEAN: 0.48514110933606175).
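Both evaluation scripts print a PCK_MEAN line at the end, as a fraction. The helpers below are a sketch that assumes this output format: they run a script and return PCK_MEAN as a percentage, so it can be compared directly with the table. `parse_pck_mean` and `run_eval` are our own names.

```python
import re
import subprocess
import sys

PCK_PATTERN = re.compile(r"PCK_MEAN:\s*([0-9.eE+-]+)")


def parse_pck_mean(log_text):
    """Return the last PCK_MEAN value found in `log_text`, or None."""
    matches = PCK_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


def run_eval(script):
    """Run an evaluation script and return its PCK_MEAN as a percentage."""
    out = subprocess.run([sys.executable, script], check=True,
                         stdout=subprocess.PIPE, universal_newlines=True).stdout
    value = parse_pck_mean(out)
    return None if value is None else 100.0 * value
```

For example, `run_eval("eval_mupots_pck.py")` should return roughly 89.2 if the log shown above is reproduced.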
To run the code on your own video, you need to generate p2d, affpts, and affb (as defined here), which correspond to the joint locations, joint confidences, and bone confidences, respectively.
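The exact formats of p2d, affpts, and affb are given in the definitions referenced above and are not restated here. Purely as an illustration, the sketch below checks the inputs for one tracked person under an ASSUMED layout: `(num_frames, num_joints, 2)` for p2d, `(num_frames, num_joints)` for affpts, and `(num_frames, num_bones)` for affb, with confidences in [0, 1]. `check_custom_inputs` is a hypothetical helper; adjust the shapes to the actual definitions before relying on it.

```python
import numpy as np


def check_custom_inputs(p2d, affpts, affb):
    """Consistency checks for custom-video inputs of one tracked person.

    ASSUMED layout (illustrative only; verify against the real definitions):
      p2d    : (num_frames, num_joints, 2)  2D joint locations
      affpts : (num_frames, num_joints)     joint confidences in [0, 1]
      affb   : (num_frames, num_bones)      bone confidences in [0, 1]
    Returns (num_frames, num_joints, num_bones); raises ValueError otherwise.
    """
    p2d = np.asarray(p2d, dtype=float)
    affpts = np.asarray(affpts, dtype=float)
    affb = np.asarray(affb, dtype=float)
    if p2d.ndim != 3 or p2d.shape[-1] != 2:
        raise ValueError("p2d should have shape (frames, joints, 2), got %s" % (p2d.shape,))
    frames, joints = p2d.shape[:2]
    if affpts.shape != (frames, joints):
        raise ValueError("affpts should have shape %s, got %s" % ((frames, joints), affpts.shape))
    if affb.ndim != 2 or affb.shape[0] != frames:
        raise ValueError("affb should have shape (frames, bones), got %s" % (affb.shape,))
    for name, conf in (("affpts", affpts), ("affb", affb)):
        if conf.size and (conf.min() < 0 or conf.max() > 1):
            raise ValueError(name + " values should lie in [0, 1]")
    return frames, joints, affb.shape[1]
```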
The code is released under the MIT license. See LICENSE for details.
If this work is useful for your research, please cite the following papers.
@article{cheng2022dual,
title={Dual networks based 3D Multi-Person Pose Estimation from Monocular Video},
author={Cheng, Yu and Wang, Bo and Tan, Robby T.},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2022},
publisher={IEEE}
}
@InProceedings{Cheng_2021_CVPR,
author = {Cheng, Yu and Wang, Bo and Yang, Bo and Tan, Robby T.},
title = {Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {7649-7659}
}