Vinkle Srivastav*, Keqi Chen*, Nicolas Padoy, CVPR 2024
*equal contribution
Sample output of SelfPose3d showing inference on a CMU panoptic dataset video.
> conda create -n selfpose3d python=3.9
> conda activate selfpose3d
(selfpose3d)> conda install pytorch==1.13.1 torchvision==0.14.1 pytorch-cuda=11.7 -c pytorch -c nvidia
(selfpose3d)> pip install -r requirements.txt
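Optionally, you can verify that PyTorch sees the GPU before proceeding (this check is not part of the repository):
(selfpose3d)> python -c "import torch; print(torch.__version__, torch.cuda.is_available())"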
Download the CMU Panoptic dataset and extract it under ${POSE_ROOT}/data/panoptic_toolbox/data. The training and test sequences used in our project are:
.
|-- Train
| |-- 160422_ultimatum1
| |-- 160224_haggling1
| |-- 160226_haggling1
| |-- 161202_haggling1
| |-- 160906_ian1
| |-- 160906_ian2
| |-- 160906_ian3
| |-- 160906_band1
| |-- 160906_band2
|-- Test
| |-- 160906_pizza1
| |-- 160422_haggling1
| |-- 160906_ian5
| |-- 160906_band4
The sequences and camera views used in our project can be obtained from our paper, so you only need to download those sequences. You can also download just a subset of camera views by specifying the number of HD views and changing the camera order in ./scripts/getData.sh.
To extract the images from the downloaded HD videos, use ./scripts/hdImgsExtractor.sh.
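As an optional sanity check (not part of the repository), the following sketch verifies that a downloaded sequence contains the expected sub-folders and calibration file, matching the directory tree shown further below:
import os

# Optional sanity check (run from ${POSE_ROOT}): verify that each downloaded
# sequence has the expected sub-folders and calibration file.
root = "data/panoptic_toolbox/data"
for seq in ["160224_haggling1", "160906_pizza1"]:  # extend with the other sequences
    for item in ["hdImgs", "hdvideos", "hdPose3d_stage1_coco19",
                 "calibration_{}.json".format(seq)]:
        path = os.path.join(root, seq, item)
        print("ok  " if os.path.exists(path) else "MISS", path)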
Download the pretrained 2d pose backbone and place it at ${POSE_ROOT}/models/pose_resnet_50_384x288.pth.
Download the trained models and the pseudo 2d labels as follows; the pseudo 2d labels go under ${POSE_ROOT}/data/panoptic_toolbox/data. You can also follow pseudo_2d_labels_generation to generate the pseudo 2d labels yourself.
> wget https://s3.unistra.fr/camma_public/github/selfpose3d/selfpose3d_models_pseudo_labels.zip
The directory tree should look like this:
${POSE_ROOT}
|-- models
| |-- pose_resnet_50_384x288.pth
| |-- cam5_rootnet_epoch2.pth.tar
| |-- cam5_posenet.pth.tar
| |-- backbone_epoch20.pth.tar
|-- data
| |-- panoptic_toolbox
| | |-- data
| | | |-- 160224_haggling1
| | | | |-- hdImgs
| | | | |-- hdvideos
| | | | |-- hdPose3d_stage1_coco19
| | | | |-- calibration_160224_haggling1.json
| | | |-- 160226_haggling1
| | | |-- ...
| | | |-- group_train_cam5_pseudo_hrnet_soft_9videos.pkl
| | | |-- group_train_cam5_pseudo_hrnet_hard_9videos.pkl
| | | |-- group_validation_cam5_sub.pkl
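The pseudo 2d labels are stored as pickle files. If you want to take a quick look at one, a minimal sketch such as the one below can be used; the internal structure is only assumed here, so adapt the printing to what you actually find:
import pickle

# Load one of the pseudo 2d label files and report only its container type and
# size; the exact per-entry layout is not assumed here.
path = "data/panoptic_toolbox/data/group_train_cam5_pseudo_hrnet_soft_9videos.pkl"
with open(path, "rb") as f:
    pseudo_labels = pickle.load(f)
print(type(pseudo_labels))
try:
    print("number of entries:", len(pseudo_labels))
except TypeError:
    print("top-level object has no length")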
Train and validate on the five selected camera views. You can specify the GPU devices, batch size per GPU and model path in the config file. We trained our models on two GPUs.
python -u tools/train_3d.py --cfg configs/panoptic_ssl/resnet50/backbone_pseudo_hrnet_soft_9videos.yaml
python -u tools/train_3d.py --cfg configs/panoptic_ssl/resnet50/cam5_rootnet.yaml
python -u tools/train_3d.py --cfg configs/panoptic_ssl/resnet50/cam5_posenet.yaml
python -u tools/train_3d.py --cfg configs/panoptic_ssl/resnet50/cam5_posenet_finetune.yaml
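The GPU devices, batch size per GPU, and model paths mentioned above are set in the YAML config files. The key names below (GPUS, TRAIN.BATCH_SIZE, NETWORK.PRETRAINED_BACKBONE) follow voxelpose-style configs and are assumptions; check them against the actual files in configs/panoptic_ssl/resnet50/. This sketch just prints them before launching a run:
import yaml  # PyYAML

# Print the fields relevant to multi-GPU training before launching a run.
cfg_file = "configs/panoptic_ssl/resnet50/cam5_posenet.yaml"
with open(cfg_file) as f:
    cfg = yaml.safe_load(f)
print("GPUs:               ", cfg.get("GPUS"))
print("Batch size per GPU: ", cfg.get("TRAIN", {}).get("BATCH_SIZE"))
print("Pretrained backbone:", cfg.get("NETWORK", {}).get("PRETRAINED_BACKBONE"))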
To train on other sequences, modify TRAIN_LIST in ./lib/dataset/panoptic_ssv.py (a sketch of its format is given below, after the evaluation command).
Evaluate the trained model on the CMU panoptic test sequences:
python -u tools/evaluate.py --cfg configs/panoptic_ssl/resnet50/cam5_posenet.yaml --with-ssv --test-file models/cam5_posenet.pth.tar
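If you do modify TRAIN_LIST, it is expected to be a plain Python list of sequence names; the values below are only illustrative, so check ./lib/dataset/panoptic_ssv.py for the exact variable and format:
# In ./lib/dataset/panoptic_ssv.py (illustrative values; the actual file
# defines the full list of training sequences used in the paper):
TRAIN_LIST = [
    "160422_ultimatum1",
    "160224_haggling1",
    "160226_haggling1",
    # ... add or remove sequence names here to change the training set
]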
Currently, we do not support APIs for custom dataset training. If you want to use SelfPose3d on your own dataset, you need to do the following things:
Implement the code to process your own dataset under ./lib/dataset/.
Adapt the following configuration parameters to your capture setup:
config.MULTI_PERSON.SPACE_SIZE (the size of the 3d space)
config.MULTI_PERSON.SPACE_CENTER (the position of the 3d space center)
config.NETWORK.ROOTNET_SYN_RANGE (the relative space range to the 3d space center, where the synthetic 3d root joints are generated during root net training)
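As a rough illustration of the scale of these parameters, the values below are the usual CMU panoptic settings found in voxelpose-style configs (in millimetres) and are not recommendations for your dataset; ROOTNET_SYN_RANGE in particular must be chosen to match your own capture space:
# Illustrative values only; adapt all three to your capture space (units: mm).
# The numbers shown are the usual CMU panoptic settings in voxelpose-style
# configs, not recommendations for a new dataset.
MULTI_PERSON = {
    "SPACE_SIZE":   [8000.0, 8000.0, 2000.0],  # x/y/z extent of the 3d volume
    "SPACE_CENTER": [0.0, -500.0, 800.0],      # position of the 3d space center
}
NETWORK = {
    # Relative range around SPACE_CENTER in which the synthetic 3d root joints
    # are sampled while training the root net; no value is suggested here.
    "ROOTNET_SYN_RANGE": None,
}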
If you use our code or models in your research, please cite our paper using the following BibTeX entry:
@InProceedings{Srivastav_2024_CVPR,
author = {Srivastav, Vinkle and Chen, Keqi and Padoy, Nicolas},
title = {SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {2502-2512}
}
The project uses voxelpose-pytorch. We thank the authors of voxelpose for releasing the code. If you use voxelpose, consider citing it using the following BibTeX entry.
@inproceedings{voxelpose,
author={Tu, Hanyue and Wang, Chunyu and Zeng, Wenjun},
title={VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
}
The project also leverages the following research works. We thank the authors for releasing their code.
The code and models are available for non-commercial scientific research purposes as defined in the CC BY-NC-SA 4.0 license. By downloading and using this code, you agree to the terms in the LICENSE. Third-party code is subject to its respective license.