[WACV 2024] MotionAGFormer: Enhancing 3D Human Pose Estimation With a Transformer-GCNFormer Network

This is the official PyTorch implementation of the paper "MotionAGFormer: Enhancing 3D Human Pose Estimation With a Transformer-GCNFormer Network" (WACV 2024).

Environment

The project is developed under the following environment:

Python 3.8.10
PyTorch 2.0.0
CUDA 12.2

To install the project dependencies, run:

pip install -r requirements.txt

Dataset

Human3.6M

Preprocessing

Download the fine-tuned Stacked Hourglass detections of MotionBERT's preprocessed H3.6M data here and unzip it to 'data/motion3d'.
Slice the motion clips by running the following command in the data/preprocess directory:

For MotionAGFormer-Base and MotionAGFormer-Large:

python h36m.py --n-frames 243

For MotionAGFormer-Small:

python h36m.py --n-frames 81

For MotionAGFormer-XSmall:

python h36m.py --n-frames 27
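
For intuition, the effect of --n-frames can be sketched as cutting a sequence into fixed-length clips. This is only a minimal illustration and not the logic of h36m.py: the helper name split_clips, the stride parameter, and the choice to drop a trailing remainder shorter than one clip are all assumptions made for this example.

```python
from typing import List, Tuple


def split_clips(n_total: int, n_frames: int, stride: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs for fixed-length clips.

    Hypothetical helper for illustration only. `end` is exclusive.
    A stride of 0 means non-overlapping clips (stride == n_frames).
    Frames left over after the last full clip are dropped (an assumption).
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    step = stride if stride > 0 else n_frames
    clips = []
    start = 0
    while start + n_frames <= n_total:
        clips.append((start, start + n_frames))
        start += step
    return clips


if __name__ == "__main__":
    # Clip lengths used by the model variants listed above.
    for n_frames in (243, 81, 27):
        print(n_frames, len(split_clips(1000, n_frames)))
```

Under these assumptions, a shorter clip length yields more clips from the same sequence, which is why each model variant needs its own preprocessing run.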

Visualization

Run the following command in the data/preprocess directory (it expects 243 frames):

python visualize.py --dataset h36m --sequence-number <AN ARBITRARY NUMBER>

This should create a GIF file named h36m_pose<SEQ_NUMBER>.gif in the data directory.

MPI-INF-3DHP

Preprocessing

Please refer to P-STMO for dataset setup. After preprocessing, the generated .npz files (data_train_3dhp.npz and data_test_3dhp.npz) should be located in the data/motion3d directory.
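
Before training, it can help to confirm that both files are where they are expected. The snippet below is a small sketch of such a check; the helper name missing_files is hypothetical and not part of this repository, and it only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import List

# File names taken from the preprocessing instructions above.
EXPECTED_FILES = ("data_train_3dhp.npz", "data_test_3dhp.npz")


def missing_files(data_dir: str = "data/motion3d") -> List[str]:
    """Return the expected .npz file names that are absent from data_dir.

    Hypothetical helper for illustration; run it from the repository root
    so that the default relative path resolves correctly.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected MPI-INF-3DHP files found.")
```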

Visualization

Run the same command as for the Human3.6M visualization, but set --dataset to mpi.

Training

After dataset preparation, you can train the model as follows:

Human3.6M

You can train Human3.6M with the following command:

python train.py --config <PATH-TO-CONFIG>

where config files are located at configs/h36m. You can also use Weights & Biases to log the training and validation error by adding --use-wandb at the end. If you use it, you can set the run name with --wandb-name, e.g.:

python train.py --config configs/h36m/MotionAGFormer-base.yaml --use-wandb --wandb-name MotionAGFormer-base
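
The three flags used above can be summarized with a short argparse sketch. The flag names come from this README; the types, defaults, and help texts are assumptions, and the actual parser in train.py may define more options or behave differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags named in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the training flags")
    parser.add_argument("--config", type=str, required=True,
                        help="Path to a YAML config, e.g. under configs/h36m")
    # store_true: the flag is False unless it appears on the command line.
    parser.add_argument("--use-wandb", action="store_true",
                        help="Enable Weights & Biases logging")
    parser.add_argument("--wandb-name", type=str, default=None,
                        help="Run name shown in Weights & Biases (assumed optional)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["--config", "configs/h36m/MotionAGFormer-base.yaml",
                  "--use-wandb", "--wandb-name", "MotionAGFormer-base"])
    print(args.config, args.use_wandb, args.wandb_name)
```

Note that argparse converts dashes to underscores, so --use-wandb is read back as args.use_wandb.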

MPI-INF-3DHP

You can train MPI-INF-3DHP with the following command:

python train_3dhp.py --config <PATH-TO-CONFIG>

where config files are located at configs/mpi. As with Human3.6M, Weights & Biases can be used for logging.

Evaluation

Method	# frames	# Params	# MACs	H3.6M weights	MPI-INF-3DHP weights
MotionAGFormer-XS	27	2.2M	1.0G	download	download
MotionAGFormer-S	81	4.8M	6.6G	download	download
MotionAGFormer-B	243 | 81	11.7M	48.3G | 16G	download	download
MotionAGFormer-L	243 | 81	19.0M	78.3G | 26G	download	download

After downloading the weights from the table above, you can evaluate Human3.6M models by running:

python train.py --eval-only --checkpoint <CHECKPOINT-DIRECTORY> --checkpoint-file <CHECKPOINT-FILE-NAME> --config <PATH-TO-CONFIG>

For example, if the H3.6M checkpoint of MotionAGFormer-L is downloaded and placed in the checkpoint directory, you can run:

python train.py --eval-only --checkpoint checkpoint --checkpoint-file motionagformer-l-h36m.pth.tr --config configs/h36m/MotionAGFormer-large.yaml

Similarly, MPI-INF-3DHP can be evaluated as follows:

python train_3dhp.py --eval-only --checkpoint <CHECKPOINT-DIRECTORY> --checkpoint-file <CHECKPOINT-FILE-NAME> --config <PATH-TO-CONFIG>
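
When evaluating several checkpoints, the two commands above can be assembled programmatically. The following is a sketch only: the helper name eval_command and the dataset keys "h36m" and "mpi" are assumptions for this example, while the script names and flags are the ones shown in this section.

```python
import shlex
from typing import List

# Dataset key -> evaluation script, as used in the commands above.
# The keys themselves are an assumption made for this sketch.
SCRIPTS = {"h36m": "train.py", "mpi": "train_3dhp.py"}


def eval_command(dataset: str, checkpoint_dir: str,
                 checkpoint_file: str, config: str) -> List[str]:
    """Return the evaluation command as an argument list (hypothetical helper)."""
    if dataset not in SCRIPTS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return ["python", SCRIPTS[dataset], "--eval-only",
            "--checkpoint", checkpoint_dir,
            "--checkpoint-file", checkpoint_file,
            "--config", config]


if __name__ == "__main__":
    cmd = eval_command("h36m", "checkpoint", "motionagformer-l-h36m.pth.tr",
                       "configs/h36m/MotionAGFormer-large.yaml")
    # shlex.join renders the list as a single shell-safe command line.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the repository root to launch the evaluation.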

Demo

Our demo is a modified version of the one provided by the MHFormer repository. First, download the YOLOv3 and HRNet pretrained models here and put them in the './demo/lib/checkpoint' directory. Next, download our base model checkpoint from here and put it in the './checkpoint' directory. Then, put your in-the-wild videos in the './demo/video' directory.

Run the command below:

python demo/vis.py --video sample_video.mp4

Acknowledgement

Our code refers to the following repositories:

We thank the authors for releasing their code.

Citation

If you find our work useful for your project, please consider citing the paper:

@inproceedings{motionagformer2024,
  title     =   {MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network},
  author    =   {Soroush Mehraban and Vida Adeli and Babak Taati},
  booktitle =   {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      =   {2024}
}
