What is the Input format of the ckpt_semgcn_nonlocal_sh.pth.tar model ?

garyzhao / SemGCN

The PyTorch implementation for "Semantic Graph Convolutional Networks for 3D Human Pose Regression" (CVPR 2019).

https://arxiv.org/abs/1904.03345

Apache License 2.0


Issue #31

Closed by duckduck-sys 4 years ago

duckduck-sys commented 4 years ago

The ckpt_semgcn_nonlocal_sh.pth.tar model outputs 3D poses in H36M format.

But it is not clear to me from the description how to generate a correct input from an "in the wild" image.

Does it take 2D input poses in MPII format (as produced by Stacked Hourglass)?

Or does it take 2D input poses in (2D) H36M format?

If it is the latter, then how did you convert from MPII to 2D H36M format when training the ckpt_semgcn_nonlocal_sh.pth.tar model? Or did you train a special Stacked Hourglass model to output 2D H36M format directly?

garyzhao commented 4 years ago

Hi @duckduck-sys ,

It takes 2D input poses in H36M format.

You can use an original Stacked Hourglass model and convert its 2D predictions to H36M format using the mapping at https://github.com/garyzhao/SemGCN/blob/master/data/prepare_data_2d_h36m_sh.py#L56

Best, Long
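For readers following along, here is a minimal sketch of what such a conversion could look like: the 16 Stacked Hourglass (MPII-order) joints are reordered into H36M order by matching joint names. The two name lists, the `MPII_TO_H36M` array, and the `mpii_to_h36m` helper are assumptions made for illustration, not code copied from the repository. Check the joint correspondence against the linked script before relying on it. Note that MPII has no counterpart for the H36M Neck/Nose joint, so only 16 joints come out.

```python
import numpy as np

# MPII joint order as produced by Stacked Hourglass (16 joints), labeled
# with H36M-style names. This labeling is an assumption for illustration.
MPII_NAMES = [
    "RFoot", "RKnee", "RHip", "LHip", "LKnee", "LFoot", "Hip", "Spine",
    "Thorax", "Head", "RWrist", "RElbow", "RShoulder", "LShoulder",
    "LElbow", "LWrist",
]

# H36M joint order restricted to joints that MPII also provides
# (the H36M Neck/Nose joint has no MPII counterpart and is left out).
H36M_NAMES = [
    "Hip", "RHip", "RKnee", "RFoot", "LHip", "LKnee", "LFoot", "Spine",
    "Thorax", "Head", "LShoulder", "LElbow", "LWrist", "RShoulder",
    "RElbow", "RWrist",
]

# For each H36M slot, the index of the matching joint in the MPII array.
MPII_TO_H36M = np.array([MPII_NAMES.index(name) for name in H36M_NAMES])


def mpii_to_h36m(poses_2d):
    """Reorder (..., 16, 2) MPII-order keypoints into H36M order."""
    poses_2d = np.asarray(poses_2d)
    assert poses_2d.shape[-2:] == (16, 2), "expected 16 joints with (x, y)"
    return poses_2d[..., MPII_TO_H36M, :]
```

The function accepts a single pose of shape `(16, 2)` or a batch of shape `(N, 16, 2)`, and only permutes the joint axis; pixel coordinates are left unchanged.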

duckduck-sys commented 4 years ago

@garyzhao Thanks for the quick response!

So just to confirm, the correct inference process is as follows:

Step 0: Use a 2D pose estimation network such as Stacked Hourglass to generate a 2D pose in MPII format.
Step 1: Convert the 2D pose from MPII format to H36M format using the approach linked above.
Step 2: Pre-process the 2D input pose in some way.
Step 3: Use the pre-processed 2D pose in H36M format as input to the ckpt_semgcn_nonlocal_sh.pth.tar model.
Step 4: The output is a 3D pose in H36M format; visualize it.

Is my understanding correct? And what pre-processing is involved in step 2? For example, should the 2D pose be normalized within the pose bounding box, or in some other way?

garyzhao commented 4 years ago

Hi @duckduck-sys ,

Yep, it's correct.

2D poses are scaled according to the image resolution and normalized to [-1, 1].

See https://github.com/garyzhao/SemGCN/blob/master/common/data_utils.py#L17

Best, Long
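As a rough illustration of the normalization described above, the sketch below maps pixel coordinates so that x spans [-1, 1] across the image width, with y scaled by the same factor to preserve the aspect ratio. This formula is an assumption based on the convention commonly used in 3D pose codebases. The function name and arguments are hypothetical, so compare against the linked `data_utils.py` before using it.

```python
import numpy as np


def normalize_screen_coordinates(points_2d, width, height):
    """Map pixel coordinates to roughly [-1, 1], keeping the aspect ratio.

    x in [0, width] maps to [-1, 1]; y in [0, height] maps to
    [-height / width, height / width]. Assumed convention, not taken
    verbatim from the repository.
    """
    points_2d = np.asarray(points_2d, dtype=np.float64)
    assert points_2d.shape[-1] == 2, "last axis must hold (x, y)"
    return points_2d / width * 2.0 - np.array([1.0, height / width])
```

For an in-the-wild image, `width` and `height` would be the resolution of the image that the 2D keypoints were detected in. With this convention, the image center always maps to (0, 0).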

duckduck-sys commented 4 years ago

Thank you @garyzhao for the instructions. The 3D output I get looks weird, but I think it's related to the pre-processing, so I will raise the question in a new post.