garyzhao / SemGCN

The PyTorch implementation of "Semantic Graph Convolutional Networks for 3D Human Pose Regression" (CVPR 2019).
https://arxiv.org/abs/1904.03345
Apache License 2.0

Weird results when applying SemGCN to 2D pose from image #32

Open · duckduck-sys opened 3 years ago

duckduck-sys commented 3 years ago

Inference on in-the-wild images using SemGCN has been partially covered in this thread and others, but only the overall process has been made clear, i.e.:

  1. Generate a 2D pose in MPII format from the input image.
  2. Permute the joints into the ground-truth (H36M) ordering.
  3. Normalize the screen coordinates.
  4. Feed the result to the pretrained SemGCN model to regress the 3D pose.

Below I follow each step, using the 300x600 test image shown on the left.

[Images: original test image (left), estimated 2D pose overlay (right)]

For Step 1, I use EfficientPose to generate the MPII-format 2D pose of the test image, shown above on the right. Here is the numeric output:

positions = [[[108. 512.]   # Right ankle
              [114. 428.]   # Right knee
              [124. 320.]   # Right hip
              [186. 324.]   # Left hip
              [178. 426.]   # Left knee
              [176. 512.]   # Left ankle
              [156. 322.]   # Pelvis
              [162. 152.]   # Thorax
              [164. 114.]   # Upper neck
              [166.  24.]   # Head top
              [ 60. 322.]   # Right wrist
              [ 78. 238.]   # Right elbow
              [ 96. 148.]   # Right shoulder
              [230. 154.]   # Left shoulder
              [240. 246.]   # Left elbow
              [224. 326.]]] # Left wrist

For Step 2, I run this:

positions = positions[:, SH_TO_GT_PERM, :]

To get the output:

positions = [[[156. 322.]
              [124. 320.]
              [114. 428.]
              [108. 512.]
              [186. 324.]
              [178. 426.]
              [176. 512.]
              [162. 152.]
              [164. 114.]
              [166.  24.]
              [230. 154.]
              [240. 246.]
              [224. 326.]
              [ 96. 148.]
              [ 78. 238.]
              [ 60. 322.]]]
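For reference, the permutation that produces this ordering can be written out explicitly. A minimal sketch, with the values inferred by matching the coordinates before and after Step 2 (the repo's data-preparation script derives SH_TO_GT_PERM programmatically, so double-check against your copy):

import numpy as np

# MPII index of each joint in the target (H36M ground-truth) order:
# pelvis, r-hip, r-knee, r-ankle, l-hip, l-knee, l-ankle,
# thorax, upper neck, head top,
# l-shoulder, l-elbow, l-wrist, r-shoulder, r-elbow, r-wrist
SH_TO_GT_PERM = np.array([6, 2, 1, 0, 3, 4, 5, 7, 8, 9, 13, 14, 15, 12, 11, 10])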

For Step 3, I run this:

positions[..., :2] = normalize_screen_coordinates(positions[..., :2], w=300, h=600)

To get the output:

positions = [[[ 0.0399  0.1466 ]
              [-0.1733  0.1333 ]
              [-0.2400  0.8533 ]
              [-0.2799  1.4133 ]
              [ 0.2400  0.1600 ]
              [ 0.1866  0.8399 ]
              [ 0.1733  1.4133 ]
              [ 0.0800 -0.9866 ]
              [ 0.0933 -1.2400 ]
              [ 0.1066 -1.8400 ]
              [ 0.5333 -0.9733 ]
              [ 0.6000 -0.3600 ]
              [ 0.4933  0.1733 ]
              [-0.3600 -1.0133 ]
              [-0.4800 -0.4133 ]
              [-0.6000  0.1466 ]]]
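For completeness, normalize_screen_coordinates maps x from [0, w] to [-1, 1] and scales y by the same factor to preserve the aspect ratio. A sketch matching the camera utilities SemGCN shares with VideoPose3D (verify against your copy of the repo):

import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map x to [-1, 1] and scale y by the same factor, preserving the
    # aspect ratio; for this 300x600 portrait image, y therefore spans
    # [-2, 2], which is why some y values above fall outside [-1, 1].
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])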

For Step 4, the above is used as input to the SemGCN SH model by running this:

inputs_2d = torch.from_numpy(positions).float()  # the model weights are float32
inputs_2d = inputs_2d.to(device)
with torch.no_grad():
    outputs_3d = model_pos(inputs_2d).cpu()
outputs = outputs_3d - outputs_3d[:, :1, :]  # root-center at the pelvis

Which gives the output:

outputs = [[[ 0.0000  0.0000  0.0000 ]
            [-0.0769 -0.6899 -0.2520 ]
            [ 0.0847 -0.4062 -0.0607 ]
            [ 0.4154  0.2318  0.4062 ]
            [ 0.2708 -0.5181 -0.0504 ]
            [ 0.3431 -0.7337  0.3018 ]
            [ 0.6379  0.6684  0.2033 ]
            [ 0.1650 -0.9141 -0.8496 ]
            [ 0.5825 -2.1341  0.2762 ]
            [ 1.1561 -1.5364 -0.6433 ]
            [ 1.1612 -1.1453 -0.2103 ]
            [ 0.9097 -0.6763  0.2361 ]
            [ 0.8202 -0.2971  0.2679 ]
            [ 0.8008 -1.1936 -0.1120 ]
            [ 0.2124 -1.3246  0.5563 ]
            [ 0.5093 -0.4762  0.3473 ]]]
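For anyone reproducing this, a minimal matplotlib sketch of such a visualization (the parent list is my assumption based on the 16-joint ordering above, not taken from the repo):

import matplotlib.pyplot as plt
import numpy as np

# Assumed parent of each joint in the 16-joint ground-truth ordering above
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 8, 10, 11, 8, 13, 14]

def plot_pose_3d(pose):
    # Draw one (16, 3) pose as a stick figure
    ax = plt.figure().add_subplot(projection="3d")
    for joint, parent in enumerate(PARENTS):
        if parent >= 0:  # the root (pelvis) has no parent
            xs, ys, zs = zip(pose[joint], pose[parent])
            ax.plot(xs, ys, zs, c="tab:blue")
    ax.scatter(pose[:, 0], pose[:, 1], pose[:, 2], c="tab:red", s=10)
    plt.show()

plot_pose_3d(np.asarray(outputs)[0])  # works for the tensor or the nested list above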

When visualized, this looks completely wrong; see the image below. Can anyone shed light on where the problem lies? Is it a problem with the pre-processing, or with the model?

[Image: visualized 3D pose output]

develduan commented 3 years ago

@duckduck-sys I think there are two points to note about the data:

  1. Location: the neck should be halfway between the shoulders, and the thorax should be roughly halfway between the neck and the hips.
  2. Scale: normalization depends on the image width (mapping to [-1, 1] based on w), so the proportion of the human body in the image needs to match H36M.
dandingol03 commented 3 years ago

> @duckduck-sys I think there are two points to note about the data:
>
>   1. Location: the neck should be halfway between the shoulders, and the thorax should be roughly halfway between the neck and the hips.
>   2. Scale: normalization depends on the image width (mapping to [-1, 1] based on w), so the proportion of the human body in the image needs to match H36M.

  • raw output [image: pose_lifting_output271_unscale_raw]
  • after scaling the locations [image: pose_lifting_output271_raw]
  • after modifying the location of the neck and the thorax [image: pose_lifting_output271_modified]

Hi, how do you calculate the spine point?

dandingol03 commented 3 years ago

Hi @duckduck-sys, your data looks a bit strange after normalization. Also, may I ask: are you able to regress the 3D pose correctly now? Mainly, I don't understand how to compute the hip and spine joints, and should the hip joint be assumed to be (0, 0)?

lisa676 commented 3 years ago

@develduan Hi Duan, can you share this solution? I'm also facing much the same problem. Thanks

dandingol03 commented 3 years ago

@develduan Hi. I also face the same problem of how to obtain the spine point, because the stacked-hourglass model doesn't output a spine point.

develduan commented 3 years ago

@lisa676 @dandingol03 Hi, I'm sorry, but I stopped following this project because it didn't work very well on my dataset (an in-the-wild environment). In my dataset all pedestrians stand upright, so I simply treated the midpoint of the neck and the pelvis as the thorax/spine: positions_mpii[i_thorax] = (positions_mpii[i_neck] + positions_mpii[i_pelvis]) / 2. After normalize_screen_coordinates, scale the locations by a factor so that the proportion of the human body in the image matches H36M; in my case: positions = positions / 2.
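Putting develduan's two corrections together, a minimal sketch (the MPII indices and the /2 factor are specific to his upright-pedestrian setup and image framing, so treat both as assumptions to tune; normalize_screen_coordinates is as sketched under Step 3 above):

# MPII joint indices as used in the issue above
I_PELVIS, I_THORAX, I_NECK = 6, 7, 8

def fix_and_rescale(positions, w, h, scale=2.0):
    # positions: (N, 16, 2) MPII-format keypoints in pixel coordinates
    positions = positions.copy()
    # 1. Move the thorax to the midpoint of the upper neck and the pelvis
    #    (reasonable only for roughly upright subjects)
    positions[:, I_THORAX] = (positions[:, I_NECK] + positions[:, I_PELVIS]) / 2
    # 2. Normalize, then shrink so the body occupies a proportion of the
    #    frame comparable to H36M
    positions = normalize_screen_coordinates(positions, w=w, h=h)
    return positions / scale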

In my case I wanted to get the 3D pose directly from the image, instead of first estimating a 2D pose and then lifting it to 3D, and I got a better result by following the paper "End-to-end Recovery of Human Shape and Pose" by Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik.

dandingol03 commented 3 years ago

@develduan Firstly, thanks for your kind reply. Secondly, the paper "End-to-end Recovery of Human Shape and Pose" is cool; I will delve into it soon. Lastly, here is my email: dandingol03@outlook.com; maybe someday we can exchange ideas about 3D pose estimation~