garyzhao / SemGCN

The PyTorch implementation of "Semantic Graph Convolutional Networks for 3D Human Pose Regression" (CVPR 2019).
https://arxiv.org/abs/1904.03345
Apache License 2.0

Weird results when applying SemGCN to 2D pose from image #32

Open · duckduck-sys opened 3 years ago

duckduck-sys commented 3 years ago

Inference on in-the-wild images using SemGCN has been partially covered in this thread and others, but only the overall process has been made clear, i.e.:

  1. Generate a 2D pose in MPII format from the input image.
  2. Permute the joints into the ground-truth (H36M) ordering.
  3. Normalize the screen coordinates.
  4. Feed the result to the pretrained SemGCN model to regress the 3D pose.

Below I follow each step, using the 300x600 test image shown on the left.

[Images: original test image (left), estimated 2D pose overlay (right)]

For Step 1, I use EfficientPose to generate the MPII-format 2D pose of the test image, shown above on the right. Here is the numeric output:

positions = [[[108. 512.]   # Right ankle
              [114. 428.]   # Right knee
              [124. 320.]   # Right hip
              [186. 324.]   # Left hip
              [178. 426.]   # Left knee
              [176. 512.]   # Left ankle
              [156. 322.]   # Pelvis
              [162. 152.]   # Thorax
              [164. 114.]   # Upper neck
              [166.  24.]   # Head top
              [ 60. 322.]   # Right wrist
              [ 78. 238.]   # Right elbow
              [ 96. 148.]   # Right shoulder
              [230. 154.]   # Left shoulder
              [240. 246.]   # Left elbow
              [224. 326.]]] # Left wrist

For Step 2, I run this:

positions = positions[:, SH_TO_GT_PERM, :]

To get the output:

positions = [[[156. 322.]
              [124. 320.]
              [114. 428.]
              [108. 512.]
              [186. 324.]
              [178. 426.]
              [176. 512.]
              [162. 152.]
              [164. 114.]
              [166.  24.]
              [230. 154.]
              [240. 246.]
              [224. 326.]
              [ 96. 148.]
              [ 78. 238.]
              [ 60. 322.]]]
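For reference, the permutation that produces this ordering can be written out explicitly. A minimal sketch, with the values inferred by matching the coordinates before and after Step 2 (the repo's data-preparation script derives SH_TO_GT_PERM programmatically, so double-check against your copy):

import numpy as np

# MPII index of each joint in the target (H36M ground-truth) order:
# pelvis, r-hip, r-knee, r-ankle, l-hip, l-knee, l-ankle,
# thorax, upper neck, head top,
# l-shoulder, l-elbow, l-wrist, r-shoulder, r-elbow, r-wrist
SH_TO_GT_PERM = np.array([6, 2, 1, 0, 3, 4, 5, 7, 8, 9, 13, 14, 15, 12, 11, 10])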

For Step 3, I run this:

positions[..., :2] = normalize_screen_coordinates(positions[..., :2], w=300, h=600)

To get the output:

positions = [[[ 0.0399  0.1466 ]
              [-0.1733  0.1333 ]
              [-0.2400  0.8533 ]
              [-0.2799  1.4133 ]
              [ 0.2400  0.1600 ]
              [ 0.1866  0.8399 ]
              [ 0.1733  1.4133 ]
              [ 0.0800 -0.9866 ]
              [ 0.0933 -1.2400 ]
              [ 0.1066 -1.8400 ]
              [ 0.5333 -0.9733 ]
              [ 0.6000 -0.3600 ]
              [ 0.4933  0.1733 ]
              [-0.3600 -1.0133 ]
              [-0.4800 -0.4133 ]
              [-0.6000  0.1466 ]]]
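For completeness, normalize_screen_coordinates maps x from [0, w] to [-1, 1] and scales y by the same factor to preserve the aspect ratio. A sketch matching the camera utilities SemGCN shares with VideoPose3D (verify against your copy of the repo):

import numpy as np

def normalize_screen_coordinates(X, w, h):
    # Map x to [-1, 1] and scale y by the same factor, preserving the
    # aspect ratio; for this 300x600 portrait image, y therefore spans
    # [-2, 2], which is why some y values above fall outside [-1, 1].
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1, h / w])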

For Step 4, the above is used as input to the SemGCN SH model by running this:

inputs_2d = torch.from_numpy(positions).float()  # the model weights are float32
inputs_2d = inputs_2d.to(device)
with torch.no_grad():
    outputs_3d = model_pos(inputs_2d).cpu()
outputs = outputs_3d - outputs_3d[:, :1, :]  # root-center at the pelvis

Which gives the output:

outputs = [[[ 0.0000  0.0000  0.0000 ]
            [-0.0769 -0.6899 -0.2520 ]
            [ 0.0847 -0.4062 -0.0607 ]
            [ 0.4154  0.2318  0.4062 ]
            [ 0.2708 -0.5181 -0.0504 ]
            [ 0.3431 -0.7337  0.3018 ]
            [ 0.6379  0.6684  0.2033 ]
            [ 0.1650 -0.9141 -0.8496 ]
            [ 0.5825 -2.1341  0.2762 ]
            [ 1.1561 -1.5364 -0.6433 ]
            [ 1.1612 -1.1453 -0.2103 ]
            [ 0.9097 -0.6763  0.2361 ]
            [ 0.8202 -0.2971  0.2679 ]
            [ 0.8008 -1.1936 -0.1120 ]
            [ 0.2124 -1.3246  0.5563 ]
            [ 0.5093 -0.4762  0.3473 ]]]
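For anyone reproducing this, a minimal matplotlib sketch of such a visualization (the parent list is my assumption based on the 16-joint ordering above, not taken from the repo):

import matplotlib.pyplot as plt
import numpy as np

# Assumed parent of each joint in the 16-joint ground-truth ordering above
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 8, 10, 11, 8, 13, 14]

def plot_pose_3d(pose):
    # Draw one (16, 3) pose as a stick figure
    ax = plt.figure().add_subplot(projection="3d")
    for joint, parent in enumerate(PARENTS):
        if parent >= 0:  # the root (pelvis) has no parent
            xs, ys, zs = zip(pose[joint], pose[parent])
            ax.plot(xs, ys, zs, c="tab:blue")
    ax.scatter(pose[:, 0], pose[:, 1], pose[:, 2], c="tab:red", s=10)
    plt.show()

plot_pose_3d(np.asarray(outputs)[0])  # works for the tensor or the nested list above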

When visualized, this looks completely wrong; see the image below. Can anyone shed light on where the problem lies? Is it a problem with the pre-processing, or with the model?

[Image: visualized 3D pose output]

develduan commented 3 years ago

@duckduck-sys I think there are two points to note about the data:

  1. Location: the neck should be halfway between the shoulders, and the thorax should be roughly halfway between the neck and the hips.
  2. Scale: normalization depends on the image width (mapping to [-1, 1] based on w), so the proportion of the human body in the image needs to match H36M.
dandingol03 commented 3 years ago

> @duckduck-sys I think there are two points to note about the data:
>
>   1. Location: the neck should be halfway between the shoulders, and the thorax should be roughly halfway between the neck and the hips.
>   2. Scale: normalization depends on the image width (mapping to [-1, 1] based on w), so the proportion of the human body in the image needs to match H36M.

  • raw output [image: pose_lifting_output271_unscale_raw]
  • after scaling the locations [image: pose_lifting_output271_raw]
  • after modifying the location of the neck and the thorax [image: pose_lifting_output271_modified]

Hi, how do you calculate the spine point?

dandingol03 commented 3 years ago

Hi @duckduck-sys, your data looks a bit strange after normalization. Also, may I ask: are you able to regress the 3D pose correctly now? Mainly, I don't understand how to compute the hip and spine joints, and should the hip joint be assumed to be (0, 0)?

lisa676 commented 3 years ago

@develduan Hi Duan, can you share this solution? I'm also facing much the same problem. Thanks

dandingol03 commented 3 years ago

@develduan Hi. I also face the same problem of how to obtain the spine point, because the stacked-hourglass model doesn't output a spine point.

develduan commented 3 years ago

@lisa676 @dandingol03 Hi, I'm sorry, but I stopped following this project because it didn't work very well on my dataset (an in-the-wild environment). In my dataset all pedestrians stand upright, so I simply treated the midpoint of the neck and the pelvis as the thorax/spine: positions_mpii[i_thorax] = (positions_mpii[i_neck] + positions_mpii[i_pelvis]) / 2. After normalize_screen_coordinates, scale the locations by a factor so that the proportion of the human body in the image matches H36M; in my case: positions = positions / 2.
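Putting develduan's two corrections together, a minimal sketch (the MPII indices and the /2 factor are specific to his upright-pedestrian setup and image framing, so treat both as assumptions to tune; normalize_screen_coordinates is as sketched under Step 3 above):

# MPII joint indices as used in the issue above
I_PELVIS, I_THORAX, I_NECK = 6, 7, 8

def fix_and_rescale(positions, w, h, scale=2.0):
    # positions: (N, 16, 2) MPII-format keypoints in pixel coordinates
    positions = positions.copy()
    # 1. Move the thorax to the midpoint of the upper neck and the pelvis
    #    (reasonable only for roughly upright subjects)
    positions[:, I_THORAX] = (positions[:, I_NECK] + positions[:, I_PELVIS]) / 2
    # 2. Normalize, then shrink so the body occupies a proportion of the
    #    frame comparable to H36M
    positions = normalize_screen_coordinates(positions, w=w, h=h)
    return positions / scale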

In my case I wanted to get the 3D pose directly from the image, instead of first estimating a 2D pose and then lifting it to 3D, and I got a better result by following the paper "End-to-end Recovery of Human Shape and Pose" by Angjoo Kanazawa, Michael J. Black, David W. Jacobs, and Jitendra Malik.

dandingol03 commented 3 years ago

@develduan Firstly, thanks for your kind reply. Secondly, the paper "End-to-end Recovery of Human Shape and Pose" is cool; I will delve into it soon. Lastly, here is my email: dandingol03@outlook.com; maybe someday we can exchange ideas about 3D pose estimation~