Closed harmoniqpunk closed 3 years ago
I think the reshape functions are from here and here.
The view functions are used to obtain 3D heatmaps. I think you can change the line to something like `.view(-1,self.joint_num,cfg.output_hm_shape[0]*cfg.output_hm_shape[1]*cfg.output_hm_shape[2])`, which means reshaping to a 1D heatmap.
You should change lines 60-66 of main/model.py to the 1D heatmap version.
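A minimal sketch of that reshape, assuming the heatmap is a contiguous tensor of shape [batch, joints, depth, height, width]. The names `output_hm_shape` and `joint_num` below are local stand-ins for `cfg.output_hm_shape` and `self.joint_num`, and the random tensor only illustrates the shapes.

```python
import torch

# Stand-ins for cfg.output_hm_shape and self.joint_num (assumed values).
output_hm_shape = (64, 64, 64)  # (depth, height, width)
joint_num = 42

# Random data in place of the network output, only to show the shapes.
hm_5d = torch.rand(1, joint_num, *output_hm_shape)

# Merge the three spatial axes into one flat axis per joint.
hm_flat = hm_5d.view(
    -1, joint_num,
    output_hm_shape[0] * output_hm_shape[1] * output_hm_shape[2],
)
print(hm_flat.shape)  # torch.Size([1, 42, 262144])

# view() only changes the shape, so viewing back recovers the 5D heatmap.
restored = hm_flat.view(-1, joint_num, *output_hm_shape)
```

Because `view` does not reorder the data, the flat position of a voxel (z, y, x) is `z * height * width + y * width + x`.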
Thank you for putting me on the right track.
So now I end up with a heatmap that has 3 dimensions instead of 5 (batch, joints, 1D merge of 3D joints rotations values)
I'm trying to extract the values from the merged 1D and I'm pretty stuck. I'm talking about these lines.
val_z, idx_z = torch.max(joint_heatmap_out,2)
val_zy, idx_zy = torch.max(val_z,2)
val_zyx, joint_x = torch.max(val_zy,2)
joint_x = joint_x[:,:,None]
joint_y = torch.gather(idx_zy, 2, joint_x)
joint_z = torch.gather(idx_z, 2, joint_y[:,:,:,None].repeat(1,1,1,cfg.output_hm_shape[1]))[:,:,0,:]
joint_z = torch.gather(joint_z, 2, joint_x)
The size of joint_heatmap_out is now [1, 42, 262144]. Before, it was [1, 42, 64, 64, 64].
I was trying to chunk like this:
joint_heatmap_out_0, joint_heatmap_out_1, joint_heatmap_out_2 = torch.chunk(joint_heatmap_out, 3, 2)
But of course this is not the right way to do it. I end up with these 3 tensor sizes:
[1, 42, 87382] [1, 42, 87382] [1, 42, 87380]
Also, I have a hard time understanding how joint_x, joint_y and joint_z are constructed conceptually:
Here you select all indexes? What is None for? Is the selection format the numpy one, i.e. [start_index : stop_index : step_size]?
` joint_x = joint_x[:,:,None]`
Ok. This gather I understand, you select the y values by idx.
` joint_y = torch.gather(idx_zy, 2, joint_x)`
But I'm a bit lost at these 2 gathers, because I understand neither the second line nor the repeat in the first line.
joint_z = torch.gather(idx_z, 2, joint_y[:,:,:,None].repeat(1,1,1,cfg.output_hm_shape[1]))[:,:,0,:]
joint_z = torch.gather(joint_z, 2, joint_x)
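For reference, a small sketch of what the `None` indexing, `torch.gather` and `repeat` steps do, using made-up tensors (the values and shapes here are illustrative only and are not taken from the model).

```python
import torch

# One index per (batch, joint): shape [1, 2].
idx = torch.tensor([[2, 0]])
# Indexing with None adds a new axis of size 1 (same as numpy): shape [1, 2, 1].
idx3 = idx[:, :, None]

# gather along dim 2: picked[b, j, 0] = table[b, j, idx3[b, j, 0]]
table = torch.tensor([[[10, 11, 12],
                       [20, 21, 22]]])            # shape [1, 2, 3]
picked = torch.gather(table, 2, idx3)             # shape [1, 2, 1]

# gather needs an index with as many dims as its input. To pick a whole
# row y from a [B, J, H, W] tensor, the row index is repeated over W.
grid = torch.arange(2 * 3 * 4).view(1, 2, 3, 4)   # [B, J, H, W]
row_y = torch.tensor([[[1], [2]]])                # [B, J, 1]
row_index = row_y[:, :, :, None].repeat(1, 1, 1, 4)   # [B, J, 1, W]
# rows[b, j, w] = grid[b, j, row_y[b, j, 0], w]
rows = torch.gather(grid, 2, row_index)[:, :, 0, :]   # [B, J, W]

# A last gather along dim 2 then picks one column per joint.
col_x = torch.tensor([[[3], [0]]])                # [B, J, 1]
cell = torch.gather(rows, 2, col_x)               # [B, J, 1]
```

In the thread's code, `grid` plays the role of `idx_z`, `row_y` of `joint_y` and `col_x` of `joint_x`.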
I think you do not need such complicated techniques.
First, reshape the heatmap to [batch_size, joint_num, ] (maybe [1,42,-1]).
Second, perform argmax on the third axis. Let us denote the output of the argmax as i. There are 42 i's.
The z-axis coordinate of joint j can be obtained by `i // (cfg.output_hm_shape[1] * cfg.output_hm_shape[2])`. The y-axis coordinate of joint j can be obtained by `i // cfg.output_hm_shape[2]`. The x-axis coordinate of joint j can be obtained by `i % cfg.output_hm_shape[2]`.
I tried to wrap my head around how this can give the z, y, x coordinates of a joint by implementing a small-scale example:
Assuming:
cfg.output_hm_shape[0] = 64
cfg.output_hm_shape[1] = 64
cfg.output_hm_shape[2] = 64
So, for didactic purposes, I built a playground, reduced [1, 42, 64x64x64] to [1, 3, 2x2x2], and tried to implement what you suggested:
>>> x = torch.rand(1,3,2*2*2)
>>> print(x)
tensor([[[0.0390, 0.0894, 0.8253, 0.5204, 0.1177, 0.9195, 0.6775, 0.4527],
[0.3900, 0.1801, 0.5663, 0.6324, 0.8562, 0.8361, 0.6462, 0.8877],
[0.5187, 0.4421, 0.4709, 0.8864, 0.4110, 0.9353, 0.1569, 0.2525]]])
>>> print(torch.argmax(x, dim=2))
# Print i s (only 3 in this example instead of 42)
tensor([[5, 7, 5]])
# z-axis index i // (cfg.output_hm_shape[1] * cfg.output_hm_shape[2])
>>> print(5//(2*2))
1
# y-axis index i // cfg.output_hm_shape[2]
>>> print(5//2)
2
# x-axis index i % cfg.output_hm_shape[2]
>>> print(5%2)
1
>>> print(x[0][0][1])
tensor(0.0894)
>>> print(x[0][0][2])
tensor(0.8253)
>>> print(x[0][0][1])
tensor(0.0894)
>>>
So in this example, for i = 0, we would have:
joint_z = 0.0894
joint_y = 0.8253
joint_x = 0.0894
Did I understand correctly? This doesn't make sense to me, so I'm pretty sure I'm misunderstanding something.
So far, you have 3 joints (the second dimension of x=torch.rand(1,3,2*2*2) is 3), and for the first joint, the argmax result is 5.
cfg.output_hm_shape = (2,2,2) (z-axis, y-axis, x-axis)
Then, the z-axis index of the first joint is 5 // (cfg.output_hm_shape[1]*cfg.output_hm_shape[2]) = 1.
The y-axis index of the first joint is (5 % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2])) // cfg.output_hm_shape[2] = 0 (here, my previous answer was wrong).
The x-axis index of the first joint is (5 % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2])) % cfg.output_hm_shape[2] = 1.
Let's check: the maximum for the first joint is x[0][0][5] == x.view(1,3,2,2,2)[0][0][1][0][1]. Correct!
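The same check can be run for all three joints of the small example. This is only a sketch: `shape` stands in for `cfg.output_hm_shape`, and the tensor values are copied from the printout earlier in the thread.

```python
import torch

shape = (2, 2, 2)  # stand-in for cfg.output_hm_shape, ordered (z, y, x)
x = torch.tensor([[[0.0390, 0.0894, 0.8253, 0.5204, 0.1177, 0.9195, 0.6775, 0.4527],
                   [0.3900, 0.1801, 0.5663, 0.6324, 0.8562, 0.8361, 0.6462, 0.8877],
                   [0.5187, 0.4421, 0.4709, 0.8864, 0.4110, 0.9353, 0.1569, 0.2525]]])

i = torch.argmax(x, dim=2)        # flat index of the peak of each joint
plane = shape[1] * shape[2]       # number of cells in one z-slice
z = i // plane
y = (i % plane) // shape[2]
x_idx = (i % plane) % shape[2]

# The value at (z, y, x) in the 5D view should equal the per-joint maximum.
x_5d = x.view(1, 3, *shape)
peaks = x.max(dim=2).values
for j in range(3):
    value = x_5d[0, j, z[0, j], y[0, j], x_idx[0, j]]
    print(j, (z[0, j].item(), y[0, j].item(), x_idx[0, j].item()), value.item())
```

For these values the flat indices are 5, 7 and 5, which unravel to (1, 0, 1), (1, 1, 1) and (1, 0, 1).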
Thank you! Now I got it.
I also ran a test and it seems fine:
Evaluation start...
Handedness accuracy: 0.9914243936284379
MRRPE: 114.07134151194049
MPJPE for each joint:
r_thumb4: 133.83, r_thumb3: 109.11, r_thumb2: 79.64, r_thumb1: 42.10, r_index4: 167.89, r_index3: 151.36, r_index2: 133.02, r_index1: 95.65, r_middle4: 162.94, r_middle3: 146.78, r_middle2: 130.91, r_middle1: 91.99, r_ring4: 154.24, r_ring3: 135.76, r_ring2: 121.68, r_ring1: 87.20, r_pinky4: 135.77, r_pinky3: 119.71, r_pinky2: 108.40, r_pinky1: 82.38, r_wrist: 0.00, l_thumb4: 133.43, l_thumb3: 108.00, l_thumb2: 78.67, l_thumb1: 41.39, l_index4: 165.56, l_index3: 148.98, l_index2: 131.27, l_index1: 94.26, l_middle4: 161.23, l_middle3: 145.90, l_middle2: 129.15, l_middle1: 90.69, l_ring4: 152.64, l_ring3: 134.29, l_ring2: 121.00, l_ring1: 86.36, l_pinky4: 131.61, l_pinky3: 119.18, l_pinky2: 107.63, l_pinky1: 81.45, l_wrist: 0.00,
MPJPE for all hand sequences: 113.17
MPJPE for each joint:
r_thumb4: 134.12, r_thumb3: 109.62, r_thumb2: 79.70, r_thumb1: 43.23, r_index4: 164.97, r_index3: 150.53, r_index2: 133.83, r_index1: 96.30, r_middle4: 166.01, r_middle3: 151.96, r_middle2: 133.64, r_middle1: 92.76, r_ring4: 156.42, r_ring3: 140.29, r_ring2: 124.46, r_ring1: 87.86, r_pinky4: 137.36, r_pinky3: 123.11, r_pinky2: 110.43, r_pinky1: 82.98, r_wrist: 0.00, l_thumb4: 133.40, l_thumb3: 108.17, l_thumb2: 78.90, l_thumb1: 42.27, l_index4: 161.54, l_index3: 147.21, l_index2: 131.30, l_index1: 94.86, l_middle4: 163.27, l_middle3: 149.75, l_middle2: 131.10, l_middle1: 90.87, l_ring4: 150.63, l_ring3: 138.01, l_ring2: 122.67, l_ring1: 86.63, l_pinky4: 133.72, l_pinky3: 121.56, l_pinky2: 108.83, l_pinky1: 81.83, l_wrist: 0.00,
MPJPE for single hand sequences: 114.19
MPJPE for each joint:
r_thumb4: 133.61, r_thumb3: 108.70, r_thumb2: 79.60, r_thumb1: 40.88, r_index4: 170.27, r_index3: 152.03, r_index2: 132.39, r_index1: 95.14, r_middle4: 160.19, r_middle3: 142.65, r_middle2: 128.78, r_middle1: 91.39, r_ring4: 152.30, r_ring3: 132.16, r_ring2: 119.51, r_ring1: 86.69, r_pinky4: 134.38, r_pinky3: 117.03, r_pinky2: 106.83, r_pinky1: 81.91, r_wrist: 0.00, l_thumb4: 133.45, l_thumb3: 107.85, l_thumb2: 78.49, l_thumb1: 40.39, l_index4: 168.81, l_index3: 150.41, l_index2: 131.24, l_index1: 93.78, l_middle4: 159.52, l_middle3: 142.77, l_middle2: 127.56, l_middle1: 90.54, l_ring4: 154.43, l_ring3: 131.31, l_ring2: 119.65, l_ring1: 86.14, l_pinky4: 129.85, l_pinky3: 117.25, l_pinky2: 106.67, l_pinky1: 81.14, l_wrist: 0.00,
MPJPE for interacting hand sequences: 112.33
I solved like this:
idx = torch.argmax(joint_heatmap_out, dim=2, keepdim=True)
idx_z = idx // (cfg.output_hm_shape[1]*cfg.output_hm_shape[2])
idx_y = idx % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2]) // cfg.output_hm_shape[2]
idx_x = idx % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2]) % cfg.output_hm_shape[2]
joint_z = torch.gather(joint_heatmap_out, dim=2, index=idx_z)
joint_y = torch.gather(joint_heatmap_out, dim=2, index=idx_y)
joint_x = torch.gather(joint_heatmap_out, dim=2, index=idx_x)
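One thing that may be worth double-checking here: as far as I can tell, idx_z, idx_y and idx_x are already the voxel coordinates, whereas `torch.gather(joint_heatmap_out, ...)` at those positions returns heatmap scores. The sketch below returns the indices directly and compares them with the max/gather chain quoted earlier in the thread. It rests on assumptions: `hm_shape` stands in for `cfg.output_hm_shape`, the function names are made up for illustration, and the test heatmap uses distinct values so that there are no ties.

```python
import torch

def unravel_flat_argmax(hm_flat, hm_shape):
    """hm_flat: [B, J, D*H*W]. Returns integer (z, y, x), each [B, J, 1]."""
    depth, height, width = hm_shape
    idx = torch.argmax(hm_flat, dim=2, keepdim=True)
    plane = height * width
    idx_z = idx // plane
    idx_y = (idx % plane) // width
    idx_x = (idx % plane) % width
    return idx_z, idx_y, idx_x

def nested_max(hm_5d):
    """The max/gather chain from earlier in the thread. hm_5d: [B, J, D, H, W]."""
    val_z, idx_z = torch.max(hm_5d, 2)
    val_zy, idx_zy = torch.max(val_z, 2)
    _, joint_x = torch.max(val_zy, 2)
    joint_x = joint_x[:, :, None]
    joint_y = torch.gather(idx_zy, 2, joint_x)
    row_index = joint_y[:, :, :, None].repeat(1, 1, 1, hm_5d.shape[4])
    joint_z = torch.gather(idx_z, 2, row_index)[:, :, 0, :]
    joint_z = torch.gather(joint_z, 2, joint_x)
    return joint_z, joint_y, joint_x

torch.manual_seed(0)
hm_shape = (4, 5, 6)   # small, non-cubic shape to catch axis mix-ups
n = hm_shape[0] * hm_shape[1] * hm_shape[2]
# A permutation per joint gives distinct values, so the argmax is unique.
hm_flat = torch.stack([torch.randperm(n) for _ in range(3)]).float().view(1, 3, n)

z, y, x = unravel_flat_argmax(hm_flat, hm_shape)
ref_z, ref_y, ref_x = nested_max(hm_flat.view(1, 3, *hm_shape))
print(torch.equal(z, ref_z), torch.equal(y, ref_y), torch.equal(x, ref_x))
```

If both approaches agree on your real heatmaps as well, the unraveled indices can take the place of joint_z, joint_y and joint_x.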
I successfully exported the ONNX model. Once I imported it into Unity, tadaaa:
OnnxImportException: Unknown type ArgMax encountered while parsing layer 592.
I opened a ticket with Unity to ask them to add support for ArgMax. Until then, I'll try to find a solution that does not use ArgMax. If you have any ideas, please help.
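One possible direction, offered only as a sketch: the index of the maximum can be computed without calling argmax, by comparing the heatmap with its own maximum and keeping the matching position. The function name below is hypothetical. Whether the exported graph then contains only operators that the Unity importer accepts has not been tested here and is an assumption.

```python
import torch

def argmax_via_compare(hm_flat):
    """hm_flat: [B, J, N]. Returns [B, J, 1] indices without calling argmax.

    If several positions tie for the maximum, the last one is returned.
    """
    max_val = hm_flat.amax(dim=2, keepdim=True)                  # [B, J, 1]
    positions = torch.arange(hm_flat.shape[2], device=hm_flat.device)
    is_max = (hm_flat == max_val).to(positions.dtype)            # 1 at the peak
    # Non-peak entries become 0, the peak keeps its position.
    return (is_max * positions).amax(dim=2, keepdim=True)

torch.manual_seed(0)
# Distinct values per joint, so the result can be compared with torch.argmax.
hm = torch.stack([torch.randperm(64) for _ in range(5)]).float().view(1, 5, 64)
idx = argmax_via_compare(hm)
print(torch.equal(idx, torch.argmax(hm, dim=2, keepdim=True)))
```

The z, y, x indices can then be derived from `idx` with the same `//` and `%` arithmetic as before.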
Thank you again for help.
Perfect. Let me know if there is anything I can help with.
I'm trying to test this model in the Unity Inference Engine. To do this, I had to export it to ONNX format. The export worked, but once I imported the model into Unity I got this error:
We are talking about these 2 Reshapes
Any idea how I can work around those reshapes so that they use tensors of rank 4?
The ONNX model can be downloaded from here: https://github.com/nauutilus/InterHand2.6M/releases/download/0.0.1/interhand.onnx