facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

PyTorch -> ONNX -> Unity: "Only tensors of rank 4 or less are supported, but got rank 5" #25

Closed harmoniqpunk closed 3 years ago

harmoniqpunk commented 3 years ago

I'm trying to test this model in Unity's inference engine (Barracuda). To do this I had to export it to ONNX. The export succeeded, but when I imported the model into Unity I got this error:

Asset import failed, "Assets/models/interhand.onnx" > OnnxImportException: Unexpected error while parsing layer 561 of type Reshape.
Only tensors of rank 4 or less are supported, but got rank 5

Json: { "input": [ "559", "560" ], "output": [ "561" ], "name": "Reshape_131", "opType": "Reshape" }
  at Unity.Barracuda.ONNXLayout.AxisPermutationsForMappingONNXLayoutToBarracuda (System.Int32 onnxRank, System.String onnxLayout) [0x003ef] in /Users/nautilus/Pose-Demo/Library/PackageCache/com.unity.barracuda@1.0.2/Barracuda/Editor/ONNXLayout.cs:152 
  at Unity.Barracuda.ONNXLayout.PermuteToBarracuda (System.Int64[] shape, System.String onnxLayout) [0x00003] in /Users/nautilus/Pose-Demo/Library/PackageCache/com.unity.barracuda@1.0.2/Barracuda/Editor/ONNXLayout.cs:158 
  at Unity.Barracuda.ONNXLayout.ConvertSymbolicShapeToBarracuda (System.Int64[] onnxShape, System.String onnxLayout) [0x00000] in /Users/nautilus/Pose-Demo/Library/PackageCache/com.unity.barracuda@1.0.2/Barracuda/Editor/ONNXLayout.cs:223 
  at Unity.Barracuda.ONNXModelImporter.<.ctor>b__14_1 (Unity.Barracuda.ModelBuilder net, Unity.Barracuda.ONNXNodeWrapper node) [0x000b9] in /Users/nautilus/Pose-Demo/Library/PackageCache/com.unity.barracuda@1.0.2/Barracuda/Editor/ONNXModelImporter.cs:105 
  at Unity.Barracuda.ONNXModelImporter.ConvertOnnxModel (Onnx.ModelProto onnxModel) [0x0032f] in /Users/nautilus/Pose-Demo/Library/PackageCache/com.unity.barracuda@1.0.2/Barracuda/Editor/ONNXModelImporter.cs:1088 

These are the two Reshape nodes in question:

[Screenshot 2020-11-12 at 13:07:37: the two Reshape nodes]

Any idea how I can work around those reshapes so that the tensors stay at rank 4 or lower?

The ONNX model can be downloaded from here: https://github.com/nauutilus/InterHand2.6M/releases/download/0.0.1/interhand.onnx

mks0601 commented 3 years ago

I think the reshape functions come from here and here. The `view` calls are used to obtain 3D heatmaps. I think you can change them to something like `.view(-1,self.joint_num,cfg.output_hm_shape[0]*cfg.output_hm_shape[1]*cfg.output_hm_shape[2])`, which reshapes each joint's 3D heatmap into a 1D heatmap. You should also change lines 60-66 of main/model.py to the 1D heatmap version.
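
To make the suggested reshape concrete, here is a minimal sketch with toy sizes. The sizes, the variable names, and the assumption that the layer before the `view` call produces a rank-4 tensor of shape [batch, joint_num * depth, height, width] are illustrative and not taken from the repository.

```python
import torch

# Toy sizes for illustration; the thread uses 42 joints and a 64x64x64 heatmap.
batch_size, joint_num = 1, 3
depth, height, width = 4, 5, 6  # stands in for cfg.output_hm_shape (z, y, x)

# Assumed stand-in for the rank-4 output that precedes the view call.
features = torch.arange(
    batch_size * joint_num * depth * height * width, dtype=torch.float32
).view(batch_size, joint_num * depth, height, width)

# Original style: rank-5 heatmap, which the importer in the thread rejects.
heatmap_5d = features.view(-1, joint_num, depth, height, width)

# Suggested style: rank-3 heatmap with the three spatial axes merged into one.
heatmap_flat = features.view(-1, joint_num, depth * height * width)

# Element (z, y, x) of the rank-5 tensor sits at flat index z*H*W + y*W + x.
z, y, x = 2, 3, 1
flat_index = z * height * width + y * width + x
same_value = bool(heatmap_5d[0, 1, z, y, x] == heatmap_flat[0, 1, flat_index])

print(heatmap_5d.dim(), heatmap_flat.dim(), same_value)
```

Both views share the same memory, so no values change; only the way the last axes are indexed differs.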

harmoniqpunk commented 3 years ago

Thank you for putting me on the right track.

So now I end up with a heatmap that has 3 dimensions instead of 5: (batch, joints, 3D heatmap flattened to 1D).

I'm trying to extract the values from the flattened 1D axis and I'm pretty stuck. I'm talking about these lines:

            val_z, idx_z = torch.max(joint_heatmap_out,2)
            val_zy, idx_zy = torch.max(val_z,2)
            val_zyx, joint_x = torch.max(val_zy,2)
            joint_x = joint_x[:,:,None]
            joint_y = torch.gather(idx_zy, 2, joint_x)
            joint_z = torch.gather(idx_z, 2, joint_y[:,:,:,None].repeat(1,1,1,cfg.output_hm_shape[1]))[:,:,0,:]
            joint_z = torch.gather(joint_z, 2, joint_x)

The size of joint_heatmap_out is now [1, 42, 262144]. Before, it was [1, 42, 64, 64, 64].

I was trying to chunk like this:

joint_heatmap_out_0, joint_heatmap_out_1, joint_heatmap_out_2 = torch.chunk(joint_heatmap_out, 3, 2)

But of course that is not the right way to do it. I end up with tensors of these 3 sizes:

[1, 42, 87382] [1, 42, 87382] [1, 42, 87380]

Also, I'm having a hard time understanding conceptually how joint_x, joint_y and joint_z are constructed:

Here, do you select all indices? What is None for? Is the selection format the NumPy one, i.e. [start_index : stop_index : step_size]?

            joint_x = joint_x[:,:,None]

OK. This gather I understand: you select the y values by index.

            joint_y = torch.gather(idx_zy, 2, joint_x)

But I'm a bit lost at these two gathers, because I understand neither the second line nor the repeat in the first line.

            joint_z = torch.gather(idx_z, 2, joint_y[:,:,:,None].repeat(1,1,1,cfg.output_hm_shape[1]))[:,:,0,:]
            joint_z = torch.gather(joint_z, 2, joint_x)
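
A minimal sketch of what the `None` index and the first `gather` do, using toy tensors whose values are made up for illustration:

```python
import torch

# Toy index tensor shaped like joint_x right after torch.max: [batch, joints].
joint_x = torch.tensor([[1, 0, 2]])

# Indexing with None inserts a new axis of size 1, as in NumPy.
joint_x = joint_x[:, :, None]
print(joint_x.shape)  # torch.Size([1, 3, 1])

# idx_zy holds, for each joint and each x position, the y index of the maximum.
# Shape: [batch, joints, width].
idx_zy = torch.tensor([[[0, 3, 1],
                        [2, 2, 0],
                        [1, 0, 3]]])

# gather along dim 2: joint_y[b, j, 0] = idx_zy[b, j, joint_x[b, j, 0]].
joint_y = torch.gather(idx_zy, 2, joint_x)
print(joint_y.squeeze(-1))  # tensor([[3, 2, 3]])
```

The two joint_z lines follow the same pattern one rank higher. `torch.gather` needs an index tensor of the same rank as its input, so joint_y gets a fourth axis and is repeated along it; the first gather then picks row joint_y from idx_z for every x position, and the second gather picks column joint_x from that row.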

mks0601 commented 3 years ago

I think you do not need such difficult techniques. First, reshape the heatmap to [batch_size, joint_num, ] (maybe [1,42,-1]). Second, perform argmax on the third axis. Let us denote the output of the argmax as `i`. There are 42 `i`s. The z-axis coordinate of joint j can be obtained by `i // (cfg.output_hm_shape[1] * cfg.output_hm_shape[2])`. The y-axis coordinate of joint j can be obtained by `i // cfg.output_hm_shape[2]`. The x-axis coordinate of joint j can be obtained by `i % cfg.output_hm_shape[2]`.

harmoniqpunk commented 3 years ago

I tried to wrap my head around how this gives the z, y, x coordinates of a joint by implementing a small-scale example:

Assuming:

cfg.output_hm_shape[0] = 64
cfg.output_hm_shape[1] = 64
cfg.output_hm_shape[2] = 64

So for didactic purposes, I built a playground, reduced [1, 42, 64x64x64] to [1, 3, 2x2x2], and tried to implement what you suggested:

>>> x = torch.rand(1,3,2*2*2)
>>> print(x)
tensor([[[0.0390, 0.0894, 0.8253, 0.5204, 0.1177, 0.9195, 0.6775, 0.4527],
         [0.3900, 0.1801, 0.5663, 0.6324, 0.8562, 0.8361, 0.6462, 0.8877],
         [0.5187, 0.4421, 0.4709, 0.8864, 0.4110, 0.9353, 0.1569, 0.2525]]])
>>> print(torch.argmax(x, dim=2))
# Print i s (only 3 in this example instead of 42)
tensor([[5, 7, 5]])
# z-axis index i // (cfg.output_hm_shape[1] * cfg.output_hm_shape[2])
>>> print(5//(2*2))
1
# y-axis index i // cfg.output_hm_shape[2]
>>> print(5//2)
2
# x-axis index i % cfg.output_hm_shape[2]
>>> print(5%2)
1
>>> print(x[0][0][1])
tensor(0.0894)
>>> print(x[0][0][2])
tensor(0.8253)
>>> print(x[0][0][1])
tensor(0.0894)
>>> 

So in this example, for the first joint (index 0), we would have:

joint_z = 0.0894
joint_y = 0.8253
joint_x = 0.0894

Did I understand correctly? This doesn't make sense to me, so I'm pretty sure I'm misunderstanding something.

mks0601 commented 3 years ago

>>> x = torch.rand(1,3,2*2*2)
>>> print(x)
tensor([[[0.0390, 0.0894, 0.8253, 0.5204, 0.1177, 0.9195, 0.6775, 0.4527],
         [0.3900, 0.1801, 0.5663, 0.6324, 0.8562, 0.8361, 0.6462, 0.8877],
         [0.5187, 0.4421, 0.4709, 0.8864, 0.4110, 0.9353, 0.1569, 0.2525]]])
>>> print(torch.argmax(x, dim=2))
# Print i s (only 3 in this example instead of 42)
tensor([[5, 7, 5]])
# z-axis index i // (cfg.output_hm_shape[1] * cfg.output_hm_shape[2])
>>> print(5//(2*2))
1
# y-axis index i // cfg.output_hm_shape[2]
>>> print(5//2)
2
# x-axis index i % cfg.output_hm_shape[2]
>>> print(5%2)
1

So far, you have 3 joints (the second dimension of x=torch.rand(1,3,2*2*2) is 3), and for the first joint the argmax result is 5. cfg.output_hm_shape = (2,2,2) (z-axis, y-axis, x-axis).

Then the z-axis index of the first joint is 5 // (cfg.output_hm_shape[1]*cfg.output_hm_shape[2]) = 1. The y-axis index of the first joint is (5 % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2])) // cfg.output_hm_shape[2] = 0 (here, my previous answer was wrong). The x-axis index of the first joint is (5 % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2])) % cfg.output_hm_shape[2] = 1.

Let's check: the maximum of x along dim 2 for the first joint == x[0][0][5] == x.view(1,3,2,2,2)[0][0][1][0][1]. Correct!
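
The corrected arithmetic can be sketched in plain Python. The helper name `unravel_zyx` is hypothetical and only serves this illustration; it assumes the heatmap is flattened in row-major (z, y, x) order, as `view` does.

```python
def unravel_zyx(flat_index, hm_shape):
    """Split a flat index into (z, y, x) for a row-major (depth, height, width) heatmap."""
    _, height, width = hm_shape
    # Every full block of height*width entries advances z by one.
    z, rest = divmod(flat_index, height * width)
    # Inside that block, every full row of width entries advances y by one.
    y, x = divmod(rest, width)
    return z, y, x


hm_shape = (2, 2, 2)       # the toy cfg.output_hm_shape from the example
argmax_indices = (5, 7, 5)  # the three argmax results printed above

coords = [unravel_zyx(i, hm_shape) for i in argmax_indices]
print(coords)  # [(1, 0, 1), (1, 1, 1), (1, 0, 1)]
```

Recombining with z*height*width + y*width + x returns the original flat index, which is a quick way to check the decomposition.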

harmoniqpunk commented 3 years ago

Thank you! Now I got it.

I also ran a test and it seems fine:

Evaluation start...
Handedness accuracy: 0.9914243936284379
MRRPE: 114.07134151194049

MPJPE for each joint: 
r_thumb4: 133.83, r_thumb3: 109.11, r_thumb2: 79.64, r_thumb1: 42.10, r_index4: 167.89, r_index3: 151.36, r_index2: 133.02, r_index1: 95.65, r_middle4: 162.94, r_middle3: 146.78, r_middle2: 130.91, r_middle1: 91.99, r_ring4: 154.24, r_ring3: 135.76, r_ring2: 121.68, r_ring1: 87.20, r_pinky4: 135.77, r_pinky3: 119.71, r_pinky2: 108.40, r_pinky1: 82.38, r_wrist: 0.00, l_thumb4: 133.43, l_thumb3: 108.00, l_thumb2: 78.67, l_thumb1: 41.39, l_index4: 165.56, l_index3: 148.98, l_index2: 131.27, l_index1: 94.26, l_middle4: 161.23, l_middle3: 145.90, l_middle2: 129.15, l_middle1: 90.69, l_ring4: 152.64, l_ring3: 134.29, l_ring2: 121.00, l_ring1: 86.36, l_pinky4: 131.61, l_pinky3: 119.18, l_pinky2: 107.63, l_pinky1: 81.45, l_wrist: 0.00, 
MPJPE for all hand sequences: 113.17

MPJPE for each joint: 
r_thumb4: 134.12, r_thumb3: 109.62, r_thumb2: 79.70, r_thumb1: 43.23, r_index4: 164.97, r_index3: 150.53, r_index2: 133.83, r_index1: 96.30, r_middle4: 166.01, r_middle3: 151.96, r_middle2: 133.64, r_middle1: 92.76, r_ring4: 156.42, r_ring3: 140.29, r_ring2: 124.46, r_ring1: 87.86, r_pinky4: 137.36, r_pinky3: 123.11, r_pinky2: 110.43, r_pinky1: 82.98, r_wrist: 0.00, l_thumb4: 133.40, l_thumb3: 108.17, l_thumb2: 78.90, l_thumb1: 42.27, l_index4: 161.54, l_index3: 147.21, l_index2: 131.30, l_index1: 94.86, l_middle4: 163.27, l_middle3: 149.75, l_middle2: 131.10, l_middle1: 90.87, l_ring4: 150.63, l_ring3: 138.01, l_ring2: 122.67, l_ring1: 86.63, l_pinky4: 133.72, l_pinky3: 121.56, l_pinky2: 108.83, l_pinky1: 81.83, l_wrist: 0.00, 
MPJPE for single hand sequences: 114.19

MPJPE for each joint: 
r_thumb4: 133.61, r_thumb3: 108.70, r_thumb2: 79.60, r_thumb1: 40.88, r_index4: 170.27, r_index3: 152.03, r_index2: 132.39, r_index1: 95.14, r_middle4: 160.19, r_middle3: 142.65, r_middle2: 128.78, r_middle1: 91.39, r_ring4: 152.30, r_ring3: 132.16, r_ring2: 119.51, r_ring1: 86.69, r_pinky4: 134.38, r_pinky3: 117.03, r_pinky2: 106.83, r_pinky1: 81.91, r_wrist: 0.00, l_thumb4: 133.45, l_thumb3: 107.85, l_thumb2: 78.49, l_thumb1: 40.39, l_index4: 168.81, l_index3: 150.41, l_index2: 131.24, l_index1: 93.78, l_middle4: 159.52, l_middle3: 142.77, l_middle2: 127.56, l_middle1: 90.54, l_ring4: 154.43, l_ring3: 131.31, l_ring2: 119.65, l_ring1: 86.14, l_pinky4: 129.85, l_pinky3: 117.25, l_pinky2: 106.67, l_pinky1: 81.14, l_wrist: 0.00, 
MPJPE for interacting hand sequences: 112.33

I solved it like this:

            idx = torch.argmax(joint_heatmap_out, dim=2, keepdim=True)
            idx_z = idx // (cfg.output_hm_shape[1]*cfg.output_hm_shape[2])
            idx_y = idx % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2]) // cfg.output_hm_shape[2]
            idx_x = idx % (cfg.output_hm_shape[1]*cfg.output_hm_shape[2]) % cfg.output_hm_shape[2]

            joint_z = torch.gather(joint_heatmap_out, dim=2, index=idx_z)
            joint_y = torch.gather(joint_heatmap_out, dim=2, index=idx_y)
            joint_x = torch.gather(joint_heatmap_out, dim=2, index=idx_x)
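
One thing that may be worth checking: in the original lines quoted earlier in the thread, joint_x, joint_y and joint_z are index tensors (they are gathered from idx_zy and idx_z), whereas gathering from joint_heatmap_out returns heatmap scores. The sketch below, with toy sizes and hypothetical variable names, compares the original cascaded max against using the decomposed flat indices directly as coordinates.

```python
import torch

torch.manual_seed(0)
batch_size, joint_num = 2, 3
depth, height, width = 4, 3, 5  # toy stand-in for cfg.output_hm_shape
heatmap = torch.rand(batch_size, joint_num, depth, height, width)

# Reference: cascaded max on the rank-5 heatmap, as in the quoted original lines.
val_z, idx_z = torch.max(heatmap, 2)
val_zy, idx_zy = torch.max(val_z, 2)
val_zyx, ref_x = torch.max(val_zy, 2)
ref_x = ref_x[:, :, None]
ref_y = torch.gather(idx_zy, 2, ref_x)
ref_z = torch.gather(idx_z, 2, ref_y[:, :, :, None].repeat(1, 1, 1, width))[:, :, 0, :]
ref_z = torch.gather(ref_z, 2, ref_x)

# Flat version: one argmax, then integer arithmetic; the indices are the coordinates.
flat = heatmap.view(batch_size, joint_num, -1)
idx = torch.argmax(flat, dim=2, keepdim=True)
flat_z = torch.div(idx, height * width, rounding_mode="floor")
flat_y = torch.div(idx % (height * width), width, rounding_mode="floor")
flat_x = idx % width

coords_ref = torch.cat((ref_x, ref_y, ref_z), 2)
coords_flat = torch.cat((flat_x, flat_y, flat_z), 2)
coords_match = torch.equal(coords_ref, coords_flat)
print(coords_match)
```

The comparison assumes each joint has a unique maximum, which holds for random floats but is not guaranteed in general.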

I successfully exported the ONNX model. But once I imported it into Unity, tadaaa:

OnnxImportException: Unknown type ArgMax encountered while parsing layer 592.

I opened a ticket with Unity asking them to add support for ArgMax. Until then I'll try to find a solution that does not use ArgMax. If you have any ideas, please help.
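
One possible direction, sketched under assumptions: emulate argmax with a max reduction, an equality mask and a multiplication. Whether the operators this exports to are accepted by Barracuda has not been verified here, and the toy tensor and variable names are illustrative only.

```python
import torch

torch.manual_seed(0)
flat = torch.rand(1, 3, 8)  # [batch, joints, depth*height*width], toy sizes

# Largest score per joint, computed without asking for its index.
peak = torch.amax(flat, dim=2, keepdim=True)

# Positions 0..N-1 along the flattened axis, broadcast over batch and joints.
positions = torch.arange(flat.shape[2], dtype=flat.dtype).view(1, 1, -1)

# 1.0 where an entry equals the peak, 0.0 elsewhere.
mask = (flat == peak).to(flat.dtype)

# Keep the position of the peak; if several entries tie, the last one wins.
idx = torch.amax(mask * positions, dim=2, keepdim=True).long()

matches_argmax = torch.equal(idx, torch.argmax(flat, dim=2, keepdim=True))
print(matches_argmax)
```

Positions are stored as float32 here, which represents integers exactly up to 2^24, so the 262144 entries of a 64x64x64 heatmap fit.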

Thank you again for your help.

mks0601 commented 3 years ago

Perfect. Let me know if there is anything I can help with.