Closed (TowoC closed this 2 years ago)
We actually use the 90-d output (for 15 of the 23 non-root joints, i.e., without the wrists/hands/ankles/feet, as they are not measured by the IMUs). During evaluation, the rotations of the 8 ignored joints are all set to identity, so we did not predict them. In the paper, for conciseness, we said that we estimated all non-root joint rotations. In practice, whether or not these 8 joints' rotations are estimated does not affect the results much. (Also, the DIP-IMU training data does not contain these 8 joints' rotations.)
For the loss, we directly use L2 on the root-relative 6D representation, i.e., just the network output, not converting to rotation matrices or global poses. It's a very simple implementation. You can try different losses.
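As a rough illustration of the loss described above, here is a minimal NumPy sketch that applies a squared-error loss directly to the 90-d 6D vectors, with no conversion to rotation matrices. The function name `l2_loss_r6d`, the mean reduction, and the random stand-in data are assumptions made for illustration; they are not taken from the released code.

```python
import numpy as np

N_REDUCED = 15            # joints kept by the network (from the reply above)
OUT_DIM = N_REDUCED * 6   # 90-d output, 6 numbers per joint


def l2_loss_r6d(pred, target):
    """Mean squared error taken directly on 6D rotation vectors.

    Hypothetical helper: the reduction (mean vs. sum) is an assumption.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    assert pred.shape == target.shape and pred.shape[-1] == OUT_DIM
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
target = rng.normal(size=(4, OUT_DIM))  # stand-in ground truth for 4 frames
pred = target + 0.1                     # stand-in prediction with a constant offset
loss = l2_loss_r6d(pred, target)
```

Because the loss is taken on the raw network output, the ground truth has to be expressed in the same 6D layout and the same joint order as the prediction.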
Thanks for the reply, Xinyu.
I have two questions here.
1) As you said, the output of Pose-S3 actually covers only 15 joints, excluding the left and right ankles, feet, wrists, and hands. So when we calculate the loss at this layer, only these 15 joints are compared against the ground truth, because the DIP-IMU dataset does not have this information for training and evaluation.
But I have a concern about this: the training data contains AMASS and part of DIP-IMU, and AMASS has rotations for all joints. So you mean Pose-S3 does not include these joints in the loss because DIP-IMU lacks this information?
However, AMASS does have the information, and the network output is set to 90-d. Does this mean the network never regresses the other 8 joint rotations during training? And won't that lead to relatively poor performance?
I would also like to know how to get the ground truth for Pose-S3. Is it generated by first applying axis_angle_to_rotation_matrix to data['pose'] and then applying rotation_matrix_to_r6d to get the 6D representation? Thanks a lot.
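The conversion chain asked about here (axis-angle to rotation matrix to 6D) can be sketched in plain NumPy as follows. The helpers `axis_angle_to_matrix` and `matrix_to_r6d` are hypothetical stand-ins written for this example, not the library functions named in the question, and the ordering of the six numbers (first column, then second column) is an assumed convention that must match whatever the network was trained with. The sketch does not settle whether the ground truth should also be converted to global rotations or restricted to the 15 predicted joints.

```python
import numpy as np


def axis_angle_to_matrix(aa):
    """Rodrigues' formula: (N, 3) axis-angle vectors -> (N, 3, 3) rotation matrices."""
    aa = np.asarray(aa, dtype=np.float64).reshape(-1, 3)
    angle = np.linalg.norm(aa, axis=1, keepdims=True)
    # For a zero rotation the axis becomes the zero vector and the result is the identity.
    axis = aa / np.maximum(angle, 1e-12)
    x, y, z = axis[:, 0], axis[:, 1], axis[:, 2]
    zeros = np.zeros_like(x)
    # Skew-symmetric cross-product matrix of each axis.
    K = np.stack([zeros, -z, y, z, zeros, -x, -y, x, zeros], axis=1).reshape(-1, 3, 3)
    s = np.sin(angle)[:, :, None]
    c = np.cos(angle)[:, :, None]
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)


def matrix_to_r6d(R):
    """Keep the first two columns of each matrix: (N, 3, 3) -> (N, 6).

    Assumed layout: [column 0, column 1].
    """
    R = np.asarray(R, dtype=np.float64).reshape(-1, 3, 3)
    return np.concatenate([R[:, :, 0], R[:, :, 1]], axis=1)


# Stand-in for one frame of data['pose']: 24 joints, axis-angle, all zero
# except joint 1, which is rotated 90 degrees about the z axis.
pose = np.zeros((24, 3))
pose[1] = [0.0, 0.0, np.pi / 2]
r6d = matrix_to_r6d(axis_angle_to_matrix(pose))  # shape (24, 6)
```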
2) Another small question: you said, "For the loss, we directly use L2 on the root-relative 6D representation, i.e., just the network output, not converting to rotation matrices or global poses."
In the code you released, there is net._reduced_glb_6d_to_full_local_mat. I think this function transforms the 15 joint rotations into 24 joint rotations by padding with fixed values. I also saw a function called global_to_local_pose there. Judging by the names "_reduced_glb_6d_to_full_local_mat" and "global_to_local_pose", doesn't that mean the output of Pose-S3 is a global pose?
Thanks for answering so many of my questions.
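To make the padding and global-to-local idea concrete, here is a small NumPy sketch. It is not the repository's implementation: the function names, the exact index split between predicted and ignored joints, and the choice to treat the predicted rotations as root-relative (so the root is the identity in that frame) are assumptions based on the discussion in this thread. The parent list is the standard 24-joint SMPL kinematic tree.

```python
import numpy as np

# SMPL kinematic tree: PARENTS[i] is the parent of joint i (-1 marks the root).
PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
           9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]
# Assumed split: ankles (7, 8), feet (10, 11), wrists (20, 21), hands (22, 23).
IGNORED = [7, 8, 10, 11, 20, 21, 22, 23]
REDUCED = [i for i in range(1, 24) if i not in IGNORED]  # the 15 predicted joints


def r6d_to_matrix(r6d):
    """Gram-Schmidt on two 3-vectors: (N, 6) -> (N, 3, 3), vectors become columns."""
    r6d = np.asarray(r6d, dtype=np.float64).reshape(-1, 6)
    a, b = r6d[:, :3], r6d[:, 3:]
    x = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b - np.sum(x * b, axis=1, keepdims=True) * x
    y = b / np.linalg.norm(b, axis=1, keepdims=True)
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=2)


def reduced_global_to_full_local(r6d_reduced, root_rotation):
    """(90,) root-relative global 6D + (3, 3) root rotation -> (24, 3, 3) local rotations."""
    glb = np.tile(np.eye(3), (24, 1, 1))  # root and ignored joints stay identity
    glb[REDUCED] = r6d_to_matrix(np.reshape(r6d_reduced, (len(REDUCED), 6)))
    local = np.tile(np.eye(3), (24, 1, 1))  # ignored joints keep identity
    local[0] = root_rotation
    for j in REDUCED:
        # A joint's local rotation is its global rotation seen from its parent's frame.
        local[j] = glb[PARENTS[j]].T @ glb[j]
    return local


rot_z_6d = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 0.0])  # 90 degrees about z in 6D
root = r6d_to_matrix(rot_z_6d)[0]
full = reduced_global_to_full_local(np.tile(rot_z_6d, len(REDUCED)), root)
```

In this example every predicted joint has the same global rotation, so only the direct children of the root (joints 1, 2, 3) end up with a non-identity local rotation; deeper joints cancel against their parents.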
Hello Xinyu, thanks for your explanation. I understand what is going on now.
Hello Xinyu, I have a question about the dimension of Pose-S3 output. As mentioned in the paper,
The output of Pose-S3 is in R^{6(j-1)}, where j is 24, so the output dimension should be 138.
But in the code (net.py), the pose_s3 output dimension is joint_set.n_reduced * 6, which is 90.
I do not understand this difference.
Can you help me figure out why the dimension is 90?
So, when I want to calculate the loss of Pose-S3, I need to use _reduced_glb_6d_to_full_local_mat to get the full joint rotation matrices, and then use rotation_matrix_to_r6d to get the full joint rotations in 6D, right?