Implementation of linear blend skinning

ChenYutongTHU commented 1 year ago

Hi. I have a question about the implementation of linear blend skinning (LBS) for the ZJU data in your code.

When preprocessing the data with its SMPL estimation, LBS is calculated as https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tools/process_zju/lbs.py#L42-L54

A is the rest-to-pose transform matrix, shaped as (-1, 24, 4, 4) and W is the skinning weight matrix shaped as (-1, 6890, 24). Here, LBS aggregates 24 joints.
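For readers following along, the blend described above can be sketched as below. This is a minimal, dependency-free illustration with 2 joints and 1 vertex instead of 24 joints and 6890 vertices; the helper names `lbs_vertex` and `translation` are hypothetical and are not taken from lbs.py.

```python
def lbs_vertex(vertex, weights, transforms):
    """Blend per-joint 4x4 transforms with skinning weights, then apply to one vertex.

    vertex:     [x, y, z] in the rest pose
    weights:    one skinning weight per joint (expected to sum to 1)
    transforms: one 4x4 rest-to-pose matrix per joint (nested lists)
    """
    # Weighted sum of the joint matrices: T = sum_j w_j * A_j
    blended = [[sum(w * A[r][c] for w, A in zip(weights, transforms))
                for c in range(4)] for r in range(4)]
    # Apply T to the homogeneous vertex and drop the last coordinate.
    v_h = list(vertex) + [1.0]
    return [sum(blended[r][c] * v_h[c] for c in range(4)) for r in range(3)]


def translation(tx, ty, tz):
    """4x4 matrix that translates by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Toy case: joint 0 stays put, joint 1 moves by 2 along x.
A = [translation(0.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)]
posed = lbs_vertex([1.0, 0.0, 0.0], [0.25, 0.75], A)
print(posed)
```

With weights 0.25 and 0.75, the vertex moves by 0.75 of joint 1's translation. The batched version in the linked code does the same thing for all vertices at once.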

=======================================================

However, in TAVA, I found that LBS only considers 19 joints.

https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tava/models/deform_posi_enc/snarf.py#L257-L262 https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tava/models/deform_posi_enc/snarf.py#L280-L282

According to the rigid_cluster defined in zju_parser.py, the number of clusters is 19, and five joints (head, left-toes, right-toes, left-hand and right-hand) are missing. My question is why there is a difference here (24 vs. 19).

I found that the preprocessing step produces "tfs" and "tf_bones". https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tools/process_zju/main.py#L106-L117

However, TAVA uses 'tf_bones' instead of 'tfs'. 'tf_bones' and 'rest_tfs_bone' appear as 'bones_word' and 'bones_cano' in TAVA's code. The rest-to-pose transformation matrix is later computed as https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tava/models/deform_posi_enc/snarf.py#L259

'tf_bones' has some redundancy because it stores, for each bone, the transform matrix of that bone's head joint, and some bones share the same head joint. This is why you need to reduce 24 to 19 later in the LBS.

Why don't you use 'tfs'? It has no redundancy and already describes the rest-to-pose transformation. More importantly, it can describe how the other five bones, the ones attached to head, left-toes, right-toes, left-hand and right-hand, move rigidly.

Please let me know if I have misunderstood something here. Many thanks!

liruilong940607 commented 1 year ago

Hi,

For the question of 24 vs. 19: some bones are rigidly connected to each other. So even though there are 24 joints in total, the actual degrees of freedom of SMPL amount to 19 matrices. Choosing 19 allows 1/ more compact LBS weights and 2/ the one-hot loss we use on the bones for regularization (24-dim LBS weights would make the one-hot loss ambiguous, because some dimensions have the same transformation matrix).

For the question of ‘tfs’ vs. ‘tf_bones’: ‘tfs’ can transform joints from rest to pose, but when you apply those matrices to bones you will get wrong transformations. ‘tf_bones’, however, can be applied to both bones and joints and gives you the correct rest-to-pose transformation. Because we care about deforming the entire space, rather than only the joints, we choose to use the physically correct deformations, ‘tf_bones’. Besides, only using ‘tf_bones’ allows us to apply this one-hot loss to the bones.

As for the reason for this weird discrepancy, you would have to dig into the SMPLX code base, into the place where they transform the local rotation matrix to the global matrix. There is a weird ordering of applying the rotation and translation matrices in their code that causes this:

https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tools/process_zju/lbs.py#L125

ChenYutongTHU commented 1 year ago

Great thanks! I have some follow-up questions

1. DoF: 24/23 or 19?

"So even though there are 24 joints in total, the actual degree of freedom of SMPL is 19 matrices." I checked the SMPL pose parameters (3*23). I did not observe a clear redundancy ... Take the 1st frame of CoreView_377 for example. I print the axis-angle of 24 joints, including the root which is always zero-rotation, as follows.

Joint-0 (parent:-)->root axis:[0. 0. 0.] angle:0.0
Joint-1 (parent:root)->lhip axis:[0.63 0.64 0.44] angle:0.2
Joint-2 (parent:root)->rhip axis:[ 0.34 -0.36 -0.87] angle:0.09
Joint-3 (parent:root)->belly axis:[-1. 0.01 0.02] angle:0.0
Joint-4 (parent:lhip)->lknee axis:[-0.17 0.74 -0.65] angle:0.16
Joint-5 (parent:rhip)->rknee axis:[ 0.97 -0.22 0.09] angle:0.11
Joint-6 (parent:belly)->spine axis:[ 0.96 -0.15 -0.23] angle:0.0
Joint-7 (parent:lknee)->lankle axis:[-0.56 0.82 0.14] angle:0.13
Joint-8 (parent:rknee)->rankle axis:[-0.98 -0.16 -0.12] angle:0.16
Joint-9 (parent:spine)->chest axis:[-0.99 0.14 0.05] angle:0.0
Joint-10 (parent:lankle)->ltoes axis:[ 0.81 -0.58 0.11] angle:0.0
Joint-11 (parent:rankle)->rtoes axis:[ 0.53 -0.1 0.84] angle:0.0
Joint-12 (parent:chest)->neck axis:[-0.87 0.05 0.48] angle:0.08
Joint-13 (parent:chest)->linshoulder axis:[0.01 1. 0.02] angle:0.0
Joint-14 (parent:chest)->rinshoulder axis:[ 0.03 -0.16 -0.99] angle:0.0
Joint-15 (parent:neck)->head axis:[ 0.55 0.75 -0.38] angle:0.19
Joint-16 (parent:linshoulder)->lshoulder axis:[ 0.09 0.12 -0.99] angle:0.83
Joint-17 (parent:rinshoulder)->rshoulder axis:[ 0.1 -0.11 0.99] angle:1.05
Joint-18 (parent:lshoulder)->lelbow axis:[-0.06 -0.99 0.12] angle:1.67
Joint-19 (parent:rshoulder)->relbow axis:[-0.09 0.99 -0.09] angle:1.55
Joint-20 (parent:lelbow)->lwrist axis:[ 0.5 -0.37 -0.78] angle:0.0
Joint-21 (parent:relbow)->rwrist axis:[-0.79 0.22 0.57] angle:0.0
Joint-22 (parent:lwrist)->lhand axis:[-0.96 -0. 0.28] angle:0.0
Joint-23 (parent:rwrist)->rhand axis:[-0.92 0.37 0.12] angle:0.0

I highlight Joint-1/2/3 and Joint-12/13/14 because these bones are assigned to two clusters and considered rigidly connected. However, within each cluster, the three axis-angle rotations differ, and the difference is non-trivial. For instance, joints 1 and 2 have very different axes and angles:

Joint-1 (parent:root)->lhip axis:[0.63 0.64 0.44] angle:0.2
Joint-2 (parent:root)->rhip axis:[ 0.34 -0.36 -0.87] angle:0.09

The rotation directions differ a lot.

This makes sense because the axis-angle here represents the rotation of the joint relative to its parent rather than that of the bone in which the joint is the tail. The rotation of the bone is decided by that of the head joint and the rotation of the tail joint subsequently decides that of the bone in which the tail joint becomes the head joint. (Sorry if my description is a bit convoluted). This is reflected in the preprocessing code as

https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tools/process_zju/lbs.py#L152-L156 where the transformation of the kth bone is assigned with that of its head joint.

If my understanding is correct, then the axis-angle (3-dim) of the joint k (1<=k<=23) describes how those bones whose head joints are joint k rigidly transform. And all 23 axis-angle rotations function differently and have no redundancy. So the degree of freedom should be 23, excluding the root joint which physically cannot rotate.

2. The problematic transformation

I did notice your comment here :) https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tools/process_zju/lbs.py#L125-L131

I conjecture that [I,T]@[R,0] is correct. My reason is as follows.

T is the 3D location of a joint relative to its parent. It defines how we translate from the parent joint to the child joint.
R, the matrix form of the axis-angle vector, represents the rotation of a joint relative to its parent.
The 4x4 matrix in 'transforms_mat' represents how you obtain a joint's x-y-z axes from its parent joint. It should apply R before T, so that applying the transform to (0,0,0) (the parent joint coordinate) gives [R,T]@(0,0,0,1)=T, which is the correct coordinate in the parent's system. If we apply R after T, the transformed coordinate becomes [R,RT]@(0,0,0,1)=RT, which is not correct.

The fork here is whether to consider R as the joint's rotation or its head joint's rotation. If R represents its head joint's rotation relative to the head's head joint (that is also the bone's transformation), then it may be [R,0]@[I,T]=[R,RT] ...
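The two orderings being compared can be checked numerically. The sketch below is only an illustration with a made-up rotation (90 degrees about z) and offset; the helpers `matmul`, `rot_z`, `trans` and `apply` are hypothetical names, not functions from the TAVA or SMPLX code.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rot_z(theta):
    """4x4 matrix [R,0]: rotation by theta about the z axis, no translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def trans(t):
    """4x4 matrix [I,T]: pure translation by t."""
    return [[1.0, 0.0, 0.0, t[0]],
            [0.0, 1.0, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def apply(m, p):
    """Apply a 4x4 matrix to a 3D point."""
    p_h = list(p) + [1.0]
    return [sum(m[r][c] * p_h[c] for c in range(4)) for r in range(3)]


R = rot_z(math.pi / 2)
T = trans([1.0, 0.0, 0.0])

rotate_then_translate = matmul(T, R)  # [I,T] @ [R,0] = [R, T]
translate_then_rotate = matmul(R, T)  # [R,0] @ [I,T] = [R, RT]

origin = [0.0, 0.0, 0.0]
print(apply(rotate_then_translate, origin))  # the offset T itself
print(apply(translate_then_rotate, origin))  # the rotated offset RT
```

Both products share the same rotation block; they differ only in whether the translation column is T or RT, which is exactly the fork described above.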

I'm grateful that you would help me with this. Perhaps I misunderstand something fundamental😊

liruilong940607 commented 1 year ago

I think a simple question to ask is, if I need to transform a bone (all the points between head and tail) from rest to pose, what matrix would that be?

If you try to answer this question, you will find out you can only get 19 unique transformations. And you will see where all my illustrations stand.

ChenYutongTHU commented 1 year ago

https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tools/process_zju/lbs.py#L148-L162

The 'rel_transforms' here, which later is named 'A' and 'tfs', still contain 24 unique transform matrices.

I think that reducing 24 to 19 discards the transformation of five other bones. The head joints of these five bones are head, ltoes, rtoes, lhand and rhand. But the five bones' tail joints are not named, and might not need to be defined because the rotations of these bones are already defined by their head joints' axis-angle.

We can actually see the five bones (or more accurately five parts) in the figure. They are the green head, the yellow left and white fingers, the red right toes, and the blue left toes.

Thanks for your illustrations. I see it now. I just feel that the original LBS implementation of SMPL-X also makes sense, in which the LBS weights are assigned to all 24 joints.

But I guess that this detail is truly unimportant because the five implicitly defined bones are quite short and small.

liruilong940607 commented 1 year ago

I’m afraid we are not on the same page.

First of all, ‘rel_transforms’ (aka ‘tfs’) looks like the transformation on bones if you follow the code logic, but it’s NOT! If you actually play with it, it can’t transform the bones from rest to pose correctly. (The way to verify this is to take the center point of a bone in the rest pose, apply this transformation, and see whether it lands on the center point of the corresponding bone in the view space.)

Instead, L 151 - L 156 is where the “correct” bone transformations are calculated (aka ‘tf_bones’). These matrices can correctly transform any point along the bone to its corresponding posed point. And if you print out these matrices, you will find duplicate values: only 19 of them are unique. That’s why I’m saying there are bones rigidly attached together. (Are we on the same page that rigidly attached bones exist?)
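One way to see where 19 could come from is a simple count. The sketch below assumes that bone k runs from its parent joint (head) to joint k (tail) and inherits the head joint's transform, as described earlier in this thread; the parent indices are read off the joint listing posted above, and the variable names are hypothetical rather than taken from zju_parser.py.

```python
# Parent index of each of the 24 SMPL joints, following the joint listing
# posted earlier in this thread (root has no parent, marked as -1).
PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
           9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]

# Assumption: bone k (k = 1..23) spans PARENTS[k] -> k and takes the transform
# of its head joint, so bones sharing a head joint move rigidly together.
bones_by_head = {}
for tail in range(1, len(PARENTS)):
    bones_by_head.setdefault(PARENTS[tail], []).append(tail)

num_bones = len(PARENTS) - 1
num_unique = len(bones_by_head)
shared = {head: tails for head, tails in bones_by_head.items() if len(tails) > 1}

print(num_bones, num_unique)
print(shared)
```

Under this assumption the only heads shared by several bones are the root and the chest, which matches the Joint-1/2/3 and Joint-12/13/14 groups discussed above.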

As for the end joints, I’m confused because they are NOT the bones we disregard (23 vs. 19). I’m not sure how these end joints came into this conversation. See here for how we cluster 23 bones into 19 unique transformations.

https://github.com/facebookresearch/tava/blob/a9576801e81aebcf242588be39315e27f915894e/tava/datasets/zju_parser.py#L69

I’m happy to further illustrate the reason behind this, but before that I think we first need to be on the same page on these two facts: 1/ ‘tfs’ can transform joints correctly, but not the points along the bones. 2/ ‘tf_bones’ can correctly transform all bones (e.g. the center of a bone), and contains duplicated matrices.
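The midpoint check suggested above can be tried on a toy skeleton. The sketch below uses a made-up 3-joint planar chain and textbook forward kinematics, not the TAVA preprocessing code; it assumes bone k spans parent joint -> joint k and that the per-bone matrix is the per-joint matrix of the bone's head joint. All helper names are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid(theta, t):
    """4x4 matrix [R|t]: rotation by theta about z, then translation by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, t[0]], [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]], [0.0, 0.0, 0.0, 1.0]]


def apply(m, p):
    p_h = list(p) + [1.0]
    return [sum(m[r][c] * p_h[c] for c in range(4)) for r in range(3)]


def midpoint(a, b):
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))


# Toy chain (an assumption for illustration): three joints on the x axis.
parents = [-1, 0, 1]
rest = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
angles = [0.0, math.pi / 2, math.pi / 2]  # local rotation of each joint

# Forward kinematics: G_k = G_parent(k) @ [R_k | J_k - J_parent(k)]
G = []
for k, p in enumerate(parents):
    offset = rest[k] if p < 0 else [x - y for x, y in zip(rest[k], rest[p])]
    local = rigid(angles[k], offset)
    G.append(local if p < 0 else matmul(G[p], local))
posed = [apply(g, [0.0, 0.0, 0.0]) for g in G]

# Per-joint rest-to-pose matrices: A_k = G_k @ [I | -J_k]
tfs = [matmul(G[k], rigid(0.0, [-x for x in rest[k]])) for k in range(3)]
# Per-bone matrices: bone k takes the matrix of its head joint parents[k]
# (the root entry is a placeholder, since the root has no bone).
tf_bones = [tfs[max(p, 0)] for p in parents]

k = 2  # bone from joint 1 to joint 2
rest_mid = midpoint(rest[parents[k]], rest[k])
posed_mid = midpoint(posed[parents[k]], posed[k])
joint_ok = close(apply(tfs[k], rest[k]), posed[k])
mid_ok_tfs = close(apply(tfs[k], rest_mid), posed_mid)
mid_ok_bones = close(apply(tf_bones[k], rest_mid), posed_mid)
print(joint_ok, mid_ok_tfs, mid_ok_bones)
```

In this toy chain the per-joint matrix maps joint 2 correctly but misplaces the midpoint of the bone ending at joint 2, while the head joint's matrix maps the midpoint correctly. Whether lbs.py uses exactly the same indexing is an assumption here.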

ChenYutongTHU commented 1 year ago

Hi,

I see what you mean. Thanks for helping me sort it out! One last question before closing this issue~

You said there is a redundancy in the SMPL pose params and a problematic [R|T] transformation in the SMPLX codebase. Does that mean there is a bug in SMPL itself? Or does it leave the original SMPL's performance unaffected, so that we only need to fix it in TAVA?

liruilong940607 commented 1 year ago

The problematic [R|T] is the reason behind the rigidly attached bones. As a result, although SMPL seemingly has 23 axis-angles to control the pose, there are actually only 19 degrees of freedom, because some bones are constrained to have the same transformation.

There is no problem with using SMPL as is. But if this issue (maybe it’s a bug in SMPLX, because it doesn’t make sense to me) were fixed, SMPL would have the full 23 degrees of freedom and so should be more expressive.

facebookresearch / tava

Implementation of linear blend skinning #14