Can you explain more about smpl transform?

MoyGcc / vid2avatar

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)


Thank you for your amazing work!

I have a few questions about the preprocessing code.

Could you explain the function below in more detail?
Is there a reference world coordinate system?
Does this code move the camera into a coordinate system based on SMPL?
Why is the hip position added?
Why is the inverse taken?
What does the "we assume" comment refer to?

# transform SMPL such that the target camera extrinsic will be met
def transform_smpl(curr_extrinsic, target_extrinsic, smpl_pose, smpl_trans, T_hip):
    R_root = cv2.Rodrigues(smpl_pose[:3])[0]
    transf_global_ori = (
        np.linalg.inv(target_extrinsic[:3, :3]) @ curr_extrinsic[:3, :3] @ R_root
    )

    target_extrinsic[:3, -1] = (
        curr_extrinsic[:3, :3] @ (smpl_trans + T_hip)
        + curr_extrinsic[:3, -1]
        - smpl_trans
        - target_extrinsic[:3, :3] @ T_hip
    )

    smpl_pose[:3] = cv2.Rodrigues(transf_global_ori)[0].reshape(3)
    smpl_trans = np.linalg.inv(target_extrinsic[:3, :3]) @ smpl_trans  # we assume

    return target_extrinsic, smpl_pose, smpl_trans

Thank you once again.

Hi, thank you for your interest. This function transforms the SMPL model so that the SMPL mesh can still be projected correctly into image space using the target camera extrinsic. Our target camera has the same orientation as the OpenGL camera, so the function effectively converts the SMPL model to an OpenGL coordinate system. This makes the rendered normals (their color coding) compatible with off-the-shelf human normal predictors such as PIFuHD and ICON.

The hip position is used to translate the SMPL model so that it is rooted at the origin, since by default the SMPL model has an offset (hip to origin) from the origin in SMPL space.
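To make the role of the hip concrete, here is a sketch of how a canonical vertex reaches camera space. It assumes the usual SMPL convention that the global rotation pivots about the hip (root) joint. The symbols $v$ (canonical vertex), $J_{hip}$ (the `T_hip` argument), and $x_{cam}$ are notation introduced here for illustration and do not appear in the code.

```latex
x_{cam} = R_{cam}\,\bigl(R_{root}\,(v - J_{hip}) + J_{hip} + T_{smpl}\bigr) + t_{cam}
```

Because the rotation pivots about $J_{hip}$ rather than the origin, the hip offset travels together with `smpl_trans` whenever the camera rotation changes, which would explain why `T_hip` shows up in the translation update.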

The "we assume" comment is incomplete (sorry about that). The full assumption is R'_cam @ T'_smpl = T_smpl, where ' denotes values in the target coordinate system. We have some freedom when solving for T'_smpl and T'_cam, and this assumption simply makes the calculation easier.
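As a sanity check on the assumption above, the following sketch re-derives the same algebra on rotation matrices and verifies numerically that a vertex lands at the same camera-space position before and after the transform. It is not the repository's code: `rodrigues`, `to_camera`, and `solve_target` are hypothetical helpers, the inputs are random, `R_tgt` is a made-up target orientation, and the hip-pivot convention in `to_camera` is an assumption about how SMPL applies its global rotation.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def to_camera(R_cam, t_cam, R_root, trans, hip, v):
    # Assumed SMPL convention: the global rotation pivots about the hip joint.
    world = R_root @ (v - hip) + hip + trans
    return R_cam @ world + t_cam


def solve_target(R_cam, t_cam, R_tgt, R_root, trans, hip):
    # Same algebra as transform_smpl, written on rotation matrices.
    R_root_new = R_tgt.T @ R_cam @ R_root  # inverse of a rotation is its transpose
    trans_new = R_tgt.T @ trans            # from R'_cam @ T'_smpl = T_smpl
    t_tgt = R_cam @ (trans + hip) + t_cam - trans - R_tgt @ hip
    return t_tgt, R_root_new, trans_new


rng = np.random.default_rng(0)
R_cam, t_cam = rodrigues(rng.normal(size=3)), rng.normal(size=3)
R_root, trans = rodrigues(rng.normal(size=3)), rng.normal(size=3)
hip, v = rng.normal(size=3), rng.normal(size=3)
R_tgt = np.diag([1.0, -1.0, -1.0])  # hypothetical target orientation (180 deg about x)

t_tgt, R_root_new, trans_new = solve_target(R_cam, t_cam, R_tgt, R_root, trans, hip)
before = to_camera(R_cam, t_cam, R_root, trans, hip, v)
after = to_camera(R_tgt, t_tgt, R_root_new, trans_new, hip, v)
print(np.allclose(before, after))
```

Under these assumptions the two camera-space points agree, so the projection into the image is unchanged. Any other choice of T'_smpl would work as well, as long as T'_cam is adjusted to compensate; fixing R'_cam @ T'_smpl = T_smpl just removes that freedom.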

Of course, this step can simply be removed without hurting the final results. The only difference would be the color coding of the rendered normal maps.
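A small sketch of why only the color coding would change. It assumes normals are color-coded in the SMPL/world frame with the common mapping `rgb = (n + 1) / 2`; that mapping, the helper `normal_to_rgb`, and the example rotations are illustrative assumptions rather than anything taken from the repository.

```python
import numpy as np


def normal_to_rgb(n):
    """Map a unit normal from [-1, 1]^3 to RGB in [0, 1]^3 (a common convention)."""
    n = n / np.linalg.norm(n)
    return 0.5 * (n + 1.0)


angle = np.deg2rad(30.0)
c, s = np.cos(angle), np.sin(angle)
R_cam = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])        # made-up current camera rotation
R_tgt = np.diag([1.0, -1.0, -1.0])      # hypothetical target orientation
R_root = np.eye(3)                      # made-up SMPL global orientation
n_canonical = np.array([0.0, 0.0, 1.0])

# Global orientation after the transform, as in transform_smpl.
R_root_new = R_tgt.T @ R_cam @ R_root

# World-frame normals differ between the two setups...
n_before = R_root @ n_canonical
n_after = R_root_new @ n_canonical

# ...while the normals seen from each camera are identical.
cam_before = R_cam @ n_before
cam_after = R_tgt @ n_after

rgb_before = normal_to_rgb(n_before)
rgb_after = normal_to_rgb(n_after)
print(rgb_before, rgb_after)
```

The geometry as seen from the camera is the same in both setups; only the frame in which the normals are expressed, and therefore their colors, differs.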

Can you explain more about smpl transform? #41