MoyGcc / vid2avatar

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition (CVPR2023)
https://moygcc.github.io/vid2avatar/
Other
1.23k stars, 100 forks

Can you explain more about the SMPL transform? #41

Closed: soobinseo closed this issue 1 year ago

soobinseo commented 1 year ago

Thank you for your amazing work!

I have a few questions about the preprocessing code.

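import cv2
import numpy as np

# Transform the SMPL parameters so that the mesh still projects to the same
# image location when rendered with the target camera extrinsic.
def transform_smpl(curr_extrinsic, target_extrinsic, smpl_pose, smpl_trans, T_hip):
    # New global (root) orientation: R'_root = inv(R'_cam) @ R_cam @ R_root
    R_root = cv2.Rodrigues(smpl_pose[:3])[0]
    transf_global_ori = (
        np.linalg.inv(target_extrinsic[:3, :3]) @ curr_extrinsic[:3, :3] @ R_root
    )

    # New camera translation; T_hip is needed because SMPL rotates about the hip joint.
    # Note: target_extrinsic and smpl_pose are modified in place.
    target_extrinsic[:3, -1] = (
        curr_extrinsic[:3, :3] @ (smpl_trans + T_hip)
        + curr_extrinsic[:3, -1]
        - smpl_trans
        - target_extrinsic[:3, :3] @ T_hip
    )

    smpl_pose[:3] = cv2.Rodrigues(transf_global_ori)[0].reshape(3)
    smpl_trans = np.linalg.inv(target_extrinsic[:3, :3]) @ smpl_trans  # we assume

    return target_extrinsic, smpl_pose, smpl_trans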

Thank you once again.

MoyGcc commented 1 year ago

Hi, thank you for your interest. This function transforms the SMPL model so that the SMPL mesh still projects to the same place in image space when we use the target camera extrinsic. Our target camera has the same orientation as an OpenGL camera, so the function effectively converts the SMPL model into the OpenGL coordinate system. This makes the rendered normal maps (their color coding) compatible with off-the-shelf human normal predictors such as PIFuHD and ICON.
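The claim that the projection is preserved can be checked numerically. The sketch below is a minimal numpy-only check under stated assumptions: it replaces the full SMPL model with a root-only rigid placement (rotate rest-pose vertices about the hip joint, then translate), reimplements `cv2.Rodrigues` with the Rodrigues formula, and restates `transform_smpl` with rotation matrices instead of axis-angle. The helpers `rodrigues`, `pose_vertices`, `to_camera` and `transform_smpl_mat`, the made-up hip position, and the target rotation `diag(1, -1, -1)` (an assumed OpenGL-style camera, not necessarily the value used in the repository) are all hypothetical.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle (3,) -> 3x3 rotation matrix (same convention as cv2.Rodrigues)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose_vertices(R_root, trans, hip, verts_rest):
    """Hypothetical root-only SMPL placement: rotate about the hip joint, then translate.
    Blend shapes and all non-root joints are ignored."""
    return (verts_rest - hip) @ R_root.T + hip + trans

def to_camera(extrinsic, verts):
    """World -> camera coordinates with a 4x4 [R | t] extrinsic."""
    return verts @ extrinsic[:3, :3].T + extrinsic[:3, 3]

def transform_smpl_mat(curr_extrinsic, target_extrinsic, R_root, smpl_trans, T_hip):
    """Matrix-form restatement of transform_smpl (copies instead of mutating inputs)."""
    target_extrinsic = target_extrinsic.copy()
    R_t, R_c = target_extrinsic[:3, :3], curr_extrinsic[:3, :3]
    new_R_root = np.linalg.inv(R_t) @ R_c @ R_root
    target_extrinsic[:3, 3] = (R_c @ (smpl_trans + T_hip) + curr_extrinsic[:3, 3]
                               - smpl_trans - R_t @ T_hip)
    new_trans = np.linalg.inv(R_t) @ smpl_trans  # assumption: R'_cam @ T'_smpl = T_smpl
    return target_extrinsic, new_R_root, new_trans

rng = np.random.default_rng(0)
verts = rng.normal(size=(100, 3))        # made-up rest-pose vertices
hip = np.array([0.0, -0.2, 0.02])        # made-up hip joint position
R_root = rodrigues(rng.normal(size=3))
trans = rng.normal(size=3)

curr = np.eye(4)
curr[:3, :3] = rodrigues(rng.normal(size=3))
curr[:3, 3] = rng.normal(size=3)
target = np.eye(4)
target[:3, :3] = np.diag([1.0, -1.0, -1.0])  # assumed OpenGL-style target rotation

new_ext, new_R, new_T = transform_smpl_mat(curr, target, R_root, trans, hip)
before = to_camera(curr, pose_vertices(R_root, trans, hip, verts))
after = to_camera(new_ext, pose_vertices(new_R, new_T, hip, verts))
print(np.allclose(before, after))  # True
```

Copying `target_extrinsic` instead of writing into it keeps the check free of side effects; the original function mutates its `target_extrinsic` and `smpl_pose` arguments in place.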

The hip position is used to translate the SMPL model so that it is rooted at the origin. By default, the SMPL root (hip) joint is offset from the origin in SMPL space, and the global rotation is applied about that joint rather than about the origin.
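Why the hip offset matters can be seen by expanding a simplified placement model. Assuming only the global rotation and translation are kept (blend shapes and the other joints are ignored), a rest-pose vertex $\bar v$ with hip joint $J$ (`T_hip` in the code) is placed as follows. The effective translation $(I - R_{root})J + T_{smpl}$ depends on the root rotation whenever $J \neq 0$, so changing the root rotation also requires a hip-dependent correction of the translation.

```latex
v = R_{root}\,(\bar v - J) + J + T_{smpl}
  = R_{root}\,\bar v + (I - R_{root})\,J + T_{smpl}
```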

The "we assume" comment is incomplete (sorry about that). The full assumption is R'_cam @ T'_smpl = T_smpl, where ' denotes values in the target coordinate system. We have a free choice when solving for T'_smpl and T'_cam, because only their combination R'_cam @ T'_smpl + T'_cam is constrained, and this assumption just makes the calculation easier.
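The translation formula in `transform_smpl` can be derived as follows, using the same simplified root-only placement as an assumption ($J$ is `T_hip`, $T_{cam}$ is the translation column of the extrinsic). Requiring every vertex to land at the same camera-space point fixes the new root rotation and leaves three equations in six unknowns; the stated assumption then picks one solution, which matches the code term by term.

```latex
% Same camera-space point for every rest-pose vertex \bar v:
R'_{cam}\big(R'_{root}(\bar v - J) + J + T'_{smpl}\big) + T'_{cam}
  = R_{cam}\big(R_{root}(\bar v - J) + J + T_{smpl}\big) + T_{cam}

% Terms that depend on \bar v:
R'_{root} = (R'_{cam})^{-1} R_{cam} R_{root}

% Constant terms (3 equations, 6 unknowns in T'_{smpl} and T'_{cam}):
R'_{cam}\,(J + T'_{smpl}) + T'_{cam} = R_{cam}\,(J + T_{smpl}) + T_{cam}

% Assumption R'_{cam} T'_{smpl} = T_{smpl}, i.e. T'_{smpl} = (R'_{cam})^{-1} T_{smpl}:
T'_{cam} = R_{cam}\,(T_{smpl} + J) + T_{cam} - T_{smpl} - R'_{cam}\,J
```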

Of course, this step can simply be removed without hurting the final results; the only difference is the color coding of the rendered normal maps.
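What "a different color coding" means can be illustrated with a small sketch. It assumes the common normal-map encoding `(n + 1) / 2` and an OpenCV-to-OpenGL camera change that flips the y and z axes; both the encoding and the helper names (`CV_TO_GL`, `normal_to_rgb`) are illustrative assumptions, not taken from the repository. The same surface gets different colors depending only on the camera convention.

```python
import numpy as np

# Assumed convention change: OpenCV camera axes (x right, y down, z forward)
# to OpenGL camera axes (x right, y up, z backward) flips the y and z axes.
CV_TO_GL = np.diag([1.0, -1.0, -1.0])

def normal_to_rgb(n):
    """Common normal-map color coding: map unit-normal components from [-1, 1] to [0, 1]."""
    n = n / np.linalg.norm(n, axis=-1, keepdims=True)
    return (n + 1.0) / 2.0

# A surface facing the camera head-on, expressed in OpenCV camera coordinates.
n_cv = np.array([0.0, 0.0, -1.0])
n_gl = CV_TO_GL @ n_cv

print(normal_to_rgb(n_cv))  # [0.5 0.5 0. ]
print(normal_to_rgb(n_gl))  # [0.5 0.5 1. ]
```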