Shimingyi / MotioNet

A deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video [ToG 2020]
https://rubbly.cn/publications/motioNet/
BSD 2-Clause "Simplified" License
554 stars 82 forks

About foot-contact loss #28

Open JinchengWang opened 3 years ago

JinchengWang commented 3 years ago

Looking at trainer.py, it seems that loss_fc is measured based on joint speed in the local frame (with the root at the origin). Shouldn't it be measured based on joint speed in a fixed frame instead?

For example, one can shift their center of mass from right foot to left foot, creating foot joint movement relative to root joint, without actually lifting their feet.
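A toy illustration of this weight-shift case, as a minimal sketch rather than MotioNet code. The positions, the 2 cm per frame root motion, and the helper name `speed` are made up for the example; only the 3-frame difference mirrors get_velocity in the repository.

```python
import torch

# Hypothetical toy sequence: 5 frames, positions as (x, y, z) in metres.
T = 5

# The root (pelvis) slides 2 cm per frame along x while the weight shifts.
root_world = torch.zeros(T, 3)
root_world[:, 0] = 0.02 * torch.arange(T)

# The foot stays planted at one fixed world position for all frames.
foot_world = torch.tensor([0.10, -0.90, 0.0]).repeat(T, 1)

# Root-relative (local-frame) foot position: root is treated as the origin.
foot_local = foot_world - root_world


def speed(pos):
    # Displacement magnitude across a 3-frame window (frame t+1 minus frame t-1).
    return torch.linalg.norm(pos[2:] - pos[:-2], dim=-1)


world_speed = speed(foot_world)  # planted foot, fixed frame
local_speed = speed(foot_local)  # same foot, root-relative frame

print(world_speed)
print(local_speed)
```

In the fixed frame the planted foot has zero speed, while in the root-relative frame it appears to move by the root's displacement, so a loss on local-frame speed would penalize this motion even though the foot never leaves the ground.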

Shimingyi commented 3 years ago

Let's check the code:

loss_fc = (torch.mean(get_velocity(fake_pose_3d, 3)[contacts[:, 1:-1, 0] == 1] ** 2) + torch.mean(get_velocity(fake_pose_3d, 6)[contacts[:, 1:-1, 0] == 1] ** 2))

def get_velocity(motions, joint_index):
    joint_motion = motions[..., [joint_index*3, joint_index*3 + 1, joint_index*3 + 2]]
    velocity = torch.sqrt(torch.sum((joint_motion[:, 2:] - joint_motion[:, :-2])**2, dim=-1))
    return velocity

Here, pose_3d is a complete skeleton rather than a set of relative joint offsets, so we can use joint_motion[:, 2:] - joint_motion[:, :-2] to get the distance each joint moves over a 3-frame window.
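To make the indexing concrete, here is a small shape walkthrough of the snippet quoted above. It is a sketch under assumptions: the batch size, sequence length, joint count, and the (batch, frames, 2) layout of contacts are chosen for illustration and are not confirmed by the thread.

```python
import torch


def get_velocity(motions, joint_index):
    # Same computation as the quoted snippet: per-joint displacement
    # magnitude between frame t+1 and frame t-1.
    joint_motion = motions[..., joint_index * 3: joint_index * 3 + 3]
    return torch.sqrt(torch.sum((joint_motion[:, 2:] - joint_motion[:, :-2]) ** 2, dim=-1))


# Assumed sizes, for illustration only.
B, T, J = 2, 10, 17
torch.manual_seed(0)
fake_pose_3d = torch.randn(B, T, J * 3)            # (batch, frames, joints * 3)

# Assumed contact labels: channel 0 marks contact in frames 3..6.
contacts = torch.zeros(B, T, 2, dtype=torch.long)
contacts[:, 3:7, 0] = 1

velocity = get_velocity(fake_pose_3d, 3)           # (B, T - 2)
mask = contacts[:, 1:-1, 0] == 1                   # (B, T - 2), aligned with velocity
loss_term = torch.mean(velocity[mask] ** 2)        # scalar over contact frames only

print(velocity.shape, mask.shape, loss_term.shape)
```

Entry t of velocity is centred on frame t + 1, which is why the contact labels are trimmed with 1:-1 before masking. If a batch contains no contact frames, the mask selects nothing and torch.mean returns NaN, so a guard may be needed in that case.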

JinchengWang commented 3 years ago

I was looking at fk_model.forward_fk(). It seems that fake_pose_3d is computed from skeleton and fake_rotations_full, and does not make use of the absolute position of the root joint in the world. Thus if I understand correctly, fake_pose_3d must be the 3d joint locations in the local frame (in other words positions relative to the root joint) instead of the fixed camera frame?

Shimingyi commented 3 years ago

OK, I understand your question now. You are right. The predicted 3D pose is in camera space; we didn't convert it to world coordinates. This causes some ambiguity for movements like squatting: the foot is in contact, but the relative foot position is not fixed, so constraining the velocity to zero in this case reduces the amplitude of the motion. In general the loss is still effective, but its motivation differs from what was intended. I will consider how to fix this bug in the next major commit.
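One possible direction, sketched under stated assumptions and not the fix the author committed to: add a per-frame root trajectory to the root-relative joints before measuring foot speed. The names `to_global`, `foot_contact_loss`, and `root_translation` are hypothetical, and so is the assumption that a root translation of shape (batch, frames, 3) is available and that contacts has one channel per foot.

```python
import torch


def to_global(pose_local, root_translation):
    # pose_local: (B, T, J*3) root-relative joints; root_translation: (B, T, 3).
    # Hypothetical helper: shifts every joint by the root position of its frame.
    B, T, D = pose_local.shape
    joints = pose_local.reshape(B, T, D // 3, 3)
    return (joints + root_translation[:, :, None, :]).reshape(B, T, D)


def foot_contact_loss(pose, contacts, foot_joints=(3, 6)):
    # Mean squared 3-frame displacement of each foot joint over contact frames.
    # Assumes contacts[..., k] labels the k-th entry of foot_joints.
    loss = pose.new_zeros(())
    for k, j in enumerate(foot_joints):
        joint = pose[..., j * 3: j * 3 + 3]
        speed = torch.linalg.norm(joint[:, 2:] - joint[:, :-2], dim=-1)
        mask = contacts[:, 1:-1, k] == 1
        if mask.any():  # skip feet with no contact frames to avoid NaN
            loss = loss + torch.mean(speed[mask] ** 2)
    return loss


# Toy check: every joint is static in the fixed frame while the root slides along x.
torch.manual_seed(0)
B, T, J = 1, 6, 7
static = torch.randn(1, 1, J, 3).expand(B, T, J, 3)
root_translation = torch.zeros(B, T, 3)
root_translation[:, :, 0] = 0.02 * torch.arange(T)
pose_local = (static - root_translation[:, :, None, :]).reshape(B, T, J * 3)
contacts = torch.ones(B, T, 2, dtype=torch.long)

loss_local = foot_contact_loss(pose_local, contacts)
loss_global = foot_contact_loss(to_global(pose_local, root_translation), contacts)
print(float(loss_local), float(loss_global))
```

In this toy case the root-relative loss is nonzero even though both feet are planted, while the loss on the shifted pose is approximately zero. Whether a reliable root trajectory can be obtained from monocular input is a separate question that this sketch does not address.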