The parameter global_orient corresponds to the global orientation of the root of the MANO model. As for the reference frame, you will need to check how this is represented internally in MANO.
@geopavlakos Thanks for your answer.
I have a few more questions.
I use hand_pose and pred_keypoints_3d to project a pose onto a 2D image. Here, the right hand is projected well, but the left hand is projected to the wrong location, and I don't know why. Also, what is the correspondence between hand_pose, a rotation matrix for 15 joints, and keypoints_3d, a matrix with 21 keypoints? And are the rotations in hand_pose expressed relative to the parent joint? Here is my demo code.
for n in range(batch_size):
    global_orient = (
        out["pred_mano_params"]["global_orient"].detach().cpu().numpy()[n]
    )
    hand_pose = (
        out["pred_mano_params"]["hand_pose"].detach().cpu().numpy()[n]
    )
    keypoints_3d = out["pred_keypoints_3d"][n].detach().cpu().numpy() + (
        pred_cam_t_full[n]
    )
    # Hand root
    cv2.drawFrameAxes(
        canvas,
        intrinsic.camera_matrix,
        np.zeros(5),
        cv2.Rodrigues(np.squeeze(global_orient))[0],
        keypoints_3d[0],
        0.01,
    )
    # Thumb pip
    cv2.drawFrameAxes(
        canvas,
        intrinsic.camera_matrix,
        np.zeros(5),
        cv2.Rodrigues(hand_pose[0])[0],
        keypoints_3d[2],
        0.01,
    )
    # Thumb tip
    cv2.drawFrameAxes(
        canvas,
        intrinsic.camera_matrix,
        np.zeros(5),
        cv2.Rodrigues(hand_pose[2])[0],
        keypoints_3d[4],
        0.01,
    )
1) The hand pose parameters follow the regular MANO order, so I would point you to that. Please check Lines 206-220 here. The 3D keypoints follow the OpenPose order.
2) I believe you will need to flip the 3D keypoints (and pred_cam_t_full) of the left hand across the x axis (multiply the first dimension with -1). Please check this issue for an explanation.
3) Yes, you are correct, the rotations are expressed relative to the parent node. For this, we follow the MANO convention, so for more details you can check how this is implemented here.
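To make the two orderings concrete, here is a rough, untested sketch of how one might compose the parent-relative rotation matrices into camera-frame rotations and pair each joint with its keypoint. The kinematic tree, the MANO joint order (index, middle, pinky, ring, thumb) and the MANO-to-OpenPose index mapping below are assumptions on my part and should be verified against the linked MANO code; the canvas, intrinsic, global_orient, hand_pose and keypoints_3d variables are the ones from your snippet above.

import numpy as np
import cv2

# Assumed MANO joint order: 0 wrist, 1-3 index, 4-6 middle, 7-9 pinky, 10-12 ring, 13-15 thumb.
MANO_PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 0, 10, 11, 0, 13, 14]
# Assumed mapping from MANO joint index (1..15) to the OpenPose 21-keypoint index.
MANO_TO_OPENPOSE = {1: 5, 2: 6, 3: 7, 4: 9, 5: 10, 6: 11,
                    7: 17, 8: 18, 9: 19, 10: 13, 11: 14, 12: 15,
                    13: 1, 14: 2, 15: 3}

def compose_global_rotations(global_orient, hand_pose):
    """global_orient: root rotation as (3, 3) (or (1, 3, 3)); hand_pose: (15, 3, 3)
    parent-relative rotations. Returns (16, 3, 3) rotations in the camera frame."""
    relative = np.concatenate(
        [np.asarray(global_orient).reshape(1, 3, 3), np.asarray(hand_pose)], axis=0
    )
    world = np.zeros_like(relative)
    world[0] = relative[0]
    for j in range(1, 16):
        # Chain the relative rotation onto the parent's world rotation.
        world[j] = world[MANO_PARENTS[j]] @ relative[j]
    return world

# Example: draw axes for every joint that has a rotation, at its matching keypoint.
world_rots = compose_global_rotations(global_orient, hand_pose)
for mano_j, op_k in MANO_TO_OPENPOSE.items():
    cv2.drawFrameAxes(
        canvas,
        intrinsic.camera_matrix,
        np.zeros(5),
        cv2.Rodrigues(world_rots[mano_j])[0],
        keypoints_3d[op_k],
        0.01,
    )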
@geopavlakos This was really helpful, thank you.
@geopavlakos
Could you explain more about point 2?
I have pred_keypoints_3d and hand_pose of shape (15, 3) for my left hand, and pred_cam_t_full of shape (3,).
Actually, I don't know whether pred_keypoints_3d is expressed in the camera coordinate system or in the MANO model's coordinates. However, since you said it should be flipped across the x-axis, I applied the x-axis flip, but the result is wrong.
I would appreciate it if you could explain specifically whether I should add pred_cam_t_full to pred_keypoints_3d and then apply the flip, or apply it before that.
Or, if you could provide pseudocode that computes the left hand from the 6-DoF pose of the right hand in the camera coordinate system, I would appreciate it.
for n in range(batch_size):
    global_orient = (
        out["pred_mano_params"]["global_orient"].detach().cpu().numpy()[n]
    )
    hand_pose = (
        out["pred_mano_params"]["hand_pose"].detach().cpu().numpy()[n]
    )
    keypoints_3d = out["pred_keypoints_3d"].detach().cpu().numpy()[n]
    if right[n] == 0:
        flip_matrix = np.array([[1, 0, 0], [0, -1, 0], [0, 0, -1]])
        keypoints_3d = np.matmul(
            flip_matrix, keypoints_3d.reshape(-1, 3, 1)
        ).reshape(-1, 3)
    keypoints_3d += pred_cam_t_full[n]
You should multiply the first dimension with -1, not the second and the third. Similarly, you need to multiply the first dimension of pred_cam_t_full with -1 before you add them together.
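In the matrix form from your snippet, the flip for the keypoints would look something like this (a rough sketch reusing your keypoints_3d variable):

# Mirror across the x-axis: negate only the first coordinate (not the second and third).
flip_matrix = np.array([[-1, 0, 0], [0, 1, 0], [0, 0, 1]])
keypoints_3d = np.matmul(flip_matrix, keypoints_3d.reshape(-1, 3, 1)).reshape(-1, 3)
# Equivalent to: keypoints_3d[:, 0] *= -1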
@geopavlakos As you said, I multiplied the first dimension (pred_keypoints_3d[:,0]) of pred_keypoints_3d(21,3) by -1, and also multiplied the first dimension (pred_cam_t_full[0]) of pred_cam_t_full(3,) by -1. However, when I verified it with re-projection, the result was wrong. (Of course, the right hand result is perfect.)
global_orient = (
    out["pred_mano_params"]["global_orient"].detach().cpu().numpy()[n]
)
hand_pose = (
    out["pred_mano_params"]["hand_pose"].detach().cpu().numpy()[n]
)
keypoints_3d = out["pred_keypoints_3d"].detach().cpu().numpy()[n]
# keypoints_3d -> (21, 3)
# pred_cam_t_full[n] -> (3,)
if right[n] == 0:
    keypoints_3d[:, 0] *= -1
    pred_cam_t_full[n][0] *= -1
keypoints_3d += pred_cam_t_full[n]
My bad, the multiplication for the camera translation has already been taken care of in line 138 of the demo. You only need to multiply the first dimension of the keypoints with -1 and that should be it.
@geopavlakos Does that mean I don't have to change anything in my code compared to how I handle the right hand? I just need to add pred_cam_t_full to get the 3D keypoints, the same as for the right hand? But the result was still wrong.
for batch in dataloader:
    batch = recursive_to(batch, device)
    with torch.no_grad():
        out = model(batch)

    multiplier = 2 * batch["right"] - 1
    pred_cam = out["pred_cam"]
    pred_cam[:, 1] = multiplier * pred_cam[:, 1]
    box_center = batch["box_center"].float()
    box_size = batch["box_size"].float()
    img_size = batch["img_size"].float()
    pred_cam_t_full = (
        cam_crop_to_full(
            pred_cam,
            box_center,
            box_size,
            img_size,
            my_intrinsic,
        )
        .detach()
        .cpu()
        .numpy()
    )

    batch_size = batch["img"].shape[0]
    for n in range(batch_size):
        global_orient = (
            out["pred_mano_params"]["global_orient"].detach().cpu().numpy()[n]
        )
        hand_pose = (
            out["pred_mano_params"]["hand_pose"].detach().cpu().numpy()[n]
        )
        keypoints_3d = out["pred_keypoints_3d"].detach().cpu().numpy()[n]
        # keypoints_3d -> (21, 3)
        # pred_cam_t_full[n] -> (3,)
        # if right[n] == 0:
        #     keypoints_3d[:, 0] *= -1
        #     pred_cam_t_full[n][0] *= -1
        keypoints_3d += pred_cam_t_full[n]
Also, about demo.py line 138: you said to multiply the x-axis by -1 because the left and right hands are mirror images across the x-axis. However, when I checked line 138, I found that it multiplies the 1st index (the y-axis) of pred_cam. I wonder if this is correct.
I mentioned above that you "only need to multiply the first dimension of the keypoints with -1". By commenting out the whole if-statement in your code, you don't multiply the keypoint of the left hand with -1. The correct code would be:
if right[n] == 0:
    keypoints_3d[:, 0] *= -1
keypoints_3d += pred_cam_t_full[n]
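If it helps, a quick way to sanity-check this is to re-project the flipped and translated keypoints with the same full-image intrinsics; a rough sketch reusing the canvas and intrinsic.camera_matrix names from your earlier code:

# Re-project the camera-frame keypoints to the image. The points are already in the
# camera frame, so no extra rotation or translation is applied here.
points_2d, _ = cv2.projectPoints(
    keypoints_3d.astype(np.float64),
    np.zeros((3, 1)),
    np.zeros((3, 1)),
    intrinsic.camera_matrix,
    np.zeros(5),
)
for u, v in points_2d.reshape(-1, 2):
    cv2.circle(canvas, (int(u), int(v)), 3, (0, 255, 0), -1)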
I'm asking this question because I want to calculate the palm direction vector from the wrist (root). I guess global_orient is the orientation of the root keypoint. Is that right? If so, what is the reference frame of that coordinate system? (For example, is the palm direction the z-axis?)
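For reference, one rough way I am thinking of approximating the palm direction directly from the 3D keypoints (assuming the OpenPose ordering, i.e. wrist = 0, index MCP = 5, pinky MCP = 17; the sign of the normal will depend on handedness):

# Palm normal from three keypoints (sketch; flip the sign for the other hand if needed).
wrist = keypoints_3d[0]
v_index = keypoints_3d[5] - wrist   # wrist -> index MCP
v_pinky = keypoints_3d[17] - wrist  # wrist -> pinky MCP
palm_normal = np.cross(v_index, v_pinky)
palm_normal /= np.linalg.norm(palm_normal)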
Thank you.