facebookresearch / InterHand2.6M

Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

Issue using transform.py function to convert world coordinates to pixel coordinates #24

Closed. messmor closed this issue 3 years ago.

messmor commented 3 years ago

Hello,

I am using the utility functions world2cam and cam2pixel to compute the image-plane coordinates of the joint locations in the 2D hand images. I load the camera parameters R and t just as you do in render.py. However, when I put everything together, the projected joint locations do not match the hands in the images (see the example below). Any help is appreciated. How are R and t used to convert from world to camera coordinates?

```python
import numpy as np

def cam2pixel(cam_coord, f, c):
    x = cam_coord[:, 0] / (cam_coord[:, 2] + 1e-8) * f[0] + c[0]
    y = cam_coord[:, 1] / (cam_coord[:, 2] + 1e-8) * f[1] + c[1]
    z = cam_coord[:, 2]
    img_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1)
    return img_coord

def world2cam(world_coord, R, T):
    cam_coord = np.dot(world_coord-T,R)
    cam_coord.transpose()  # the transposed result is discarded, so this line has no effect
    return cam_coord

def world2image(joints,cam_params, capture_id, frame_idx, cam, hand_type):

    # camera extrinsic parameters (t is the translation vector, R is the rotation matrix)
    t, R = np.array(cam_params[str(capture_id)]['campos'][str(cam)], dtype=np.float32).reshape(3), np.array(cam_params[str(capture_id)]['camrot'][str(cam)], dtype=np.float32).reshape(3,3)
    t = -np.dot(R,t.reshape(3,1)).reshape(3) # -Rt -> t
    focal=cam_params[str(capture_id)]['focal'][str(cam)]
    princpt=cam_params[str(capture_id)]['princpt'][str(cam)]

    # Transform to camera coordinates
    cam_coord=world2cam( joints[str(capture_id)][str(frame_idx)]['world_coord'],R,t)

    # Transform to pixel/image coordinates
    image_coord=cam2pixel(cam_coord,focal,princpt)

    # Split into 2 subarrays: one for the right hand, one for the left hand
    image_coord_right=image_coord[np.arange(0,21),:]
    image_coord_left=image_coord[np.arange(21,21*2),:]

    # Fill in zeros if one hand does not appear in the frame
    if hand_type == 'right':
        image_coord_left=np.zeros(np.shape(image_coord_left))
    elif hand_type == 'left':
        image_coord_right=np.zeros(np.shape(image_coord_right))

    return [image_coord_right,image_coord_left]
```

[Image: PerspectiveProjectionTest]
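For reference, the cam2pixel function above implements the standard pinhole projection: a camera-space point (x, y, z) maps to pixel (f_x·x/z + c_x, f_y·y/z + c_y), with the 1e-8 term guarding against division by zero, and z kept as depth. The self-contained sketch below restates the same arithmetic in vectorized form and checks two easy cases. The function name `project_pinhole` and the focal length and principal point values are hypothetical, chosen only for illustration, not taken from the dataset.

```python
import numpy as np


def project_pinhole(cam_coord, focal, princpt):
    """Project (N, 3) camera-space points to (N, 3) rows of [u, v, depth].

    Same arithmetic as cam2pixel: u = fx * x / z + cx, v = fy * y / z + cy.
    """
    cam_coord = np.asarray(cam_coord, dtype=np.float64)
    focal = np.asarray(focal, dtype=np.float64)
    princpt = np.asarray(princpt, dtype=np.float64)
    z = cam_coord[:, 2:3]  # keep shape (N, 1) so it broadcasts over x and y
    uv = cam_coord[:, :2] / (z + 1e-8) * focal + princpt
    return np.concatenate((uv, z), axis=1)


# Hypothetical intrinsics, for illustration only.
focal = [1000.0, 1000.0]
princpt = [256.0, 256.0]
points = np.array([
    [0.0, 0.0, 500.0],       # on the optical axis, so it lands on the principal point
    [100.0, -50.0, 1000.0],  # off-axis point
])
out = project_pinhole(points, focal, princpt)
print(out)
```

A point on the optical axis always projects to the principal point regardless of depth, which makes it a quick sanity check when projected joints look shifted: if even an on-axis point misses, the intrinsics are suspect; if not, the extrinsics (R and t) are the likelier culprit.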

mks0601 commented 3 years ago

In render.py, I calculate camera-centered coordinates as `cam = R @ world + t`, where `t = -R @ T`. The original world2cam, on the other hand, calculates them as `cam = (world - T) @ R`, subtracting the camera position T directly. Your code mixes the two: it converts T into t and then passes t to world2cam, which expects T. If you want to use the world2cam function, remove the `t = -np.dot(R,t.reshape(3,1)).reshape(3) # -Rt -> t` line.
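The point about mixing conventions can be checked numerically. With T as the camera position (campos) and R as the rotation (camrot), the render.py form `R @ world + t` with `t = -R @ T` is algebraically `R @ (world - T)`, so both styles give the same camera coordinates as long as each receives the translation it expects. The sketch below assumes R maps world axes to camera axes as in render.py, stores joints as an (N, 3) array of row vectors (so `R @ p` becomes `p @ R.T`), and uses a random rotation and random points as stand-ins for real camera parameters. The helper names are hypothetical.

```python
import numpy as np


def cam_from_render_style(world, R, t):
    """render.py-style convention: cam = R @ world + t, where t = -R @ T."""
    return world @ R.T + t  # row vectors: R @ p is written p @ R.T


def cam_from_position_style(world, R, T):
    """Camera-position convention: cam = R @ (world - T), T = camera centre."""
    return (world - T) @ R.T


rng = np.random.default_rng(0)
# Random proper rotation via QR decomposition (stand-in for a real camrot).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))  # negate if needed so that det(R) = +1
T = rng.normal(size=3) * 1000.0    # stand-in for campos (camera centre)
t = -R @ T                         # the "-Rt -> t" conversion from render.py
world = rng.normal(size=(42, 3)) * 200.0  # 42 joints: right hand, then left hand

a = cam_from_render_style(world, R, t)
b = cam_from_position_style(world, R, T)
mixed = cam_from_position_style(world, R, t)  # converted t passed where T is expected

print(np.allclose(a, b))      # the two conventions agree
print(np.allclose(a, mixed))  # mixing them does not
```

The mismatch in the question comes from the last case: subtracting the converted t instead of the camera position shifts every joint by the same wrong offset before projection.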

messmor commented 3 years ago

Thank you! This worked.