geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers
https://geopavlakos.github.io/hamer/
MIT License

How to project pred_keypoints_2d onto the original image? #20

Closed: linjiangya closed this issue 6 months ago

linjiangya commented 7 months ago

I think the projected 2D keypoints are in the cropped image coordinate system. How can we transform them to visualize in the original image?

Xeanloo commented 6 months ago

I'm having trouble with this as well. Did you manage to visualize them now?

[screenshot attached]

geopavlakos commented 6 months ago

Please take a look at this reply (second paragraph): https://github.com/shubham-goel/4D-Humans/issues/55#issuecomment-1773383934

linjiangya commented 6 months ago

> Please take a look at this reply (second paragraph): shubham-goel/4D-Humans#55 (comment)

Thank you for your reply. I had actually already managed to project it successfully using the code below, and I think it is the same as your solution:

# flip the x coordinate for left hands, then de-normalize from [-0.5, 0.5] to crop pixel coordinates
pred_2d_joints[:,0] = (2*is_right-1)*pred_2d_joints[:,0]
pred_2d_joints = model_cfg.MODEL.IMAGE_SIZE * (pred_2d_joints + 0.5)
.
.
.

def convert_crop_coords_to_orig_img(bbox, keypoints, crop_size):
    # bbox: (B, 3) array of [cx, cy, size] boxes in original-image pixels
    # keypoints: (B, N, 2+) keypoints in crop pixel coordinates
    cx, cy, h = bbox[:, 0], bbox[:, 1], bbox[:, 2]

    # unnormalize to crop coords (already done outside in this case)
    # keypoints = 0.5 * crop_size * (keypoints + 1.0)

    # rescale from the crop resolution to the size of the box in the original image
    keypoints *= h[..., None, None] / crop_size

    # translate into original image coordinates (top-left corner of the box)
    keypoints[:,:,0] = (cx - h/2)[..., None] + keypoints[:,:,0]
    keypoints[:,:,1] = (cy - h/2)[..., None] + keypoints[:,:,1]
    return keypoints
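For reference, a minimal usage sketch of the two steps together with dummy inputs (the [cx, cy, size] box format and the 256-pixel crop size are assumptions based on how the function indexes its arguments and on the default HaMeR config):

import numpy as np

crop_size = 256                              # model_cfg.MODEL.IMAGE_SIZE (assumed 256, the HaMeR default)
pred_2d = np.zeros((1, 21, 2))               # stand-in for one hand's out['pred_keypoints_2d'], in [-0.5, 0.5]
bboxes = np.array([[640.0, 360.0, 300.0]])   # one hypothetical [cx, cy, size] box in original-image pixels

pred_2d = crop_size * (pred_2d + 0.5)        # normalized [-0.5, 0.5] -> crop pixel coordinates
orig_2d = convert_crop_coords_to_orig_img(bbox=bboxes, keypoints=pred_2d, crop_size=crop_size)
print(orig_2d.shape)                         # (1, 21, 2), now in original-image pixels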

But I am just wondering if there is a way to project it directly onto the raw image with pred_cam_t_full. I have tried projecting with pred_cam_t_full, but it failed in the end. The relevant code is at line 432 ~ line 435 below: [screenshot attached]

geopavlakos commented 6 months ago

Can you try using the value of scaled_focal_length for the focal length argument in the perspective_projection() function?
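For context, a short sketch of how that focal length can be rescaled from the crop resolution to the original image resolution (this is quoted from memory, not verbatim from the repo, and the default value of 5000 is an assumption):

import numpy as np

FOCAL_LENGTH = 5000.0                   # model_cfg.EXTRA.FOCAL_LENGTH (assumed default)
CROP_SIZE = 256.0                       # model_cfg.MODEL.IMAGE_SIZE
img_size = np.array([1280.0, 720.0])    # original image (W, H)

# focal length expressed in original-image pixels rather than crop pixels
scaled_focal_length = FOCAL_LENGTH / CROP_SIZE * img_size.max()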

linjiangya commented 6 months ago

> Can you try using the value of scaled_focal_length for the focal length argument in the perspective_projection() function?

Yes, I have tried this before, but it still failed, like this: [screenshot attached]

In the for loop, I just append each out['pred_keypoints_2d'] to all_pred_2d after running pred_joints[:,0] = (2*is_right-1)*pred_joints[:,0] (see the attached screenshot). Finally, for visualization, I directly draw all_pred_2d on the image, and I commented out all the other lines, such as:

# all_pred_2d = model_cfg.MODEL.IMAGE_SIZE * (all_pred_2d + 1) * 0.5
# all_pred_2d = convert_crop_coords_to_orig_img(bbox=all_bboxes, keypoints=all_pred_2d, crop_size=model_cfg.MODEL.IMAGE_SIZE)


So, the only operations I am doing here are the perspective_projection() call and pred_joints[:,0] = (2*is_right-1)*pred_joints[:,0]. However, nothing is drawn on the image, and the coordinate values of all_pred_2d are completely out of range, like this:

print(all_pred_2d)
[[[-1.03754102e+03  2.77582581e+02  1.00000000e+00]
  [-1.16675171e+03  2.40004700e+02  1.00000000e+00]
  [-1.27199451e+03  2.05619980e+02  1.00000000e+00]
  [-1.36214331e+03  1.69927322e+02  1.00000000e+00]
  [-1.47551465e+03  1.43287979e+02  1.00000000e+00]
  [-1.21423535e+03  1.75111313e+01  1.00000000e+00]
  [-1.25730664e+03 -8.75083847e+01  1.00000000e+00]
  [-1.29345227e+03 -1.48755020e+02  1.00000000e+00]
  [-1.33592932e+03 -2.22113480e+02  1.00000000e+00]
  [-1.14221765e+03 -2.68055496e+01  1.00000000e+00]
  [-1.17417883e+03 -1.27042961e+02  1.00000000e+00]
  [-1.20458093e+03 -1.96846970e+02  1.00000000e+00]
  [-1.23639014e+03 -2.81423645e+02  1.00000000e+00]
  [-1.05639343e+03 -2.09048510e-01  1.00000000e+00]
  [-1.08045679e+03 -9.81334839e+01  1.00000000e+00]
  [-1.09922742e+03 -1.79204544e+02  1.00000000e+00]
  [-1.12557812e+03 -2.65227509e+02  1.00000000e+00]
  [-9.95900696e+02  3.61585121e+01  1.00000000e+00]
  [-9.83542847e+02 -3.77013702e+01  1.00000000e+00]
  [-9.77083618e+02 -1.03528061e+02  1.00000000e+00]
  [-9.77788635e+02 -1.79393829e+02  1.00000000e+00]]]


geopavlakos commented 6 months ago

What are the dimensions of the original image for this example (if possible, could you also attach it)?

linjiangya commented 6 months ago

Sure, the shape is (720, 1280, 3).

Here's an example and the script I am using now: demo_2.txt

[example image attached]

geopavlakos commented 6 months ago

I couldn't get the exact same numbers as you (I didn't run your script), but there are two changes you will need to make when you call perspective_projection:

  • Negate the x coordinate of the 3D keypoints of the left hands before passing them to the projection function (so, instead of running pred_joints[:,0] = (2*is_right-1)*pred_joints[:,0] afterwards as you do now, just do pred_keypoints_3d[:,:,0] = (2*is_right-1)*pred_keypoints_3d[:,:,0] before the projection).
  • perspective_projection also takes camera_center as an input. Since you project onto the original image, you can use this argument; in your case it would be [1280/2, 720/2], or more generally [W/2, H/2].

With these changes, you should be able to project the 3D joints to the original image.
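Putting these two bullets together with the earlier scaled_focal_length suggestion, here is a minimal sketch of such a projection call; the exact perspective_projection() signature and the dummy tensors are assumptions based on this thread, not code quoted from the repo:

import torch

# Dummy stand-ins with the shapes discussed in the thread (B hands, 21 joints each).
B = 2
pred_keypoints_3d = torch.randn(B, 21, 3)     # out['pred_keypoints_3d']
pred_cam_t_full = torch.randn(B, 3)           # full-image camera translation
is_right = torch.tensor([1.0, 0.0])           # 1 = right hand, 0 = left hand
scaled_focal_length = 5000.0 / 256 * 1280     # see the earlier focal-length sketch (assumed values)

# 1) Flip the x coordinate of LEFT-hand 3D joints BEFORE projecting.
pred_keypoints_3d[:, :, 0] = (2 * is_right.reshape(-1, 1) - 1) * pred_keypoints_3d[:, :, 0]

# 2) Project with the full-image translation, the scaled focal length,
#    and the camera center at the middle of the original image ([W/2, H/2]).
camera_center = torch.tensor([[1280.0 / 2, 720.0 / 2]]).repeat(B, 1)
focal_length = torch.full((B, 2), float(scaled_focal_length))

pred_2d_full = perspective_projection(pred_keypoints_3d,
                                      translation=pred_cam_t_full,
                                      focal_length=focal_length,
                                      camera_center=camera_center)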

linjiangya commented 6 months ago

> I couldn't get the exact same numbers as you (I didn't run your script), but there are two changes you will need to make when you call perspective_projection:
>
> • Negate the x coordinate of the 3D keypoints of the left hands before passing them to the projection function (so, instead of running pred_joints[:,0] = (2*is_right-1)*pred_joints[:,0] afterwards as you do now, just do pred_keypoints_3d[:,:,0] = (2*is_right-1)*pred_keypoints_3d[:,:,0] before the projection).
> • perspective_projection also takes camera_center as an input. Since you project onto the original image, you can use this argument; in your case it would be [1280/2, 720/2], or more generally [W/2, H/2].
>
> With these changes, you should be able to project the 3D joints to the original image.

I am sorry, this image is not the one I used to produce the numbers above. But anyway, after making the two changes you mentioned, it works now! Thank you for pointing out the camera_center argument! :) [screenshot attached]

Also, thank you for pointing out that the timing of negating the x coordinate matters. I found that if I want to use perspective_projection() to get 2D keypoints on the raw image, I have to negate pred_keypoints_3d before the projection, exactly as you said; it is not correct to flip the hand afterwards with pred_joints[:,0] = (2*is_right-1)*pred_joints[:,0], because at that point the keypoints have already been de-normalized from [-0.5, 0.5]. However, if I use my convert_crop_coords_to_orig_img(), I can just negate the x coordinate after the prediction (and before feeding it into convert_crop_coords_to_orig_img()), as below, because the 2D keypoints are still normalized to [-0.5, 0.5] at that point:

for n in range(batch_size):
    # flip the x coordinate for left hands (the keypoints are still normalized to [-0.5, 0.5] here)
    pred_joints[:,0] = (2*is_right-1)*pred_joints[:,0]
    all_pred_2d.append(pred_joints)
.
.
.
# de-normalize from [-0.5, 0.5] to crop pixel coordinates, then map back to the original image
all_pred_2d = model_cfg.MODEL.IMAGE_SIZE * (all_pred_2d + 0.5)
all_pred_2d = convert_crop_coords_to_orig_img(bbox=all_bboxes, keypoints=all_pred_2d, crop_size=model_cfg.MODEL.IMAGE_SIZE)

# visualization below
.
.
.
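For completeness, a minimal sketch of that visualization step (assuming all_pred_2d now holds per-hand pixel coordinates on the original image and OpenCV is used; the image path and output filename are hypothetical):

import cv2

# Draw each projected keypoint as a small green circle on the original image.
img = cv2.imread('input.jpg')            # the original 1280x720 frame (hypothetical path)
for joints in all_pred_2d:               # one (21, 2+) array per detected hand
    for x, y in joints[:, :2]:
        cv2.circle(img, (int(round(x)), int(round(y))), 3, (0, 255, 0), -1)
cv2.imwrite('keypoints_overlay.jpg', img)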

Thank you so much for this! :)