facebookresearch / Ego4d

Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset
https://ego4d-data.org/docs/
MIT License

Body pose looks misaligned in EgoExo4D #331

Open isarandi opened 5 months ago

isarandi commented 5 months ago

It seems that the human poses are not correct. I'm simply plotting the 2D annotations as released. This is take uid eba56e4f-7ec8-4d47-9380-e69928323e94 (iiith_cooking_111_2)

[image: released 2D pose annotations overlaid on the frame, visibly misaligned]

I've tried 2-3 other takes, too, and none seem correct.

Here is unc_basketball_03-31-23_02_14 (cef2f19f-ec48-410c-8205-4572cc8706d9), frame 39:

[image: misaligned 2D pose overlay on the basketball frame]

This visualization plots the released 2D coordinates directly; I'm not projecting the 3D coordinates onto the image myself here (although doing that also results in misaligned projections).

Is this known?

Furthermore, when I estimate 2D poses on my own, per camera, and triangulate them myself using the given camera calibration parameters, it doesn't really work (the reprojection error is high), which leads me to suspect that the camera calibration may have issues.

miguelmartin75 commented 5 months ago

The 2D keypoints are in undistorted image space, relative to the camera intrinsics released with EgoPose. cc @suyogduttjain

isarandi commented 5 months ago

That's helpful. I've tried using OpenCV's 4-parameter fisheye camera model for interpreting the distortion coefficients, but the results don't seem aligned. What camera model should we use?

Edit: based on https://github.com/facebookresearch/Ego4d/blob/6056f8deac0cea8d8d2caad2f55995506941156c/ego4d/internal/human_pose/undistort_to_halo.py, it does look like OpenCV fisheye.

Might it be that the released intrinsic matrix is the "new_K" after undistortion? That would explain why projecting the 3D annotations according to the camera parameters (including distortion coefficients) gives wrong alignment.

More broadly, it would be helpful to have an example of visualizing the pose.

Edit 2: For the basketball sequence, I get correct alignment after dividing the focal length by 0.7. However, the factor seems to be different for each video.

suyogduttjain commented 5 months ago

Hi,

This is our challenge/baseline repository: https://github.com/EGO4D/ego-exo4d-egopose

It contains a lot of useful information on data loading and preparation, and on how to work with this data.

suyogduttjain commented 5 months ago

Specifically this: https://github.com/EGO4D/ego-exo4d-egopose/tree/main/handpose/data_preparation

isarandi commented 5 months ago

That is unfortunately only about the Aria data, not the Exo data.

I still suspect the released intrinsics are not correct for Exo. It seems that the released intrinsic matrix in annotations/ego_pose/train/camera_pose/*.json is for the undistorted image. During undistortion the intrinsic matrix also changes; it is not merely a matter of setting the distortion parameters to zero. The original intrinsic matrix for the exo cameras is apparently not available. For Aria, the original calibration is available in the VRS file, but the Exo data has no VRS.

I get exact alignment between the released 2D coords and the projection of the released 3D coords with the released intrinsics and extrinsics while ignoring the distortion coefficients. Therefore the released intrinsics are probably the new_K, as output by cv2.fisheye.estimateNewCameraMatrixForUndistortRectify here: https://github.com/facebookresearch/Ego4d/blob/6056f8deac0cea8d8d2caad2f55995506941156c/ego4d/internal/human_pose/undistort_to_halo.py#L289
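
For reference, a minimal sketch of that projection check, assuming extrinsics_3x4 is the released 3x4 world-to-camera matrix and K is the released intrinsic matrix (the names here are illustrative, not from the dataset API):

import numpy as np

def project_pinhole(points_3d_world, extrinsics_3x4, K):
    # Transform world-space joints into the camera frame.
    points_h = np.concatenate([points_3d_world, np.ones((len(points_3d_world), 1))], axis=1)
    points_cam = points_h @ extrinsics_3x4.T
    # Plain pinhole projection: the distortion coefficients are ignored entirely.
    points_2d_h = points_cam @ K.T
    return points_2d_h[:, :2] / points_2d_h[:, 2:3]

If the released intrinsics are really the post-undistortion new_K, this should land exactly on the released 2D annotations.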

There's no simple closed-form way to invert estimateNewCameraMatrixForUndistortRectify. For now, a hacky way to recover the original intrinsics is to search for the original focal length such that cv2.fisheye.estimateNewCameraMatrixForUndistortRectify outputs the released intrinsics:

import cv2
import numpy as np
import scipy.optimize

def get_orig_intrinsic_matrix(released_intrinsic_matrix, distortion_coeffs):
    # Recover the image size from the principal point (assumed to be the image center).
    size = (int(released_intrinsic_matrix[0, 2] * 2), int(released_intrinsic_matrix[1, 2] * 2))
    orig_intr = released_intrinsic_matrix.copy()

    def objective(focal):
        orig_intr[0, 0] = focal
        orig_intr[1, 1] = focal
        new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(
            orig_intr, distortion_coeffs, size, np.eye(3), balance=0.8)
        return (new_K[0, 0] - released_intrinsic_matrix[0, 0]) ** 2

    optimal_focal = scipy.optimize.minimize_scalar(
        objective, bounds=(100, 5000), method='bounded', options=dict(xatol=1e-4)).x
    orig_intr[0, 0] = optimal_focal
    orig_intr[1, 1] = optimal_focal
    return orig_intr
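
For example, with made-up numbers just to show the call (the intrinsics and distortion coefficients below are hypothetical):

import numpy as np

released_K = np.array([[1000.0, 0.0, 1920.0],
                       [0.0, 1000.0, 1080.0],
                       [0.0, 0.0, 1.0]])
dist = np.array([0.03, -0.01, 0.002, -0.0005])  # hypothetical fisheye k1..k4
orig_K = get_orig_intrinsic_matrix(released_K, dist)
print(orig_K[0, 0])  # recovered pre-undistortion focal length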

Projecting with the resulting parameters gives the correct alignment:

[image: projection with the recovered intrinsics, correctly aligned]

In summary, it would be useful to also release the original intrinsic matrices for the exo cameras.

miguelmartin75 commented 3 months ago

Exo intrinsics have been released in the captures folder since the start:

import os
import pandas as pd

# RELEASE_DIR, take and cam_id are assumed to be defined by the caller;
# get_distortion_and_intrinsics is defined below.
capture_traj_dir = os.path.join(RELEASE_DIR, take["capture"]["root_dir"], "trajectory")
assert os.path.exists(capture_traj_dir)

gopro_calibs_df = pd.read_csv(os.path.join(capture_traj_dir, "gopro_calibs.csv"))
calib_df = gopro_calibs_df[gopro_calibs_df.cam_uid == cam_id]

D, I = get_distortion_and_intrinsics(calib_df.iloc[0].to_dict())

The intrinsics did change over time due to updates to the MPS algorithm during development, so the intrinsics used for annotation may be different. I did a basic check, and the GoPro intrinsics appear to be the same. I will let @suyogduttjain comment on this, as he led our annotation effort for body/hand pose and pushed these benchmarks to the finish line for the release.
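
If you want to run that check yourself, a sketch is below; the JSON path and the "camera_intrinsics" key are assumptions based on the camera_pose files mentioned earlier in this thread, so verify them against your copy of the data:

import json
import numpy as np

# Hypothetical path and keys -- adjust to the actual release layout.
with open("annotations/ego_pose/train/camera_pose/<take_uid>.json") as f:
    cam_pose = json.load(f)
K_annotation = np.array(cam_pose["cam01"]["camera_intrinsics"])  # assumed key

# Intrinsics from gopro_calibs.csv via get_distortion_and_intrinsics below.
_, K_capture = get_distortion_and_intrinsics(calib_df.iloc[0].to_dict())
print(np.allclose(K_annotation, K_capture, atol=1e-3))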

For the above-referenced function:

import cv2
import numpy as np

def undistort_exocam(image, intrinsics, distortion_coeffs, dimension=(3840, 2160)):
    DIM = dimension
    dim2 = None
    dim3 = None
    balance = 0.8

    dim1 = image.shape[:2][::-1]  # dim1 is the dimension of the input image to undistort

    # Change the calibration dim dynamically (e.g. bouldering cam01 and cam04 are vertical)
    if DIM[0] != dim1[0]:
        DIM = (DIM[1], DIM[0])

    assert dim1[0] / dim1[1] == DIM[0] / DIM[1], "Image to undistort needs to have the same aspect ratio as the ones used in calibration"
    if not dim2:
        dim2 = dim1
    if not dim3:
        dim3 = dim1
    scaled_K = intrinsics * dim1[0] / DIM[0]  # The values of K scale with the image dimension,
    scaled_K[2][2] = 1.0  # except that K[2][2] is always 1.0

    # This is how scaled_K, dim2 and balance are used to determine the final K used to
    # undistort the image. The OpenCV documentation does not make this clear!
    new_K = cv2.fisheye.estimateNewCameraMatrixForUndistortRectify(scaled_K, distortion_coeffs, dim2, np.eye(3), balance=balance)
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(scaled_K, distortion_coeffs, np.eye(3), new_K, dim3, cv2.CV_16SC2)
    undistorted_image = cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)

    return undistorted_image, new_K

def get_distortion_and_intrinsics(_raw_camera):
    # Build the 3x3 intrinsic matrix from the flattened CSV fields.
    intrinsics = np.array(
        [
            [_raw_camera['intrinsics_0'], 0, _raw_camera['intrinsics_2']],
            [0, _raw_camera['intrinsics_1'], _raw_camera['intrinsics_3']],
            [0, 0, 1],
        ]
    )
    # Fisheye distortion coefficients k1..k4.
    distortion_coeffs = np.array(
        [
            _raw_camera['intrinsics_4'],
            _raw_camera['intrinsics_5'],
            _raw_camera['intrinsics_6'],
            _raw_camera['intrinsics_7'],
        ]
    )
    return distortion_coeffs, intrinsics
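
Putting the pieces together, a minimal usage sketch for overlaying the released 2D keypoints on an undistorted exo frame (frame and keypoints_2d are illustrative names, and this assumes the capture-time calibration matches the annotation-time one, per the discussion above):

import cv2
import numpy as np

# frame: a BGR exo frame; keypoints_2d: (N, 2) released 2D annotations,
# which live in undistorted image space.
D, I = get_distortion_and_intrinsics(calib_df.iloc[0].to_dict())
undistorted, new_K = undistort_exocam(frame, I, D)

for x, y in keypoints_2d:
    cv2.circle(undistorted, (int(round(x)), int(round(y))), 5, (0, 0, 255), -1)
cv2.imwrite("overlay.png", undistorted)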

chaitanya100100 commented 3 months ago

@miguelmartin75 @suyogduttjain, I use the functions you provided above (undistort_exocam, get_distortion_and_intrinsics, etc.) to get the undistorted image as well as the corresponding intrinsics. I use them to project the provided 3D body pose and compare it with the provided 2D body pose. Although it aligns much better than before, there is still some misalignment. See below, where red points show the ground-truth 2D pose and green points show the projected 3D pose.

[image: red = ground-truth 2D pose, green = projected 3D pose; slight misalignment remains]

However, according to this function, the alignment between the provided 2D pose and the projected 3D pose should be exact. What could be wrong?

One possibility is that T_device_world in that script doesn't seem to match the extrinsics from the gopro_calibs.csv file. Does it have to do with the difference between T_device_world and T_camera_world for the exo cameras?

As @isarandi mentioned before, it would be great if you could provide a working example of projecting the 3D pose and ensuring alignment with the 2D pose.

suyogduttjain commented 3 months ago

Hi,

We have created a notebook tutorial showing how to undistort images and overlay annotations on them. Link: https://github.com/facebookresearch/Ego4d/blob/main/notebooks/egoexo/Ego-Exo4D_EgoPose_Tutorial.ipynb

Regarding matching the projected 3D and 2D poses: are you looking to match the projections with the human-annotated ground-truth 2D pose? Or are you asking specifically about the automatic ground truth?

chaitanya100100 commented 3 months ago

Thanks @suyogduttjain for the notebook reference. Yes, I am looking to match the projected 3D pose with the annotated 2D pose.

Does this mean that the automatic 2D pose ground truth will match the projected 3D pose, but the human-annotated 2D ground truth may not match the projection because of inconsistencies and triangulation error?

suyogduttjain commented 2 months ago

For the automatic pose ground truth, the 2D points were generated first, and then 3D triangulation was done based on those 2D points using the camera parameters. Over the course of dataset building, the camera parameters changed several times due to improved localization algorithms, so we updated the 3D poses whenever that happened, but the 2D points remained the same. Hence the 2D points should not be treated as projections.
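
For concreteness, a standard direct-linear-transform (DLT) triangulation sketch of that step (an illustration of the idea, not the exact internal pipeline):

import numpy as np

def triangulate_dlt(points_2d, projection_matrices):
    # points_2d: one (x, y) observation per camera;
    # projection_matrices: the matching 3x4 matrices P = K [R|t].
    A = []
    for (x, y), P in zip(points_2d, projection_matrices):
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(A))
    X = vt[-1]
    return X[:3] / X[3]

If the camera parameters are later refined, re-running this changes the 3D point while the 2D observations stay exactly as annotated, which is why reprojections won't match the 2D ground truth bit-for-bit.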

The same holds for the manual ground truth, except that in that case the 2D points were further corrected by humans during the annotation process.

Hope this helps.

chaitanya100100 commented 2 months ago

@suyogduttjain, thanks for providing these details. I understand that I should not expect the 2D pose ground truth (automatic or manual) to perfectly match the 3D pose ground truth. This really helps!

muelea commented 2 months ago

Thanks for your great work and effort! The notebook is very helpful for getting correctly aligned keypoints in undistorted image space. But I got stuck when trying to do the inverse, i.e. take the annotated 2D joints (undistorted image space) and map them back to the original image (distorted image space). Inverting the maps together with remap works perfectly fine for an image (the input values to initInverseRectificationMap() are taken from undistort_exocam() in the sample notebook):

# Get the inverse mappings (scaled_K, D and new_K_latest come from undistort_exocam)
inverse_map1, inverse_map2 = cv2.initInverseRectificationMap(
    scaled_K, D, np.eye(3), new_K_latest,
    np.array(img).shape[:2][::-1], cv2.CV_16SC2
)
# Use these inverse maps to remap the rectified image back to the distorted one
original_img = cv2.remap(
    np.array(undistorted_frame), inverse_map1, inverse_map2,
    interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT
)

Has anybody experienced the same problem? I would appreciate your help.
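
For mapping individual 2D points (rather than whole images) back into the distorted image, one approach that should work under the same fisheye model is cv2.fisheye.distortPoints. It expects normalized coordinates, so the new camera matrix used for undistortion has to be undone first (a sketch; new_K here corresponds to new_K_latest in the snippet above, and scaled_K and D follow undistort_exocam):

import cv2
import numpy as np

def undistorted_to_distorted_points(points_2d, new_K, scaled_K, D):
    # Undo new_K: pixel coords in the undistorted image -> normalized coords.
    pts = np.asarray(points_2d, dtype=np.float64).reshape(-1, 1, 2)
    normalized = (pts - [new_K[0, 2], new_K[1, 2]]) / [new_K[0, 0], new_K[1, 1]]
    # Apply the fisheye distortion and scaled_K to land in the original image.
    return cv2.fisheye.distortPoints(normalized, scaled_K, D).reshape(-1, 2)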