facebookresearch / projectaria_tools

projectaria_tools is a C++/Python open-source toolkit to interact with Project Aria data
https://facebookresearch.github.io/projectaria_tools/docs/intro
Apache License 2.0

ADT - How can I align pointcloud in scene coordinate? #27

Closed choiszt closed 1 year ago

choiszt commented 1 year ago

Hi team! Currently I'm reconstructing the 3D pointcloud scene based on your brilliant guidance in adt_quickstart_tutorial.ipynb. However, I ran into some issues with the transformation matrices while implementing the pointcloud conversion.

First, I assumed that the depth data are in the camera coordinate frame, based on gt_provider.get_depth_image_by_timestamp_ns(timestamp, stream_id), and then I acquired the intrinsics from

calibration_provider = gt_provider.get_aria_camera_calibration(stream_id)
fx, fy = calibration_provider.get_focal_lengths()
cx, cy = calibration_provider.get_principal_point()

in order to convert the RGB-D data to a pointcloud as follows:

import cv2
import numpy as np
from plyfile import PlyData, PlyElement
from tqdm import tqdm

def rgbd_to_ply(rgb_path, depth_path, ply_path, fx, fy, cx, cy):
    rgb = cv2.imread(rgb_path)
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)

    if rgb is None or depth is None:
        raise ValueError("Load Error")

    h, w = depth.shape

    vertices = []
    for y in tqdm(range(h)):
        for x in range(w):
            Z = depth[y, x] / 1000.0  # depth stored in millimetres
            X = (x - cx) * Z / fx
            Y = (y - cy) * Z / fy
            color = rgb[y, x]
            vertices.append((X, Y, Z, color[2], color[1], color[0]))  # BGR -> RGB

    vertex = np.array(vertices, dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
                                       ('red', 'u1'), ('green', 'u1'), ('blue', 'u1')])
    el = PlyElement.describe(vertex, 'vertex')
    PlyData([el]).write(ply_path)
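As an aside, here is a vectorized NumPy sketch of the same pinhole back-projection, which avoids the per-pixel Python loop (the helper name backproject_pinhole is just for illustration, and the millimetre depth unit is the same assumption as above):

import numpy as np

def backproject_pinhole(depth_mm, fx, fy, cx, cy):
    # Build a full pixel grid and back-project every pixel at once.
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    Z = depth_mm.astype(np.float64) / 1000.0  # millimetres -> metres
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.stack([X, Y, Z], axis=-1).reshape(-1, 3)  # (h*w, 3) points in the camera frame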

Next, I want to align the pointclouds from each frame into the same world coordinate frame, so I tried to acquire the extrinsic matrices:

cam2dev = gt_provider.get_aria_transform_device_camera(stream_id).inverse().to_matrix()  # camera to device
dev2sce = gt_provider.get_aria_3d_pose_by_timestamp_ns(STAMP).data().transform_scene_device.inverse().to_matrix()  # device to scene

and I supposed transform_matrix = cam2dev @ dev2sce is exactly the extrinsic matrix.

Finally, I implement the alignment in the world coordinate frame by:

def transform_ply_with_matrix(ply_path, transform_matrix, output_path):
    ply_data = PlyData.read(ply_path)
    vertex_data = np.vstack([ply_data['vertex']['x'],
                             ply_data['vertex']['y'],
                             ply_data['vertex']['z']]).T

    homogenous_coords = np.hstack([vertex_data, np.ones((vertex_data.shape[0], 1))])

    transformed_data = (transform_matrix @ homogenous_coords.T).T

    ply_data['vertex']['x'] = transformed_data[:, 0]
    ply_data['vertex']['y'] = transformed_data[:, 1]
    ply_data['vertex']['z'] = transformed_data[:, 2]

    ply_data.write(output_path)

but a mismatch appeared when I tried to visualize the two PLY files in Meshlab.

nickcharron commented 1 year ago

Hi @choiszt,

I think you may be misinterpreting our transformation notation here:

cam2dev=gt_provider.get_aria_transform_device_camera(stream_id).inverse().to_matrix() 
dev2sce=gt_provider.get_aria_3d_pose_by_timestamp_ns(STAMP).data().transform_scene_device.inverse().to_matrix()

Please see our documentation on frame convention here

tl;dr:

gt_provider.get_aria_transform_device_camera(stream_id) returns the transformation from camera to device. I believe you have the inverse in your example code.

choiszt commented 1 year ago

Hello @nickcharron, I really appreciate your response! So I guess the correct transformations are:

T_Dev_Cam = gt_provider.get_aria_transform_device_camera(stream_id).to_matrix()  # T_Device_Camera
T_Sce_Dev = gt_provider.get_aria_3d_pose_by_timestamp_ns(STAMP).data().transform_scene_device.to_matrix()  # T_Scene_Device
T_Sce_Cam = T_Sce_Dev @ T_Dev_Cam  # transform from camera to scene
pcd_in_Scene = (T_Sce_Cam @ homogenous_coords.T).T
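For example, applying this with the transform_ply_with_matrix helper defined above (the file names here are hypothetical):

transform_ply_with_matrix("frame_000.ply", T_Sce_Cam, "frame_000_in_scene.ply")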

But there is still a mismatch in Meshlab (see the attached image).

nickcharron commented 1 year ago

Those transform definitions look correct to me. I suspect there's still a transform error somewhere in your code.

I also noticed it looks like you're creating your own unprojection assuming a pinhole camera model, which won't be very accurate unless you undistort the images first. I recommend using the unproject function we have in our calibration API.
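For reference, a minimal sketch of what per-pixel unprojection could look like (assuming src_calib is the original CameraCalibration for the stream and that the ADT depth image stores z-depth in millimetres; both assumptions should be checked against the docs):

import numpy as np

def depth_to_points_unproject(depth_mm, src_calib):
    points = []
    h, w = depth_mm.shape
    for v in range(h):
        for u in range(w):
            z = depth_mm[v, u] / 1000.0
            if z <= 0:
                continue
            ray = src_calib.unproject(np.array([u, v], dtype=np.float64))
            if ray is None:  # pixel outside the valid fisheye image region
                continue
            points.append(ray / ray[2] * z)  # scale the ray so its z equals the depth
    return np.array(points)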

Let me know if this fixes it. In the meantime, I will try to make an example for generating pointclouds. Hopefully this will help.

choiszt commented 1 year ago

Hi @nickcharron! I'm not sure if I've made myself clear. The unproject function seems to unproject an x, y tensor (with shape [2, 1]) to a 3D coordinate, so I expect unprojecting every pixel of an image to be a time-consuming operation. I did take some inspiration from the "undistort" concept, and I implemented it as follows:

from projectaria_tools.core import calibration

def undistort(image):
    sensor_name = gt_provider.raw_data_provider_ptr().get_label_from_stream_id(stream_id)
    device_calib = gt_provider.raw_data_provider_ptr().get_device_calibration()
    src_calib = device_calib.get_camera_calib(sensor_name)
    dst_calib = calibration.get_linear_camera_calibration(1408, 1408, calibration_provider.get_focal_lengths()[0], sensor_name)
    # rectify the image onto the linear (pinhole) model
    rectified_image = calibration.distort_by_calibration(image, dst_calib, src_calib)
    return rectified_image

Then I use the undistorted RGB and depth data to generate the pointcloud:

def rgbd_to_ply(rgb_path, depth_path, ply_path, fx, fy, cx, cy):
    rgb = undistort(cv2.imread(rgb_path))
    depth = undistort(cv2.imread(depth_path, cv2.IMREAD_UNCHANGED))
    h, w = depth.shape

    vertices = []
    for y in tqdm(range(h)):
        for x in range(w):
            Z = depth[y, x] / 1000.0  # millimetres to metres
            X = (x - cx) * Z / fx
            Y = (y - cy) * Z / fy
            color = rgb[y, x]
            vertices.append((X, Y, Z, color[2], color[1], color[0]))  # BGR -> RGB

    vertex = np.array(vertices, dtype=[('x', 'f4'), ('y', 'f4'),
                                       ('z', 'f4'), ('red', 'u1'),
                                       ('green', 'u1'), ('blue', 'u1')])

    el = PlyElement.describe(vertex, 'vertex')
    PlyData([el]).write(ply_path)

I assumed the intrinsic matrix would be compatible with the undistorted RGB-D data, but unfortunately the mismatch still occurs. By the way, a circular boundary (the pinhole radius) now appears in the result, so I guess distort_by_calibration may be related to the unproject function (see the attached image).

chpeng-fb commented 1 year ago

Hi @choiszt, thank you for asking. What are your intrinsics specifically? Since you undistorted the image to a pinhole camera model with image width = 1408, height = 1408, and focal = calibration_provider.get_focal_lengths()[0], your fx, fy should equal calibration_provider.get_focal_lengths()[0] and cx, cy = (1408 - 1) / 2.
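For reference, a small sketch of reading those intrinsics directly from the linear calibration, assuming focal = calibration_provider.get_focal_lengths()[0] and the 1408x1408 target resolution from the undistort() snippet above:

dst_calib = calibration.get_linear_camera_calibration(1408, 1408, focal, sensor_name)
fx, fy = dst_calib.get_focal_lengths()    # both equal `focal`
cx, cy = dst_calib.get_principal_point()  # (1408 - 1) / 2 = 703.5 for both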

The function distort_by_calibration just unprojects all the pixels to rays and reprojects them with the dst_calib, which in your case is the pinhole camera model. This function is sped up in the C++ implementation, but you can also try the unproject function on the src_calib without undistortion to debug where the issue is.

Also, if you could share your complete code, it would be easier to check where the error could be.

nickcharron commented 1 year ago

@choiszt I also created an example notebook that you can follow.

One thing to make sure of (a mistake I made myself when creating this example) is not to use the get_visualizable() function on the depth images, as that changes the units.
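For example, a minimal sketch of reading the raw depth values instead (the accessor names here follow the quickstart tutorial and should be verified against it):

depth_with_dt = gt_provider.get_depth_image_by_timestamp_ns(timestamp, stream_id)
depth_mm = depth_with_dt.data().to_numpy_array()  # raw uint16 depth in millimetres, units preserved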

Hope this helps!

choiszt commented 1 year ago


Hi @nickcharron! I also noticed the get_visualizable() issue when I tried to convert and save the data. I'll check out your example, and I really appreciate your help!

nickcharron commented 1 year ago

Closing this issue, but please feel free to re-open if you have any more questions. Thanks!

VitorGuizilini-TRI commented 8 months ago

Hi, quick related question here. From what I understand, undistorting a depth map with distort_by_calibration will only modify the location of depth pixels, not the depth values themselves. That's why after undistortion we still cannot produce a pointcloud from the rectified depth map using pinhole intrinsics. Is there a way to return the updated depth values during the undistortion process? Or produce those some other way?

Thank you very much!

nickcharron commented 8 months ago

Hi @VitorGuizilini-TRI,

Undistorting a depth image is similar to undistorting an RGB image. Using the undistorted image with the pinhole model will back-project the same pixel values to the same physical 3D coordinates as using the original image with the original calibration. So yes, you should be able to produce a pointcloud using the undistorted image with a pinhole model.
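For illustration, a hedged sketch of that equivalence, reusing src_calib (original fisheye) and dst_calib (linear/pinhole) from the undistort() snippet earlier in this thread; the exact return conventions of project()/unproject() are assumptions here:

import numpy as np

pixel_src = np.array([700.0, 700.0])      # a pixel in the original image
ray = src_calib.unproject(pixel_src)      # ray through that pixel, in the camera frame
pixel_dst = dst_calib.project(ray)        # where the same ray lands in the pinhole image

# Back-project pixel_dst with the pinhole intrinsics of dst_calib:
fx, fy = dst_calib.get_focal_lengths()
cx, cy = dst_calib.get_principal_point()
ray_pinhole = np.array([(pixel_dst[0] - cx) / fx, (pixel_dst[1] - cy) / fy, 1.0])

# The two rays should agree up to scale, so scaling either one by the depth sampled at its
# own pixel reconstructs the same physical 3D point.
print(ray / np.linalg.norm(ray), ray_pinhole / np.linalg.norm(ray_pinhole))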

The undistortion process does require some slight editing of the pixel values because it needs to fill empty pixels. For those pixels, we provide the option to use bilinear interpolation or to pick the nearest neighbor to get the depth of that pixel. This will introduce some error. The only way to completely remove that error would be for us to re-render the scene and create new depth maps using a pinhole camera model, which would be a significant amount of work.