google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box, which describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

The method of obtaining the depth of point cloud #80

Open Lizhuoling opened 1 year ago

Lizhuoling commented 1 year ago

Hi, thanks for your great work. In the code, the 3D point cloud is projected onto the camera image plane with view_matrix and projection_matrix, which are common in graphics. I am not familiar with them and struggle to follow this process; I usually transform coordinates with camera intrinsics and extrinsics. Luckily, the repo includes a demo that computes 2D pixel coordinates from 3D points, view_matrix, and projection_matrix. The demo is as follows:

import numpy as np

def project_points(points, projection_matrix, view_matrix, width, height):
    # Append a homogeneous coordinate: (N, 3) points become a (4, N) array
    p_3d = np.concatenate((points, np.ones_like(points[:, :1])), axis=-1).T
    # World space -> camera space
    p_3d_cam = np.matmul(view_matrix, p_3d)
    # Camera space -> clip space
    p_2d_proj = np.matmul(projection_matrix, p_3d_cam)
    # Project the points (perspective divide by the last clip coordinate)
    p_2d_ndc = p_2d_proj[:-1, :] / p_2d_proj[-1, :]
    p_2d_ndc = p_2d_ndc.T

    # Convert the 2D projected points from normalized device coordinates to pixel values
    # (column 1 is used as x and column 0 as y, as in the original demo)
    x = p_2d_ndc[:, 1]
    y = p_2d_ndc[:, 0]
    pixels = np.copy(p_2d_ndc)
    pixels[:, 0] = ((1 + x) * 0.5) * width
    pixels[:, 1] = ((1 + y) * 0.5) * height
    pixels = pixels.astype(int)
    return pixels

Now I want to get the depth of each point in the point cloud (absolute distance, not a normalized value). Is the desired depth p_2d_proj[-1, :] in the demo? Many thanks.

ahmadyan commented 1 year ago

Using View*Projection is mathematically equivalent to K@P, the intrinsics/extrinsics formulation you are used to. You can take a look at this article to become familiar with it: http://www.songho.ca/opengl/gl_projectionmatrix.html

There should be examples of both approaches in the repo; for example, please take a look at notebooks/objectron-3dprojection-hub-tutorial.ipynb
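
On the depth question itself, here is a minimal sketch, not code from the repo. It assumes projection_matrix follows the OpenGL convention described in the article above: its last row is [0, 0, -1, 0] and the camera looks down its negative z axis. Under that assumption the last clip coordinate, p_2d_proj[-1, :], equals the depth along the optical axis in the point cloud's own units. The names point_depths and opengl_perspective are hypothetical; the second one only builds a standard perspective matrix for trying the idea out.

```python
import numpy as np


def point_depths(points, projection_matrix, view_matrix):
    """Return (z_depth, clip_w, euclidean_range) for an (N, 3) array of world points.

    Assumption: OpenGL-style matrices, i.e. the camera looks down -z and the
    last row of projection_matrix is [0, 0, -1, 0].
    """
    points = np.asarray(points, dtype=float)
    # Homogeneous coordinates, shape (4, N), as in project_points
    p_3d = np.concatenate((points, np.ones_like(points[:, :1])), axis=-1).T
    p_3d_cam = np.matmul(view_matrix, p_3d)             # camera space
    p_2d_proj = np.matmul(projection_matrix, p_3d_cam)  # clip space

    z_depth = -p_3d_cam[2, :]   # distance along the optical axis
    clip_w = p_2d_proj[-1, :]   # equals z_depth under the assumption above
    # Straight-line distance from the camera center to each point
    euclidean_range = np.linalg.norm(p_3d_cam[:3, :], axis=0)
    return z_depth, clip_w, euclidean_range


def opengl_perspective(fovy_deg, aspect, near, far):
    """Standard OpenGL perspective matrix (hypothetical helper for experiments)."""
    f = 1.0 / np.tan(np.radians(fovy_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])
```

If np.allclose(projection_matrix[-1], [0, 0, -1, 0]) does not hold for your data, the clip coordinate is not guaranteed to match the depth, and -p_3d_cam[2, :] is the safer value to use. Note that z_depth is measured along the viewing direction, while euclidean_range is the straight-line distance to the camera center; pick whichever matches what you mean by "absolute distance".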