google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box that describes the object's position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

The method of obtaining the depth of point cloud #80

Open Lizhuoling opened 10 months ago

Lizhuoling commented 10 months ago

Hi, thanks for your great work. In the code, the point cloud in 3D space is projected onto the camera image plane with view_matrix and projection_matrix, which are common in graphics. I am not familiar with them and struggle with this process; I usually transform coordinates with camera intrinsics and extrinsics. Luckily, the repo provides a demo that computes 2D pixel coordinates from 3D points, view_matrix, and projection_matrix. The demo is as follows:

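import numpy as np

def project_points(points, projection_matrix, view_matrix, width, height):
    # Lift the (N, 3) points to homogeneous coordinates, shape (4, N)
    p_3d = np.concatenate((points, np.ones_like(points[:, :1])), axis=-1).T
    # Apply the view matrix to move the points into the camera frame
    p_3d_cam = np.matmul(view_matrix, p_3d)
    # Apply the projection matrix to get clip-space coordinates
    p_2d_proj = np.matmul(projection_matrix, p_3d_cam)
    # Project the points: divide by the last (w) component to get NDC
    p_2d_ndc = p_2d_proj[:-1, :] / p_2d_proj[-1, :]
    p_2d_ndc = p_2d_ndc.T

    # Convert the 2D projected points from the normalized device coordinates to pixel values
    # Note that x is read from NDC column 1 and y from NDC column 0
    x = p_2d_ndc[:, 1]
    y = p_2d_ndc[:, 0]
    pixels = np.copy(p_2d_ndc)
    pixels[:, 0] = ((1 + x) * 0.5) * width
    pixels[:, 1] = ((1 + y) * 0.5) * height
    pixels = pixels.astype(int)
    return pixels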

Now I want to get the depth (absolute distance, not a normalized value) of each point in the point cloud. Is the desired depth p_2d_proj[-1, :] in the demo? Many thanks.

ahmadyan commented 10 months ago

Using View*Projection is mathematically equivalent to K@P. You can take a look at this article to become familiar with it: http://www.songho.ca/opengl/gl_projectionmatrix.html
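To make the equivalence concrete, here is a minimal sketch that derives pinhole intrinsics from a projection matrix and checks that both routes give the same pixels. It assumes an OpenGL-style projection matrix whose last row is [0, 0, -1, 0] and whose first two rows have no skew or translation terms, as in the linked article. The helper names and the numeric values are hypothetical and not taken from the repo. Unlike the demo, it keeps x and y in NDC order and does not cast to int.

```python
import numpy as np

def intrinsics_from_projection(projection_matrix, width, height):
    """Hypothetical helper: pinhole K implied by an OpenGL-style projection matrix."""
    fx = projection_matrix[0, 0] * width / 2.0
    fy = projection_matrix[1, 1] * height / 2.0
    cx = (1.0 - projection_matrix[0, 2]) * width / 2.0
    cy = (1.0 - projection_matrix[1, 2]) * height / 2.0
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

def pixels_via_projection(p_cam, projection_matrix, width, height):
    """Graphics route: clip space -> NDC -> pixels, for (N, 3) camera-frame points."""
    p_h = np.concatenate((p_cam, np.ones_like(p_cam[:, :1])), axis=-1).T
    clip = projection_matrix @ p_h
    ndc = (clip[:2] / clip[3]).T
    return (1.0 + ndc) * 0.5 * np.array([width, height])

def pixels_via_intrinsics(p_cam, K):
    """Vision route: K @ [x, y, depth], with depth = -z because the camera looks down -Z."""
    depth = -p_cam[:, 2]
    uvw = K @ np.stack([p_cam[:, 0], p_cam[:, 1], depth], axis=0)
    return (uvw[:2] / uvw[2]).T

# Hypothetical values, chosen only for illustration.
width, height = 640, 480
projection_matrix = np.array([
    [1.5, 0.0, 0.05, 0.0],
    [0.0, 2.0, -0.02, 0.0],
    [0.0, 0.0, -1.0002, -0.020002],
    [0.0, 0.0, -1.0, 0.0],
])
p_cam = np.array([[0.1, -0.2, -1.5], [0.3, 0.2, -3.0]])  # points already in the camera frame

K = intrinsics_from_projection(projection_matrix, width, height)
px_gl = pixels_via_projection(p_cam, projection_matrix, width, height)
px_k = pixels_via_intrinsics(p_cam, K)
```

Under these assumptions px_gl and px_k agree, and the view matrix plays the role of the extrinsics.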

There should be examples of both approaches in the repo; for example, please take a look at notebooks/objectron-3dprojection-hub-tutorial.ipynb
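On the depth question itself, here is a minimal sketch rather than a confirmed answer from the repo. If the projection matrix is a standard OpenGL-style perspective matrix with last row [0, 0, -1, 0], the last clip-space component equals the negated camera-frame z, so p_2d_proj[-1, :] would be the metric depth along the optical axis. That is not the same as the Euclidean distance from the camera center, which is the norm of the camera-frame point. The perspective helper, the pose, and the points below are hypothetical.

```python
import numpy as np

def perspective(fovy_deg, aspect, near, far):
    """Hypothetical helper: OpenGL-style perspective matrix (last row [0, 0, -1, 0])."""
    f = 1.0 / np.tan(np.radians(fovy_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def point_depths(points, projection_matrix, view_matrix):
    """Return (clip-space w, -z in the camera frame, Euclidean range) for (N, 3) points."""
    p_3d = np.concatenate((points, np.ones_like(points[:, :1])), axis=-1).T
    p_3d_cam = np.matmul(view_matrix, p_3d)
    p_2d_proj = np.matmul(projection_matrix, p_3d_cam)
    depth_from_w = p_2d_proj[-1, :]             # the quantity asked about
    depth_from_cam = -p_3d_cam[2, :]            # camera looks down -Z, so depth is -z
    euclidean_range = np.linalg.norm(p_3d_cam[:3, :], axis=0)
    return depth_from_w, depth_from_cam, euclidean_range

# Hypothetical pose and points, chosen only for illustration.
view_matrix = np.eye(4)
view_matrix[:3, 3] = [0.1, -0.2, -0.5]
projection_matrix = perspective(60.0, 4.0 / 3.0, 0.01, 100.0)
points = np.array([[0.0, 0.0, -1.0], [0.3, 0.2, -2.5]])

depth_from_w, depth_from_cam, euclidean_range = point_depths(
    points, projection_matrix, view_matrix)
```

With these inputs the first two arrays match, and the Euclidean range is at least as large as the depth. For real Objectron data it is worth checking the last row of the projection matrix before relying on this.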