google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object, capturing it from different views. Each object is annotated with a 3D bounding box that describes its position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

Projecting detected planes into image coordinates #11

Open swtyree opened 4 years ago

swtyree commented 4 years ago

Hi, thanks for the cool dataset!

I have been tinkering with objectron-geometry-tutorial.ipynb, exploring the available metadata. I haven't been able to transform the extracted planes into image space for visualization. I tried the same procedure used to project the bounding box coordinates into image pixels, but it doesn't seem to work: I get many unreasonable values, e.g. coordinates that are negative or far larger than the image bounds.

Here's the code that I used:

import cv2
import numpy as np

# Plane vertices in homogeneous coordinates, transformed from the plane's
# local frame into world space.
plane_points = np.array([[v.x, v.y, v.z, 1] for v in plane.geometry.vertices])
plane_points_3d_world = transform @ plane_points.T

# World -> camera -> clip space.
plane_points_3d_cam = frame_view_matrix @ plane_points_3d_world
plane_points_2d_proj = frame_projection_matrix @ plane_points_3d_cam

# Perspective divide to normalized device coordinates.
plane_points2d_ndc = plane_points_2d_proj[:-1, :] / plane_points_2d_proj[-1, :]
plane_points2d_ndc = plane_points2d_ndc.T

# NDC -> pixel coordinates (axes swapped for the portrait-oriented frames,
# as in the bounding-box projection).
x = plane_points2d_ndc[:, 1]
y = plane_points2d_ndc[:, 0]
plane_points2d = np.copy(plane_points2d_ndc)
plane_points2d[:, 0] = ((1 + x) * 0.5) * width
plane_points2d[:, 1] = ((1 + y) * 0.5) * height

plane_points2d = np.round(plane_points2d).astype(np.int32)
for point_id in range(plane_points2d.shape[0]):
    cv2.circle(image, (plane_points2d[point_id, 0], plane_points2d[point_id, 1]),
               25, (0, 255, 255), -1)

Also, there's a small bug in the notebook in the definition of grab_frame. The line

current_frame = np.frombuffer(
        pipe.stdout.read(frame_size), dtype='uint8').reshape(width, height, 3)

has width and height transposed.
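I believe it should instead read (the raw frame buffer is row-major, so height comes first):

current_frame = np.frombuffer(
        pipe.stdout.read(frame_size), dtype='uint8').reshape(height, width, 3)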

Thanks for any help you can provide!

ahmadyan commented 4 years ago

Your code looks correct to me. The planes are estimated in 3D by the AR tracking system across multiple previous frames, so they are not limited to the current frame: they may extend beyond the image boundaries or even lie behind the camera (hence the negative values). I would also trust the planes in the later frames of a video more, since the tracking system has had more time to refine them. You can get more information about plane geometry from this reference.
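For example, you can flag those vertices before the perspective divide (a small sketch against your plane_points_2d_proj, assuming the usual OpenGL-style projection matrix, where w is positive for points in front of the camera):

w = plane_points_2d_proj[-1, :]
# w <= 0 means the vertex is at or behind the camera; dividing by a
# negative w also mirrors the NDC values, hence the implausible pixels.
behind_camera = w <= 0
ndc = plane_points_2d_proj[:2, :] / w
outside_image = np.any(np.abs(ndc) > 1.0, axis=0)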


It is easier to visualize in 3D:

[image: 3D visualization of the estimated plane]

If you want the plane points that are visible in the camera, you need to create a grid from the plane polygon, project each grid point, and check whether it falls inside the image.
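Something along these lines (just a sketch, reusing transform, frame_view_matrix, frame_projection_matrix, width, and height from your snippet; plane_grid_pixels is a hypothetical helper, and a proper version would also clip the grid to the polygon interior with a point-in-polygon test):

import numpy as np

def plane_grid_pixels(plane, transform, view, proj, width, height, n=20):
    """Sample an n x n grid over the plane polygon's local bounding
    rectangle, project it, and return the samples that land in the image."""
    local = np.array([[v.x, v.y, v.z] for v in plane.geometry.vertices])
    # AR planes are flat in their local frame, so a grid over local x/z
    # (at the polygon's local y) covers the plane surface.
    xs = np.linspace(local[:, 0].min(), local[:, 0].max(), n)
    zs = np.linspace(local[:, 2].min(), local[:, 2].max(), n)
    gx, gz = np.meshgrid(xs, zs)
    pts = np.stack([gx.ravel(),
                    np.full(gx.size, local[:, 1].mean()),
                    gz.ravel(),
                    np.ones(gx.size)])            # 4 x n^2, homogeneous
    clip = proj @ (view @ (transform @ pts))      # local -> world -> camera -> clip
    w = clip[-1]
    in_front = w > 0                              # drop points behind the camera
    ndc = clip[:2, in_front] / w[in_front]
    # Same NDC -> pixel convention as in your snippet (x/y swapped for the
    # portrait-oriented frames).
    px = (1 + ndc[1]) * 0.5 * width
    py = (1 + ndc[0]) * 0.5 * height
    inside = (px >= 0) & (px < width) & (py >= 0) & (py < height)
    return np.stack([px[inside], py[inside]], axis=1).astype(np.int32)

Each returned (x, y) pair can then be drawn with cv2.circle exactly as in your loop above.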

Also, thanks for the bug report. Currently the bike videos have an issue where ffmpeg does not properly detect the portrait orientation of the video. I will fix it in the next update.