google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box, which describes the object's position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

2D bounding boxes request #54

Open yd-yin opened 3 years ago

yd-yin commented 3 years ago

As mentioned in #46, directly fitting a 2D bbox to the projected vertices of the 3D bbox can be very inaccurate. For example, in the image below the actual 2D bbox is shown in green and the fitted one in red.

Is there a better way to get the 2D bbox?

[Image: actual 2D bbox (green) vs. 2D bbox fitted from the projected 3D box vertices (red)]
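For context, a minimal sketch of the "fit a 2D bbox from the projected 3D box" approach being discussed, assuming you already have the nine projected keypoints (center plus eight corners) in normalized image coordinates, as stored in Objectron's annotation protos. The function name and keypoint layout here are illustrative, not part of the Objectron API:

```python
import numpy as np

def fit_2d_bbox_from_projected_3d(keypoints_2d, img_width, img_height):
    """Axis-aligned 2D box enclosing the projected 3D box vertices.

    keypoints_2d: (9, 2) array of normalized (x, y) projections of the
    3D box center + 8 corners (layout assumed from Objectron annotations).
    Returns (x_min, y_min, x_max, y_max) in pixel coordinates.
    Note: this is exactly the loose fit discussed above -- the projected
    3D box usually extends beyond the object's true 2D silhouette.
    """
    corners = np.asarray(keypoints_2d)[1:, :]  # drop the center keypoint
    x_min, y_min = corners.min(axis=0)
    x_max, y_max = corners.max(axis=0)
    # Clip to the image and convert from normalized to pixel coordinates.
    x_min, x_max = np.clip([x_min, x_max], 0.0, 1.0) * img_width
    y_min, y_max = np.clip([y_min, y_max], 0.0, 1.0) * img_height
    return x_min, y_min, x_max, y_max
```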

ahmadyan commented 2 years ago

We did not annotate the 2D bounding boxes.

My $0.02: you can always run an off-the-shelf 2D object detector (like EfficientNet or YOLO) to get the 2D bounding box, then check whether it lies within the 'over-sized' crop we get from the 3D bounding box.
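A rough sketch of that filtering idea, assuming the detector boxes and the over-sized crop are both given as (x_min, y_min, x_max, y_max) tuples in pixel coordinates. The function names and the tolerance parameter are hypothetical, not something provided by Objectron:

```python
def box_contained(inner, outer, tolerance=0.02):
    """True if `inner` lies inside `outer`, allowing a small slack
    expressed as a fraction of the outer box's width/height."""
    ox_min, oy_min, ox_max, oy_max = outer
    tol_x = tolerance * (ox_max - ox_min)
    tol_y = tolerance * (oy_max - oy_min)
    ix_min, iy_min, ix_max, iy_max = inner
    return (ix_min >= ox_min - tol_x and iy_min >= oy_min - tol_y and
            ix_max <= ox_max + tol_x and iy_max <= oy_max + tol_y)

def pick_tight_2d_box(detector_boxes, oversized_crop):
    """Keep detector boxes that fall inside the crop derived from the
    projected 3D box, and return the largest one as the 2D box."""
    candidates = [b for b in detector_boxes
                  if box_contained(b, oversized_crop)]
    if not candidates:
        return None
    return max(candidates, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
```

Picking the largest contained detection is just one heuristic; with multiple objects of the same category in frame, you may instead want to match detections to crops by IoU.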