Question about the scale retrieval process

google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also include AR session metadata: camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box that describes the object's position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

Question about the scale retrieval process #19

Closed Uio96 closed 3 years ago

Uio96 commented 3 years ago

Hi there,

Thanks for your great work. It is really inspiring. I am curious about the scale retrieval process, and I found the following in your code:

  def compute_scale(self, box, plane):
    """Computes scale of the given box sitting on the plane."""
    center, normal = plane
    vertex_dots = [np.dot(vertex, normal) for vertex in box[1:]]
    vertex_dots = np.sort(vertex_dots)
    center_dot = np.dot(center, normal)
    scales = center_dot / vertex_dots[:4]
    return np.mean(scales)

https://github.com/google-research-datasets/Objectron/blob/aa667e689848aa3619e087b493ddb3b919f9e0c8/objectron/dataset/eval.py#L203

I am a little confused about what these steps mean. Could you explain them?

Thank you so much.

lzhang57 commented 3 years ago

Hi,

Those steps compute the metric scale of the estimated 3D bounding box, using the planes detected in the AR session data.

The 3D bounding box estimated by the neural network is only defined up to scale, but our ground truth is in metric scale. To compensate for this mismatch, we rescale the estimated 3D bounding box so that it sits on the same plane as the ground truth 3D bounding box. The scale factor that achieves this is the metric scale of our estimate.
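As a rough illustration of that idea (a sketch, not the code in eval.py), suppose the plane is given as a point on it plus a normal, and the box is a list of 9 keypoints with the box center first and the 8 vertices after it. Scaling a vertex v by s puts it on the plane when s * dot(v, normal) = dot(center, normal), so s = dot(center, normal) / dot(v, normal). The toy box, the plane values, and the `dot` helper (a plain-Python stand-in for np.dot) are made up for this example, and the sketch assumes the plane normal is oriented so that the four bottom vertices have the smallest dot products.

```python
def dot(a, b):
    """Plain-Python dot product (stand-in for np.dot in this sketch)."""
    return sum(x * y for x, y in zip(a, b))


def compute_scale(box, plane):
    """Scale that makes the box's four lowest vertices sit on the plane.

    box:   9 keypoints, assumed to be the box center followed by 8 vertices.
    plane: (point_on_plane, normal).
    """
    center, normal = plane
    # Project every vertex (skipping the box center) onto the plane normal.
    vertex_dots = sorted(dot(vertex, normal) for vertex in box[1:])
    center_dot = dot(center, normal)
    # Assumption: the four smallest projections belong to the bottom face.
    scales = [center_dot / d for d in vertex_dots[:4]]
    return sum(scales) / len(scales)


# Hypothetical up-to-scale prediction: a box whose bottom face is at y = -0.5.
unit_box = [(0.0, -0.375, -1.5)] + [
    (x, y, z)
    for x in (-0.125, 0.125)
    for y in (-0.5, -0.25)
    for z in (-1.625, -1.375)
]

# Hypothetical detected plane: passes through y = -1.0, normal along +y.
plane = ((0.0, -1.0, -3.0), (0.0, 1.0, 0.0))

scale = compute_scale(unit_box, plane)
# Apply the scale to every keypoint to get the box in metric units.
metric_box = [tuple(scale * c for c in point) for point in unit_box]
print(scale)
```

After rescaling, the bottom face of `metric_box` lies at y = -1.0, on the plane. Averaging over four vertices rather than using one presumably softens the effect of a predicted box that is slightly tilted relative to the plane.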

For more details about the ground truth and the detected planes, please refer to: https://google.github.io/mediapipe/solutions/objectron.html#obtaining-real-world-3d-training-data

Hope this helps

Uio96 commented 3 years ago

Got you. Thank you so much.