google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object, capturing it from different views. Each object is annotated with a 3D bounding box that describes its position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

Loss function details #38

Closed hlaks-adl closed 3 years ago

hlaks-adl commented 3 years ago

According to Table 3 in the Objectron paper, the loss function used in the two-stage pipeline is "Per vertex MSE normalized on diagonal edge length". I am trying to understand the different parts in this sentence. Could you share the equation or pseudo-code corresponding to this? It will help make the computation explicit. Thanks a lot!

ahmadyan commented 3 years ago

Hi, what we mean is: compute the sum of the mean-squared loss over the vertices, and then normalize it by the diameter (diagonal length) of the box.

```python
dist = self._mean_squared_error_per_landmark(gt, pred)
return dist / (self._length_per_instance(gt) + keras_backend.epsilon())
```

Here `_length_per_instance` returns the distance between two particular pre-selected vertices of the box (the box's diagonal).
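Based on the description above, here is a minimal NumPy sketch of the loss (the function names, the assumption that the annotation stores a center point plus 8 corners, and the choice of which two corners span the diagonal are all hypothetical; the actual training code uses Keras/TensorFlow):

```python
import numpy as np

def box_diagonal_length(vertices):
    """Distance between two pre-selected opposite vertices of the box.

    Assumes `vertices` has shape (9, 3): one center point followed by the
    8 box corners, with corners at indices 1 and 8 lying on a main
    diagonal (an assumption made for this sketch).
    """
    return np.linalg.norm(vertices[1] - vertices[8])

def normalized_vertex_mse(gt, pred, eps=1e-7):
    """Per-vertex MSE between gt and predicted vertices, normalized by
    the ground-truth box's diagonal length (plus eps for stability)."""
    mse = np.mean(np.sum((gt - pred) ** 2, axis=-1))  # mean over vertices
    return mse / (box_diagonal_length(gt) + eps)
```

For example, for a unit cube whose prediction is shifted by 0.1 along one axis, every vertex contributes a squared error of 0.01, and the result is 0.01 divided by the cube's diagonal length, sqrt(3).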

hlaks-adl commented 3 years ago

Hi, thanks for the info!