google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box, which describes the object's position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

Question about evaluation results #57

Closed. Alen-Wong closed this issue 2 years ago.

Alen-Wong commented 2 years ago

Hi! I get predicted 3D bounding boxes following https://google.github.io/mediapipe/solutions/objectron.html, and I can also read the ground-truth 3D bounding boxes (via features.FEATURE_NAMES["POINT_3D"]) from the TFRecord files, following https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Hello%20World.ipynb. However, I get a small 3D IoU value when evaluating the 3D IoU metric.

I am not sure whether there is anything wrong with my evaluation procedure. Could you provide the evaluation results for each sequence?
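For reference, here is a minimal sketch of how the 3D IoU can be computed with this repository's own utilities, assuming the `objectron.dataset.box.Box` and `objectron.dataset.iou.IoU` interfaces take 9 x 3 keypoint arrays (box center followed by the 8 corners); the input arrays below are placeholders, not data from an actual sequence.

```python
import numpy as np
from objectron.dataset import box, iou

# In practice, gt_keypoints comes from the POINT_3D feature of the TFRecord and
# pred_keypoints from the model output; both are assumed to be 9 x 3 arrays
# (box center followed by the 8 corners) in the same camera coordinate frame.
# Here, two unit boxes with a small offset stand in as placeholders.
gt_keypoints = box.Box().vertices
pred_keypoints = gt_keypoints + np.array([0.05, 0.0, 0.0])

gt_box = box.Box(vertices=gt_keypoints)
pred_box = box.Box(vertices=pred_keypoints)

# 3D IoU between the two oriented boxes.
print('3D IoU:', iou.IoU(gt_box, pred_box).iou())
```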

I am looking forward to your reply. Thank you so much.

DeriZSY commented 2 years ago

Also, if possible, we would like to have the original predictions for each sequence that would yield the same results as presented in the paper.

lzhang57 commented 2 years ago

Hi @Alen-Wong, there are two potential causes:

(1) To predict correct 3D bounding boxes, you need to provide accurate camera intrinsics. When you used https://google.github.io/mediapipe/solutions/objectron.html to predict the 3D bounding boxes, did you pass your camera intrinsics (FOCAL_LENGTH, PRINCIPAL_POINT) as part of the inputs? (A sketch of passing them through the Python API follows after this comment.)

(2) The predicted 3D bounding box is only defined up to scale; you will have to re-scale your box using the ground planes, following the instructions here: https://github.com/google-research-datasets/Objectron/blob/master/objectron/dataset/eval.py#L158.
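As a concrete illustration of point (1), here is a minimal sketch based on the MediaPipe Python Solution API; the intrinsics values, file name, and model name are placeholders, and the real fx, fy, px, py should come from the AR session metadata of the Objectron sequence.

```python
import cv2
import mediapipe as mp

mp_objectron = mp.solutions.objectron

image = cv2.imread('frame.png')  # hypothetical input frame
h, w = image.shape[:2]
# Placeholder intrinsics: replace with the sequence's camera metadata.
fx, fy, px, py = 1495.0, 1495.0, w / 2.0, h / 2.0

with mp_objectron.Objectron(
        static_image_mode=True,
        max_num_objects=1,
        model_name='Chair',
        focal_length=(fx, fy),
        principal_point=(px, py),
        image_size=(w, h)) as objectron:
    # MediaPipe expects RGB input.
    results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.detected_objects:
        for obj in results.detected_objects:
            print(obj.rotation, obj.translation, obj.scale)
```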

DeriZSY commented 2 years ago

@lzhang57 Hi, we are evaluating exactly on the Objectron dataset, and we are using the APIs provided in MediaPipe. However, the results are still far from those reported in the paper. Is it possible to obtain either: 1) an exact protocol with which we can reproduce the results reported in the paper, or 2) the original predictions for each test sequence as reported in the paper?

lzhang57 commented 2 years ago

Hi DeriZSY,

Could you provide more details on how far the results are from those reported in the paper? Some sampled prediction results would help.

Thanks, Liangkai

ahmadyan commented 2 years ago

If you'd like to reproduce the numbers in the paper, download the original models (not the MediaPipe models/APIs) from gs://objectron/model and run the eval script on the corresponding eval set. That will also give you the original predictions. Our numbers have been reproduced independently by a few other papers.

You can download the models from the Objectron bucket on GCS, under objectron/models. For example, with gsutil (requires authentication):

```
gsutil ls gs://objectron/model
gsutil cp -r gs://objectron/model local_dataset_dir
```

or directly via HTTP: https://storage.googleapis.com/objectron/models/objectron_mesh2_cvpr/book.hdf5, etc.
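For the HTTP route, a small sketch of fetching one model file with the Python standard library; the URL is the one given above, the local file name is arbitrary, and other categories presumably follow the same naming pattern under objectron/models/.

```python
import urllib.request

# Download the book model checkpoint to the current directory.
url = 'https://storage.googleapis.com/objectron/models/objectron_mesh2_cvpr/book.hdf5'
urllib.request.urlretrieve(url, 'book.hdf5')
```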