google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box that describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

Intrinsic camera matrices are not compatible with the 640x480 crops #36

Closed plstcharles closed 3 years ago

plstcharles commented 3 years ago

I use the processed data in tf.SequenceExample format, and all the intrinsic camera matrices I load look a bit like this: image

I assume the matrix is compatible with the Hartley and Zisserman definition: image
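The image referenced above is not reproduced here. For reference, the calibration matrix as it is usually written following Hartley and Zisserman has the form below, where the focal lengths $\alpha_x, \alpha_y$ and the principal point $(x_0, y_0)$ are in pixels and $s$ is the skew (typically zero). That the Objectron matrices use exactly this layout is the assumption made in this post, not something confirmed by the dataset documentation.

$$
K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
$$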

Here, the values seem to be pixel-scale (which makes sense), but the principal point offsets are way beyond the image bounds (makes no sense). I assume this is because that intrinsic matrix corresponds to the pre-cropped image (i.e. before the dataset is normalized to 640x480).

I tried to decompose the projection matrix to get the intrinsic matrix back, but did not get a sensible result. Is this a bug, or am I misinterpreting the content of that matrix?

I'm currently trying to find a way to get the new intrinsics from the data, but the only way I can now think of is to recalibrate using the provided 2D/3D correspondences. Would there be a simpler way?

ahmadyan commented 3 years ago

You are correct. The K matrix is from the original pre-cropped frame from the video, which has 1920x1440 resolution (thus the principal point is almost in the middle); the frame was then down-scaled to 640x480 by cv2::resize. You can adjust the matrix accordingly or re-scale the image.
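As a rough sketch of what "adjust the matrix accordingly" could look like: going from 1920x1440 to 640x480 is a uniform factor of 1/3, so the focal lengths and principal point can be scaled by that factor. The helper name `rescale_intrinsics` and the numeric values in `K_full` are hypothetical placeholders, not values from the dataset. The sketch assumes K has the layout `[[fx, s, cx], [0, fy, cy], [0, 0, 1]]` with sizes given as (width, height), and it ignores any half-pixel offset from the pixel-center convention.

```python
import numpy as np

def rescale_intrinsics(K, src_size, dst_size):
    """Scale a 3x3 pinhole intrinsic matrix from src_size to dst_size.

    Sizes are (width, height) in pixels. Hypothetical helper, not part
    of the Objectron code base.
    """
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    # Left-multiplying by diag(sx, sy, 1) scales the first row (fx, skew, cx)
    # by sx and the second row (fy, cy) by sy; the last row is unchanged.
    S = np.diag([sx, sy, 1.0])
    return S @ np.asarray(K, dtype=np.float64)

# Placeholder intrinsics for a 1920x1440 frame (illustrative values only).
K_full = np.array([
    [1500.0,    0.0, 960.0],
    [   0.0, 1500.0, 720.0],
    [   0.0,    0.0,   1.0],
])

K_small = rescale_intrinsics(K_full, (1920, 1440), (640, 480))
print(K_small)
```

If the stored frames turn out to be in portrait orientation, the width and height (and possibly the principal point components) would need to be swapped accordingly, so it is worth checking the result by reprojecting a few annotated 3D keypoints onto the 640x480 images.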

plstcharles commented 3 years ago

Thanks for the info! If the original video resolutions are indeed all the same (1920x1440), you are right, I can rescale and use the same matrix, so that solves my problem. :+1: