Intrinsic camera matrices are not compatible with the 640x480 crops

google-research-datasets / Objectron

Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box that describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.

I use the processed data in tf.SequenceExample format, and all the intrinsic camera matrices I load look a bit like this:

I assume the matrix is compatible with the Hartley and Zisserman definition:
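
For reference, the calibration matrix in Hartley and Zisserman's notation has the general form below, where $\alpha_x$ and $\alpha_y$ are the focal lengths in pixel units, $s$ is the skew, and $(x_0, y_0)$ is the principal point in pixels:

$$
K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
$$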

Here, the values seem to be pixel-scale (which makes sense), but the principal point offsets are way beyond the image bounds (which makes no sense). I assume this is because the intrinsic matrix corresponds to the original image, i.e. before the dataset is cropped and normalized to 640x480.
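
To make that assumption concrete: if the stored matrix really describes the full-resolution frame, the intrinsics for the 640x480 image would follow from the crop offset and the resize factors. Below is a minimal sketch of that adjustment. The function name `adjust_intrinsics`, the example matrix, and the 1920x1440 source resolution are hypothetical placeholders, not values taken from the dataset, and half-pixel center conventions are ignored.

```python
import numpy as np

def adjust_intrinsics(K, crop_x0, crop_y0, scale_x, scale_y):
    """Return the intrinsics after cropping at (crop_x0, crop_y0) and then resizing.

    Hypothetical helper: assumes a plain pinhole model and that the crop
    happens before the resize.
    """
    K = np.asarray(K, dtype=float)
    # Cropping shifts the principal point by the crop origin.
    T = np.array([[1.0, 0.0, -crop_x0],
                  [0.0, 1.0, -crop_y0],
                  [0.0, 0.0, 1.0]])
    # Resizing scales the focal lengths and the principal point.
    S = np.diag([scale_x, scale_y, 1.0])
    return S @ T @ K

# Made-up full-resolution intrinsics for a 1920x1440 frame (assumption).
K_full = np.array([[1500.0, 0.0, 960.0],
                   [0.0, 1500.0, 720.0],
                   [0.0, 0.0, 1.0]])

# No crop, uniform downscale to 640x480.
K_small = adjust_intrinsics(K_full, 0, 0, 640 / 1920, 480 / 1440)
print(K_small)
```

Without knowing the actual crop offset and source resolution, this cannot be applied to the dataset as is; it only shows which quantities would be needed.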

I tried to decompose the projection matrix to get the intrinsic matrix back, but did not get a sensible result. Is this a bug, or am I misinterpreting the content of that matrix?
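
For context, the kind of decomposition meant here can be sketched as follows. This assumes the projection matrix is a 3x4 pinhole matrix of the form P = K [R | t]; if the stored matrix is instead a 4x4 graphics-style projection matrix, this does not apply directly. The function names `rq3` and `decompose_projection` are illustrative, and the RQ step is built from NumPy's QR routine.

```python
import numpy as np

def rq3(M):
    """RQ decomposition of a 3x3 matrix: M = K @ R, K upper triangular, R orthogonal."""
    P = np.eye(3)[::-1]                 # exchange matrix (reverses row order)
    Q_, R_ = np.linalg.qr((P @ M).T)    # (P M)^T = Q_ R_
    K = P @ R_.T @ P                    # upper triangular
    R = P @ Q_.T                        # orthogonal
    # Flip signs so that the diagonal of K is positive (D @ D is the identity).
    D = np.diag(np.where(np.diag(K) < 0, -1.0, 1.0))
    return K @ D, D @ R

def decompose_projection(P34):
    """Split a 3x4 matrix P = K [R | t] into K (normalized), R and t."""
    P34 = np.asarray(P34, dtype=float)
    K, R = rq3(P34[:, :3])
    if np.linalg.det(R) < 0:
        # P is only defined up to scale, so flip its sign to get a proper rotation.
        R, P34 = -R, -P34
    t = np.linalg.solve(K, P34[:, 3])
    return K / K[2, 2], R, t

# Synthetic check with made-up values (not taken from the dataset).
K_true = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 2.0])
P_true = K_true @ np.column_stack([R_true, t_true])
K_est, R_est, t_est = decompose_projection(P_true)
```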

I'm currently trying to find a way to get the new intrinsics from the data, but the only way I can think of right now is to recalibrate using the provided 2D/3D correspondences. Is there a simpler way?
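
A rough sketch of the recalibration idea, under several assumptions: the 3D keypoints are expressed in the camera frame with positive depth along Z, the 2D keypoints are already in pixel coordinates of the 640x480 image (normalized coordinates would first need scaling by the image size), and the skew is zero. The function name `fit_intrinsics` is hypothetical, and the dataset's actual axis conventions may differ from what is assumed here.

```python
import numpy as np

def fit_intrinsics(points_cam, pixels):
    """Least-squares fit of fx, fy, cx, cy from u = fx*X/Z + cx and v = fy*Y/Z + cy."""
    points_cam = np.asarray(points_cam, dtype=float)   # shape (N, 3), camera frame
    pixels = np.asarray(pixels, dtype=float)           # shape (N, 2), pixel coordinates
    xn = points_cam[:, 0] / points_cam[:, 2]            # normalized image x
    yn = points_cam[:, 1] / points_cam[:, 2]            # normalized image y
    ones = np.ones_like(xn)
    # Two independent 2-parameter linear problems, one per image axis.
    (fx, cx), *_ = np.linalg.lstsq(np.column_stack([xn, ones]), pixels[:, 0], rcond=None)
    (fy, cy), *_ = np.linalg.lstsq(np.column_stack([yn, ones]), pixels[:, 1], rcond=None)
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

# Synthetic check with made-up intrinsics and random points in front of the camera.
rng = np.random.default_rng(0)
K_ref = np.array([[480.0, 0.0, 320.0], [0.0, 470.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.column_stack([rng.uniform(-1, 1, 20),
                       rng.uniform(-1, 1, 20),
                       rng.uniform(2, 5, 20)])
proj = (K_ref @ pts.T).T
pix = proj[:, :2] / proj[:, 2:3]
K_fit = fit_intrinsics(pts, pix)
```

Keypoints from many frames can be stacked into one fit, which should make the estimate more robust than using the nine box keypoints of a single frame.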

Intrinsic camera matrices are not compatible with the 640x480 crops #36