DyCheckCamera conventions

VC86 commented 3 days ago

Hello,

I was trying to load the DyCheck dataset in another framework and stumbled upon your fixes to a DyCheckCamera class.

If I understand correctly, you assume OpenCV conventions everywhere downstream of calls such as cam.extrin, which returns the extrinsics matrix (world-to-camera, according to the comments).

However, when you call the extrin property note that you also return the translation as -orientation @ position, implying that the position is in camera-to-world and you're taking the inverse of the stored position as $-R^\top t$ -- and the orientation is coherently assumed to be stored as camera-to-world, so that $R^\top$ is its inverse.

Have you confirmed this system is consistent with the expected trajectory, e.g., on the paper-windmill example from the iphone dataset in dycheck-release?

Xiaoming-Zhao commented 2 days ago

Hi,

Do you mind giving a specific example and pointing me to specific lines?

However, in general, I am not sure whether the following is correct:

> However, when you call the extrin property note that you also return the translation as -orientation @ position, implying that the position is in camera-to-world and you're taking the inverse of the stored position as $-R^\top t$ -- and the orientation is coherently assumed to be stored as camera-to-world, so that $R^\top$ is its inverse.

If the extrinsic [R | t] (a 3 x 4 matrix; I omit the last row [0, 0, 0, 1]) is camera-to-world (usually this is called the pose rather than the extrinsic), then the camera's position in the world coordinate system is just t. You can verify this by multiplying the matrix by [0, 0, 0, 1]^T (homogeneous coordinates), since the camera's position in its own coordinate system is the origin: [R | t] [0, 0, 0, 1]^T = t.
If [R | t] is instead world-to-camera, then the camera's position is indeed -R^T t. You can verify this by solving [R | t] [X | 1]^T = R X + t = [0, 0, 0]^T, which gives X = -R^T t: you are looking for the point X in the world coordinate system that the transformation maps to the origin of the camera coordinate system.
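For readers who want to check the two cases numerically, here is a minimal NumPy sketch. It is not code from the repository: the helper `random_rotation` and all variable names are made up for illustration, and the rotation and translation are random values rather than DyCheck data.

```python
import numpy as np


def random_rotation(rng):
    """Return a random 3x3 rotation matrix (hypothetical helper, illustration only)."""
    # QR decomposition of a random matrix yields an orthogonal Q.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    # Flip one column if needed so that det(Q) = +1 (a proper rotation).
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(0)
R = random_rotation(rng)
t = rng.normal(size=3)
Rt = np.hstack([R, t[:, None]])  # the 3x4 matrix [R | t]

# Case 1: [R | t] read as camera-to-world.
# The camera origin [0, 0, 0, 1]^T maps to the camera position, which is t.
origin_h = np.array([0.0, 0.0, 0.0, 1.0])
center_if_c2w = Rt @ origin_h

# Case 2: [R | t] read as world-to-camera.
# The camera position is the world point X with R X + t = 0, i.e. X = -R^T t.
center_if_w2c = -R.T @ t
mapped_to_camera = Rt @ np.append(center_if_w2c, 1.0)

print(np.allclose(center_if_c2w, t))
print(np.allclose(mapped_to_camera, 0.0))
```

Both checks rely only on R being orthogonal, so they hold for any rotation and translation.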

> Have you confirmed this system is consistent with the expected trajectory, e.g., on the paper-windmill example from the iphone dataset in dycheck-release?

I verified all the camera definitions. Otherwise, I could not render images aligned with the ground truth.

Hope this helps.

VC86 commented 9 hours ago

Hello @Xiaoming-Zhao,

Thanks for your feedback. The root of my confusion was that I did not find sufficient documentation on the conventions adopted by DyCheck; your part is crystal clear.

For me, camera pose and camera extrinsics are synonyms, and I'm not sure there is such a clear semantic difference between them. I was therefore puzzled when I saw the operation in the code. I would probably have written the properties as c2w and w2c, but this also works 👍

We fully agree that in c2w, multiplying by $[0, 0, 0, 1]^\top$ yields the position of the camera in world coordinates; this is straightforward. What was unclear to me was only what the JSON files of the DyCheck source dataset store as orientation and position, and under which conventions. I have now sorted out the conventions on my side (also thanks to reading through your dataloader), and the issue is solved.
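As a note for anyone porting the loader, the sketch below shows one reading that is consistent with a translation of `-orientation @ position`. It ASSUMES that `orientation` is the world-to-camera rotation and `position` is the camera center in world coordinates; this thread does not state what the DyCheck JSON files store, so please verify against the dataset. The function name and the numeric values are hypothetical.

```python
import numpy as np


def w2c_from_orientation_position(orientation, position):
    """Assemble a 4x4 world-to-camera matrix (hypothetical helper).

    ASSUMPTION: `orientation` is the 3x3 world-to-camera rotation and
    `position` is the camera center expressed in world coordinates.
    """
    w2c = np.eye(4)
    w2c[:3, :3] = orientation
    w2c[:3, 3] = -orientation @ position
    return w2c


# Made-up example values: a 30 degree rotation about the z axis.
theta = np.deg2rad(30.0)
orientation = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
position = np.array([1.0, 2.0, 3.0])

w2c = w2c_from_orientation_position(orientation, position)
c2w = np.linalg.inv(w2c)

# Under the assumption above, the camera center maps to the camera origin,
# and the c2w matrix carries orientation^T and position directly.
print(np.allclose(w2c @ np.append(position, 1.0), [0.0, 0.0, 0.0, 1.0]))
print(np.allclose(c2w[:3, :3], orientation.T))
print(np.allclose(c2w[:3, 3], position))
```

Naming the two matrices `w2c` and `c2w` explicitly, as suggested above, makes the direction of each transform visible at the call site.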

apple / ml-pgdvs

DyCheckCamera conventions #3