apple / ml-pgdvs

[ICLR 2024] Official implementation of "Pseudo-Generalized Dynamic View Synthesis from a Video"
https://xiaoming-zhao.github.io/projects/pgdvs/

DyCheckCamera conventions #3

Closed: VC86 closed this issue 1 day ago

VC86 commented 3 days ago

Hello,

I was trying to load the DyCheck dataset in another framework and stumbled upon your fixes to the DyCheckCamera class.

My understanding is that you assume OpenCV conventions throughout, for example after calling cam.extrin to obtain the extrinsic matrix (world-to-camera, according to the comments).

However, note that the extrin property also returns the translation as -orientation @ position. This suggests that the position is stored in camera-to-world form and that you are inverting it as $-R^\top t$, with the orientation consistently assumed to be stored as camera-to-world, so that $R^\top$ is its inverse.
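The two readings of the stored fields can be told apart numerically. The minimal sketch below assumes, for illustration only, that orientation is a 3x3 rotation matrix $R$ and position a 3-vector $C$, and compares (A) orientation as the world-to-camera rotation with position as the camera center in world coordinates, where $[R \mid -RC]$ is exactly the inverse of the camera-to-world matrix $[R^\top \mid C]$, against (B) orientation as camera-to-world, where the world-to-camera translation would have to be $-R^\top C$ instead. The helpers rot_z and extrinsic_from_pose are hypothetical and not taken from the repository.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """3x3 rotation about the z axis (hypothetical illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def extrinsic_from_pose(orientation: np.ndarray, position: np.ndarray) -> np.ndarray:
    """Hypothetical helper: 4x4 world-to-camera matrix [R | -R C]."""
    extrin = np.eye(4)
    extrin[:3, :3] = orientation
    extrin[:3, 3] = -orientation @ position
    return extrin


# Reading A (assumed): orientation is world-to-camera, position is the camera center.
R_w2c = rot_z(np.pi / 6)
C = np.array([1.0, 2.0, 3.0])
c2w = np.eye(4)
c2w[:3, :3] = R_w2c.T  # camera-to-world rotation is the transpose
c2w[:3, 3] = C         # camera center in world coordinates
extrin = extrinsic_from_pose(R_w2c, C)
print(np.allclose(extrin, np.linalg.inv(c2w)))  # True: [R | -RC] is the exact inverse

# Reading B: orientation is camera-to-world. The world-to-camera translation
# would then be -R^T C, which differs from -R C for a non-symmetric R.
R_c2w = R_w2c
print(np.allclose(-R_c2w @ C, -R_c2w.T @ C))  # False
```

Under these assumptions, -orientation @ position is consistent with reading A without any inversion of the orientation; which reading the DyCheck JSON files actually follow still has to be confirmed against the data.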

Have you confirmed this system is consistent with the expected trajectory, e.g., on the paper-windmill example from the iphone dataset in dycheck-release?

Xiaoming-Zhao commented 2 days ago

Hi,

Could you give a specific example and point me to the relevant lines?

In general, though, I am not sure the following is correct:

> However, when you call the extrin property note that you also return the translation as -orientation @ position, implying that the position is in camera-to-world and you're taking the inverse of the stored position as $-R^\top t$ -- and the orientation is coherently assumed to be stored as camera-to-world, so that $R^\top$ is its inverse.
>
> Have you confirmed this system is consistent with the expected trajectory, e.g., on the paper-windmill example from the iphone dataset in dycheck-release?

I verified all of the camera definitions; otherwise, I would not be able to render images aligned with the ground truth.

Hope this helps.

VC86 commented 9 hours ago

Hello @Xiaoming-Zhao,

Thanks for your feedback. The root of my confusion is that I could not find sufficient documentation on the conventions adopted by DyCheck; your part is crystal clear.

To me, camera pose and camera extrinsics are synonyms, and I'm not sure there is such a clear semantic difference between them. (I was therefore puzzled when I saw the operation in the code. I would probably have named the properties c2w and w2c, but this also works 👍)
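Naming the matrices c2w and w2c makes their relationship explicit: they are inverses of each other, and for a rigid transform $[R \mid t]$ the inverse has the closed form $[R^\top \mid -R^\top t]$, which is where a $-R^\top t$ term comes from. The minimal sketch below checks that closed form against a general matrix inverse; the function name invert_rigid and the example pose values are hypothetical.

```python
import numpy as np


def invert_rigid(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R = T[:3, :3]
    t = T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T        # inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t    # undo the translation in the rotated frame
    return T_inv


# Hypothetical camera-to-world pose: rotation about y, arbitrary translation.
c, s = np.cos(0.4), np.sin(0.4)
c2w = np.eye(4)
c2w[:3, :3] = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
c2w[:3, 3] = [0.5, -1.0, 2.0]

w2c = invert_rigid(c2w)
print(np.allclose(w2c @ c2w, np.eye(4)))  # True
```

The closed form avoids a general matrix inversion and stays exactly rigid, which is why it is commonly preferred over np.linalg.inv for poses.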

We fully agree that multiplying the c2w matrix by $[0, 0, 0, 1]^\top$ yields the camera position in world coordinates; this is straightforward. What was unclear to me was only what the JSON files of the DyCheck source dataset store as orientation and position, and which conventions they follow. I sorted out the conventions on my side (thanks in part to reading through your dataloader), and the issue is now solved.
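The statement about $[0, 0, 0, 1]^\top$ can be checked directly: multiplying c2w by the homogeneous origin of the camera frame picks out its translation column, and the same center can be recovered from a world-to-camera matrix $[R \mid t]$ as $-R^\top t$. The pose values below are hypothetical examples, not data from DyCheck.

```python
import numpy as np

# Hypothetical camera-to-world pose: 90 degrees about the y axis, plus a translation.
R_c2w = np.array([[0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0],
                  [-1.0, 0.0, 0.0]])
t_c2w = np.array([4.0, 0.5, -2.0])

c2w = np.eye(4)
c2w[:3, :3] = R_c2w
c2w[:3, 3] = t_c2w

# Homogeneous origin of the camera frame maps to the camera position in world coordinates.
origin_h = np.array([0.0, 0.0, 0.0, 1.0])
camera_position_world = (c2w @ origin_h)[:3]
print(np.allclose(camera_position_world, t_c2w))  # True

# The same center recovered from the world-to-camera matrix [R | t] as -R^T t.
w2c = np.linalg.inv(c2w)
center_from_w2c = -w2c[:3, :3].T @ w2c[:3, 3]
print(np.allclose(center_from_w2c, t_c2w))  # True
```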