NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

Conversion World2Camera R, t, s to poses #54

Open angusev opened 2 years ago

angusev commented 2 years ago

First and foremost, thank you for your excellent work!

My goal is to apply your approach to real-world pictures of an object. All frames are properly masked, so I provided your pipeline with a precise alpha channel. I also have accurate World2Camera transformations (rotation R, translation t, and scale s) for each frame. Assuming the camera is located at the origin, my rough origin-centered approximation of the object's geometry with vertices V_w can be aligned in a frame as follows:

V_c = V_w @ R * s + t

To invert the transformation back to Camera2World, we can compute the inverse rotation and translation (since R is orthogonal):

V_w = (V_c - t) / s @ R.T
R_rev = R.T / s
t_rev = - t @ R.T / s
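
The inverse formulas above can be sanity-checked numerically in isolation. A minimal NumPy sketch, assuming row-vector points, an orthogonal R, and a uniform scalar scale s (all names below are illustrative, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal rotation via QR decomposition.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=(1, 3))
s = 2.5

V_w = rng.normal(size=(5, 3))   # world-space points (rows)
V_c = V_w @ R * s + t           # World2Camera, as in the formula above

# Inverse rotation/translation from the formulas above.
R_rev = R.T / s
t_rev = -t @ R.T / s
V_back = V_c @ R_rev + t_rev    # Camera2World

assert np.allclose(V_back, V_w)
```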

Thus, I tried to adapt my transforms to the repo's code by constructing the extrinsic matrix mv in the method _parse_frame https://github.com/NVlabs/nvdiffrec/blob/3e7007ca0f504008e89eb9a46907cf39ed166117/dataset/dataset_llff.py#L87 in the following manner:

"""
mv = (R_rev | t_rev.T)
     (0 0 0 | 1      )
"""
mv = torch.cat((R_rev, t_rev.T), 1)
mv = torch.cat(
    (
        mv, 
        torch.tensor([[0, 0, 0, 1]])
    ), 
0)
campos = t_rev
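
One convention pitfall worth checking here: the row-vector formula V_w = V_c @ R_rev + t_rev corresponds, under the column-vector convention used by 4x4 homogeneous matrices (i.e. M @ [x, y, z, 1]^T, as in OpenGL), to a matrix whose rotation block is R_rev.T, not R_rev. A NumPy sketch of that equivalence (illustrative names, independent of the repo's torch code):

```python
import numpy as np

rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthogonal rotation
t = rng.normal(size=(1, 3))
s = 1.7

R_rev = R.T / s
t_rev = -t @ R.T / s

# Column-vector form of the row-vector inverse:
# [V_w; 1] = M @ [V_c; 1] requires the transpose of R_rev.
M = np.block([[R_rev.T, t_rev.T],
              [np.zeros((1, 3)), np.ones((1, 1))]])

V_w = rng.normal(size=(4, 3))
V_c = V_w @ R * s + t                           # forward transform
V_c_h = np.concatenate([V_c, np.ones((4, 1))], axis=1)  # homogeneous
V_back = (M @ V_c_h.T).T[:, :3]

assert np.allclose(V_back, V_w)
```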

This approach didn't work: the optimised geometry didn't even appear in the frames' renders. Could you tell me whether I understood the meanings of mv and campos correctly, and whether my conversion from World2Camera to Camera2World space is right? Thanks!

jmunkberg commented 2 years ago

Thanks for your kind words!

The coordinate transforms are always tricky, and different datasets have different conventions.

Here are some high-level comments.

We use nvdiffrast for differentiable rasterization, which uses OpenGL conventions, as discussed here: https://nvlabs.github.io/nvdiffrast/#coordinate-systems

Here are a few slides illustrating the OpenGL mv and projection setup. https://fileadmin.cs.lth.se/cs/Education/EDA221/lectures/latest/Lecture5_web.pdf Note that in OpenGL, the camera looks along the negative z axis.

Here is a simple example of how the model-view matrix is set up in our code: https://github.com/NVlabs/nvdiffrec/blob/main/dataset/dataset_mesh.py#L55
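
For reference, a minimal NumPy sketch of that OpenGL-style setup (standing in for the repo's torch/util helpers, so the helper names here are illustrative): mv maps world space to view space, and the camera position in world space can be recovered from its inverse.

```python
import numpy as np

def translate(x, y, z):
    """4x4 translation matrix (column-vector convention)."""
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

def rotate_y(a):
    """4x4 rotation about the y axis."""
    c, s = np.cos(a), np.sin(a)
    m = np.eye(4)
    m[0, 0], m[0, 2] = c, s
    m[2, 0], m[2, 2] = -s, c
    return m

dist, angle = 3.0, 0.7
mv = translate(0.0, 0.0, -dist) @ rotate_y(angle)  # world -> view
campos = np.linalg.inv(mv)[:3, 3]                  # camera position in world space

# Sanity check: the camera position maps to the view-space origin,
# consistent with the camera looking along the negative z axis.
p = mv @ np.append(campos, 1.0)
assert np.allclose(p[:3], 0.0)
```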

jmunkberg commented 2 years ago

You could also look at Instant-NGP's colmap2nerf script: https://github.com/NVlabs/instant-ngp/blob/master/scripts/colmap2nerf.py

which should generate transform matrices compatible with our nerf dataset reader: https://github.com/NVlabs/nvdiffrec/blob/main/dataset/dataset_nerf.py

It may not work 100% out of the box (image paths, etc.), but the transform matrices should be compatible at least.
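
For orientation, a sketch of the NeRF-style transforms.json layout that such scripts emit and a NeRF dataset reader consumes. The field names below follow the NeRF synthetic format (camera_angle_x, frames, file_path, transform_matrix); extra per-camera fields vary between script versions, and the example values are placeholders:

```python
import json
import numpy as np

c2w = np.eye(4)  # camera-to-world matrix for one frame (placeholder)
transforms = {
    "camera_angle_x": 0.6911,              # horizontal FOV in radians (example value)
    "frames": [
        {
            "file_path": "./images/0001",  # path stem; extension conventions vary
            "transform_matrix": c2w.tolist(),
        }
    ],
}

# Round-trip through JSON, as a reader would.
blob = json.dumps(transforms, indent=2)
parsed = json.loads(blob)
assert np.allclose(parsed["frames"][0]["transform_matrix"], np.eye(4))
```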