DL3DV-10K / Dataset

News: the 10k dataset is ready for download.
https://dl3dv-10k.github.io/DL3DV-10K/

"applied_transform" in camera json file #4

Closed: Trainingzy closed this issue 8 months ago

Trainingzy commented 8 months ago

Thanks a lot for your great work!

There is a key called "applied_transform" in the camera json file. I wonder what this denotes.

ShengCN commented 8 months ago

We use the same format with nerfstudio. Take a look at this: https://github.com/nerfstudio-project/nerfstudio/issues/1708.

In short, it transforms poses from OpenCV space to OpenGL space.
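For concreteness, the OpenCV ↔ OpenGL camera-axis change can be sketched as follows (my illustration, not code from the repo; the identity pose is just a placeholder). Flipping the y and z camera axes is a right-multiplication by diag(1, -1, -1, 1), which is its own inverse, so the same operation converts in both directions:

```python
import numpy as np

# Right-multiplying a camera-to-world pose by this matrix negates the
# y and z columns of the rotation block, i.e. c2w[:3, 1:3] *= -1.
flip_yz = np.diag([1.0, -1.0, -1.0, 1.0])

c2w_cv = np.eye(4)         # stand-in OpenCV-convention pose
c2w_gl = c2w_cv @ flip_yz  # OpenCV -> OpenGL

# The flip is an involution: applying it twice restores the original pose.
assert np.allclose(c2w_gl @ flip_yz, c2w_cv)
```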

Ir1d commented 8 months ago

Hi @ShengCN, sorry, I'm still a bit confused. Does it mean the "transform_matrix" in the JSON files is in OpenCV format (the COLMAP pose)? I noticed that the applied_transform swaps the x and y axes and flips z, which doesn't look like an OpenCV-to-OpenGL conversion (that would be c2w_list[:, :3, 1:3] *= -1).

ShengCN commented 8 months ago

@Ir1d @Trainingzy In my experiments, I did not deal with the applied_transform, but you raised a good question here, so I took a look at the nerfstudio code.

As shown in this search (https://github.com/search?q=repo%3Anerfstudio-project%2Fnerfstudio%20applied_transform&type=code), the applied_transform is only used in a few places. It seems to be a fixed matrix that nerfstudio writes into the transforms.json file when it processes the outputs from COLMAP. The other place they use this matrix is to transform the COLMAP-reconstructed 3D point cloud into OpenGL space (I guess) and save it to a PLY file. The matrix does not appear in the NeRF-related code; it is only used in these cases.

@Ir1d, the transform_matrix is in OpenGL space, which is consistent with Blender (+Z = back, +Y = up, +X = right). See this: https://github.com/nerfstudio-project/nerfstudio/blob/0e889f7cb1c81681dd16b1e618386902ae44434d/nerfstudio/process_data/colmap_utils.py#L437
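That convention can be read directly off the matrix columns; a minimal sketch (my illustration, using a placeholder identity pose): in a Blender/OpenGL-style camera-to-world matrix, the rotation columns are the camera's +X (right), +Y (up), and +Z (back) axes expressed in world coordinates, so the viewing direction is minus the third column.

```python
import numpy as np

c2w = np.eye(4)  # identity pose, purely for illustration

# Columns of the rotation block: camera right, up, and back axes in world space.
right, up, back = c2w[:3, 0], c2w[:3, 1], c2w[:3, 2]

view_dir = -back  # OpenGL/Blender cameras look along -Z
assert np.allclose(view_dir, [0.0, 0.0, -1.0])
```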

Trainingzy commented 8 months ago

@ShengCN Thanks a lot for the further clarification. From your comments and the nerfstudio code, my understanding is that the provided camera extrinsics are in OpenGL coordinate space?

In my codebase I need the extrinsics in COLMAP coordinate space. How can I transform the provided extrinsics into the COLMAP version?

Do you obtain the provided extrinsics by transforming the COLMAP cache from here?

Ir1d commented 8 months ago

Thanks for the additional information @ShengCN. I compared the transforms.json with the COLMAP result from https://huggingface.co/datasets/DL3DV/DL3DV-ALL-ColmapCache In my case, I think affine @ transform_matrix is camera_to_world in OpenGL space. @Trainingzy If I understand correctly, you can then easily convert from OpenGL to COLMAP space.
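A minimal sketch of that last step (my assumption, not code from the thread): flip the y/z camera axes to go from the OpenGL to the OpenCV camera convention, then invert, since COLMAP stores world-to-camera poses. Note this only changes the camera-axis convention, not any world-frame transform.

```python
import numpy as np

def opengl_c2w_to_colmap_w2c(c2w_gl):
    # OpenGL (+Y up, +Z back) -> OpenCV (+Y down, +Z forward): negate
    # the y and z columns of the rotation block.
    c2w_cv = c2w_gl.copy()
    c2w_cv[:3, 1:3] *= -1
    # COLMAP uses world-to-camera, so invert the camera-to-world pose.
    return np.linalg.inv(c2w_cv)

# Identity pose: the result is just the axis flip itself.
w2c = opengl_c2w_to_colmap_w2c(np.eye(4))
assert np.allclose(w2c, np.diag([1.0, -1.0, -1.0, 1.0]))
```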

Trainingzy commented 8 months ago

@Ir1d Thanks a lot for answering. Yes, you are right: the transform_matrix is c2w in OpenGL space.

For those who need to convert the transform_matrix to an OpenCV w2c matrix, you can use the following (to go the other way, apply the inverse):

  import numpy as np

  c2w = transform_matrix.copy()         # copy so the original is not mutated
  c2w[2, :] *= -1                       # flip the world z axis
  c2w = c2w[np.array([1, 0, 2, 3]), :]  # swap the world x and y axes
  c2w[0:3, 1:3] *= -1                   # OpenGL -> OpenCV camera axes
  w2c_opencv = np.linalg.inv(c2w)       # camera-to-world -> world-to-camera

Ir1d commented 8 months ago

Hey @Trainingzy, I actually meant that affine @ transform_matrix results in OpenGL space. Did you find that transform_matrix alone is in OpenGL space?

The reason I ask is that converting between OpenCV and OpenGL doesn't seem to require swapping axes, only flipping y and z.

Trainingzy commented 8 months ago

Hi @Ir1d, I'm getting quite confused by the coordinates now... But it seems you are right.

First, the script I provided is correct for getting w2c in OpenCV coordinates, and I also processed the COLMAP transform matrix from the COLMAP cache to confirm it:

  c2w = transform_matrix
  c2w[2, :] *= -1
  c2w = c2w[np.array([1, 0, 2, 3]), :]
  c2w[0:3, 1:3] *= -1
  w2c_opencv = np.linalg.inv(c2w)

If I understand correctly, the first three lines of my code are equivalent to

c2w = affine @ transform_matrix

Then the fourth line, c2w[0:3, 1:3] *= -1, converts from OpenGL to OpenCV.

If so, affine @ transform_matrix gives the pose in OpenGL coordinates.
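This equivalence can be checked numerically. A quick sketch (the affine matrix below is my reconstruction of nerfstudio's applied_transform, extended to 4x4: swap x/y rows, flip the z row): negating row 2 and then swapping rows 0 and 1 is the same as one left-multiplication by that matrix.

```python
import numpy as np

# 4x4 version of the applied_transform: swap x/y, flip z (zero translation).
affine = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, -1, 0],
                   [0, 0, 0, 1]], dtype=float)

# Random pose with the usual [0, 0, 0, 1] bottom row.
rng = np.random.default_rng(0)
transform_matrix = np.vstack([rng.standard_normal((3, 4)), [0, 0, 0, 1]])

# The first three lines of the snippet above, applied step by step.
c2w = transform_matrix.copy()
c2w[2, :] *= -1                       # flip the world z axis
c2w = c2w[np.array([1, 0, 2, 3]), :]  # swap the world x and y axes

assert np.allclose(c2w, affine @ transform_matrix)
```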

yukiumi13 commented 2 days ago

Hello. I have faced the same issue, and after checking the code I would like to make some additional clarifications: