bmild / nerf

Code release for NeRF (Neural Radiance Fields)
http://tancik.com/nerf
MIT License

Coordinate system of translation #125

Open dnlwbr opened 2 years ago

dnlwbr commented 2 years ago

Hello and thanks for sharing this nice work! I am still a bit confused by the coordinate systems and the Readme. While the Readme says that

In run_nerf.py and all other code, we use the same pose coordinate system as in OpenGL

it seems that the LLFF code, which is used by image2poses.py, only transforms the rotation matrix from [right, down, forward] (COLMAP) to [down, right, backward] (LLFF), and NeRF later converts it to [right, up, backward] (OpenGL).
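For concreteness, the two axis swaps described above can be sketched as follows (an illustration only, not code from the repo; the axis orderings are the ones named in this thread):

```python
import numpy as np

# Illustration (not repo code): the two axis swaps described above,
# acting on the components of a camera-space vector.
def colmap_to_llff(v):
    # [right, down, forward] -> [down, right, backward]
    x, y, z = v
    return np.array([y, x, -z])

def llff_to_opengl(v):
    # [down, right, backward] -> [right, up, backward]
    d, r, b = v
    return np.array([r, -d, b])

v = np.array([1.0, 2.0, 3.0])              # (right, down, forward) components
print(llff_to_opengl(colmap_to_llff(v)))   # -> [ 1. -2. -3.]
```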

  1. If I create poses_bounds.npy with my own poses as suggested in the Readme (OpenGL format), then this line would still apply the conversion to OpenGL format. That doesn't seem right to me, or am I misunderstanding something?
  2. What about the translation? I can't find any conversion in this regard, so it seems that the translation part in poses_bounds.npy should still be in COLMAP format. Is that correct?
  3. Why is the intermediate step ([d, r, b]) used in the first place, instead of converting the rotation directly to the OpenGL format?

Thanks in advance!

NagabhushanSN95 commented 2 years ago

@dnlwbr did you find anything related to your above questions?

dnlwbr commented 2 years ago

Unfortunately, it's been some time since I dealt with this, and the whole thing is somewhat ambiguous. If I remember correctly, however, these were my findings:

  1. I think the documentation is not quite correct, or at least imprecise, here. NeRF uses the OpenGL format, but there is a conversion at the beginning. I think you have to create poses_bounds.npy in the [down, right, backward] (drb) format, because this line still applies the transformation to the OpenGL format. If the poses were already in OpenGL format from the start, as the Readme recommends, that line would break everything. At least my results seemed to confirm this assumption at the time.
  2. I think I was confused here because the LLFF readme that the NeRF readme refers to at this point only mentions the rotation. Nevertheless, I think the coordinate conventions always apply to both translation and rotation; anything else would be strange. Still, explicitly mentioning only the rotation matrix is somewhat misleading, in my opinion, if the complete pose (rotation + translation) is meant.
  3. I'm still wondering about that.
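As an illustration of point 1, the drb-to-OpenGL step is just a column reshuffle of the camera-to-world matrix. A minimal sketch with an assumed toy pose (in the repo the same reshuffle is applied to a stacked array that also carries the hwf column, so the exact indexing there differs):

```python
import numpy as np

# Toy camera-to-world pose (assumed, not repo data): rotation columns
# are the camera axes [down, right, backward] in world coordinates,
# the last column is the camera center.
pose_drb = np.array([[0., 1.,  0., 2.],
                     [1., 0.,  0., 3.],
                     [0., 0., -1., 4.]])

# [d, r, b] -> [r, u, b]: put the r column first and negate d to get u.
# The translation column is the camera center in world coordinates,
# so this camera-axis relabeling leaves it untouched.
pose_rub = np.concatenate([pose_drb[:, 1:2],
                           -pose_drb[:, 0:1],
                           pose_drb[:, 2:]], axis=1)
print(pose_rub)
```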

However, my statements should be taken with caution, since I am anything but an expert.

NagabhushanSN95 commented 2 years ago

Thanks @dnlwbr! This is helpful. I'll update here if I figure out anything more.

NagabhushanSN95 commented 2 years ago

@dnlwbr I figured out what is happening w.r.t. your point 2 (that the convention for the translation isn't changed). You were right: the conventions of both the rotation and the translation are changed. For the benefit of others who may stumble upon this issue, I'll note down what I've understood so far.

  1. It's a neat trick they've used. When converting a rotation matrix R from one convention to another, we find the corresponding permutation matrix P and compute the new rotation matrix as P' R P (where ' denotes the transpose). Similarly, the new translation is P' t. But when we compute relative poses, the right multiplication by P is unnecessary (it cancels out), so it is enough to multiply only on the left by P'. Here, they first invert the camera poses and multiply the rotation matrix on the right by P. Then they compute the relative pose and invert it again. Inverting the pose transposes the rotation matrix, so R' P becomes P' R, which is what we want. And the translation becomes -(R' P)' (-R' t), which equals P' t. Thus both rotation and translation end up in the final convention.
  2. I think they simply wanted to reuse the LLFF code for obtaining COLMAP poses to avoid duplicating it here, and then convert to the OpenGL convention.
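The trick in point 1 can be checked numerically. A hedged sketch (the P below is one plausible permutation, assumed purely for illustration, and ' is the transpose as above):

```python
import numpy as np

# P maps [right, down, forward] components to [down, right, backward]
# (an assumed permutation, for illustration only).
P = np.array([[0., 1.,  0.],
              [1., 0.,  0.],
              [0., 0., -1.]])

# Random proper rotation: QR gives an orthonormal matrix; fix the sign
# of one column so the determinant is +1.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1
R = q                            # world-to-camera rotation
t = rng.standard_normal(3)       # world-to-camera translation

# Direct conversion of the w2c pose to the new camera convention:
R_direct = P.T @ R
t_direct = P.T @ t

# The trick from the comment: invert to camera-to-world, right-multiply
# the rotation by P, then invert again.
R_c2w = R.T @ P                  # R' P
t_c2w = -R.T @ t                 # -R' t
R_back = R_c2w.T                 # (R' P)' = P' R
t_back = -R_c2w.T @ t_c2w        # -(R' P)' (-R' t) = P' t

assert np.allclose(R_back, R_direct)
assert np.allclose(t_back, t_direct)
print("both rotation and translation end up in the new convention")
```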