chenhsuanlin / bundle-adjusting-NeRF

BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)

Question about camera pose transformation for LLFF #5

Closed: t2kasa closed this issue 2 years ago

t2kasa commented 3 years ago

Hi, Chen-Hsuan Lin. Thank you for sharing the great work!

I have been reading the code, and I don't fully understand the camera pose transformation performed when the __getitem__ method is called for the LLFF dataset: https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L104

In my understanding, the camera pose returned by parse_cameras_and_bounds is a camera-to-world matrix whose coordinate system is [right, up, backwards]: https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L42

Then the camera pose is transformed by parse_raw_camera when __getitem__ is called, but I could not follow what that transformation does: https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L104 Could you please explain?

chenhsuanlin commented 3 years ago

Hi @t2kasa, parse_raw_camera() converts the camera information to an extrinsic camera matrix. parse_cameras_and_bounds() was adapted from the data loading function of the original NeRF, so yes, the coordinate system of that function's output is [right, up, backwards] as you said. The differences are:

  1. The matrix diag(1,-1,-1) flips the coordinate system to the conventional form of [right, down, forwards].
  2. The original output is a camera-to-world matrix (see here), whereas it's world-to-camera here, hence the inverse.
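The two steps above can be sketched as follows with made-up example values. This is a NumPy illustration of the conventions being discussed, not the repo's actual camera helpers (which are implemented in PyTorch):

```python
import numpy as np

# Hypothetical camera-to-world pose in the [right, up, backwards]
# convention, as returned by parse_cameras_and_bounds(); the identity
# rotation and translation values here are made up for illustration.
pose_c2w_gl = np.eye(4)
pose_c2w_gl[:3, 3] = [0.1, -0.2, 2.0]  # arbitrary camera center

# Step 1: diag(1, -1, -1) flips the camera axes from
# [right, up, backwards] to the conventional [right, down, forwards].
flip = np.diag([1.0, -1.0, -1.0, 1.0])
pose_c2w = pose_c2w_gl @ flip

# Step 2: invert the camera-to-world matrix to obtain the
# world-to-camera (extrinsic) matrix [R|t].
pose_w2c = np.linalg.inv(pose_c2w)
```

Composing pose_w2c with pose_c2w recovers the identity, which is a quick way to sanity-check the inversion.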

Therefore, throughout the codebase, all image <--> camera <--> world transformations use the camera projection equation with intrinsics K and extrinsics [R|t]: u = K(Rx+t) (see these camera helper functions).
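The projection equation u = K(Rx+t) can be written out directly. This is a minimal sketch with made-up intrinsics, pose, and 3D point, assuming a world-to-camera extrinsic [R|t] in the [right, down, forwards] convention:

```python
import numpy as np

# Made-up pinhole intrinsics: focal length 500, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Made-up extrinsics [R|t]: identity rotation, camera 2 units back.
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

x_world = np.array([0.5, -0.3, 1.0])  # a 3D point in world coordinates
x_cam = R @ x_world + t               # world -> camera: Rx + t
u_homog = K @ x_cam                   # camera -> homogeneous pixel coords
u = u_homog[:2] / u_homog[2]          # perspective divide -> pixel (u, v)
```

With these numbers the point lands at roughly (403.3, 190.0) in pixel coordinates; any point with positive camera-space depth (x_cam[2] > 0) is in front of the camera under this convention.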

Hope this helps!

t2kasa commented 3 years ago

Thank you for your help!

I understand that L106-L107 first transform the coordinate system from [right, up, backwards] to [right, down, forwards], and then convert the camera-to-world matrix to a world-to-camera matrix. https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L106-L107

So, does L108 convert the coordinate system back to [right, up, backwards] while keeping it a world-to-camera matrix? https://github.com/chenhsuanlin/bundle-adjusting-NeRF/blob/main/data/llff.py#L108

chenhsuanlin commented 2 years ago

Hi @t2kasa, I would need to dig back into the original NeRF repo to double-check their pose format and how I preprocessed it back then, but I don't really have the cycles for that at the moment. What I can tell you is that the output of the parse_raw_camera() function is guaranteed to be in the standard extrinsic matrix format, i.e. [right, down, forwards]. Sorry for the confusion!

t2kasa commented 2 years ago

Thank you. That is helpful, and it is enough to know that the output of parse_raw_camera() is an extrinsic matrix in the [right, down, forwards] convention. I will try to trace the transformations again later.

Thank you again!