Hi @wave-transmitter,
Thanks for your interest. Could you give more information about the camera model you are trying to use? If you are still using a typical pinhole-like camera, changing fx, fy, cx and cy according to your case should be fine.
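For reference, here is a minimal sketch (not the exact code of this repo) of where $f_x$, $f_y$, $c_x$ and $c_y$ enter pinhole ray generation; the axis conventions here are an assumption and may differ from the ones used in the code base:

```python
import numpy as np

def get_rays_pinhole(H, W, fx, fy, cx, cy, c2w):
    """Per-pixel ray origins/directions for a pinhole camera.

    c2w: 4x4 camera-to-world pose (e.g. one pose from traj_w_c.txt).
    Assumes an OpenCV-style frame (x right, y down, z forward);
    adapt the signs if your codebase uses OpenGL conventions.
    """
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    # Back-project pixel coordinates to camera-space viewing directions.
    dirs = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    # Rotate directions into world space; all rays share the camera centre.
    rays_d = dirs @ c2w[:3, :3].T
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d
```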
If you are using a different model, such as an omnidirectional, fisheye or event camera, you could refer to recent papers like: https://cyhsu14.github.io/OmniNeRF/, https://arxiv.org/abs/2206.11896 and https://4dqv.mpi-inf.mpg.de/EventNeRF/.
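Just as an illustration of the fisheye case: under an equidistant model ($r = f\theta$), the pinhole back-projection could be replaced by something like the sketch below. This is not code from any of the papers above, and the conventions are assumptions:

```python
import numpy as np

def get_rays_fisheye_equidistant(H, W, fx, fy, cx, cy, c2w):
    """Ray directions for an equidistant fisheye model (r = f * theta).

    Illustrative sketch only; swap this in for the pinhole
    back-projection if your lens follows the equidistant model.
    """
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    x = (u - cx) / fx
    y = (v - cy) / fy
    # For the equidistant model, the normalized pixel radius equals
    # the angle theta from the optical axis.
    theta = np.sqrt(x**2 + y**2)
    r = np.maximum(theta, 1e-8)  # avoid division by zero at the centre
    dirs = np.stack([np.sin(theta) * x / r,
                     np.sin(theta) * y / r,
                     np.cos(theta)], axis=-1)
    rays_d = dirs @ c2w[:3, :3].T
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d
```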
Hi again,
To be honest, I am somewhat confused about the relation between the camera model that actually captured the training data and the simulated camera model deployed in your code. How are these two models correlated? Assuming that we have a set of images captured via a real-life camera and the corresponding camera poses, should this camera be implemented in your code, similarly to the `set_params_replica` function, or is the pinhole-camera assumption enough to train Semantic-NeRF?
Let me share some extra thoughts.
In the case of the Replica dataset, you mention that the images are captured via Habitat-Sim, where a pinhole camera is implemented to acquire 640 × 480 images. Based on this resolution and camera model, the `traj_w_c.txt` file is also provided via Habitat-Sim. However, before training Semantic-NeRF the acquired images are rescaled to 320 × 240, and the fx, fy parameters of the supposed pinhole camera are calculated based on this resolution. Doesn't this affect the accuracy of the camera poses, since they were calculated based on a camera model with 640 × 480 resolution?
Now assume that we have a custom dataset acquired via a real-life camera, and via a tool like nerfstudio we extract the corresponding camera poses along with the corresponding parameters fx, fy, etc. If we follow the original pipeline and the input images are rescaled to 320 × 240, what should we do with the remaining parameters, like fx, cx, etc.? Should they be changed according to the initial estimations, or should we stick to the calculations in the code, i.e. Lines 62-68 in `trainer.py`? And what if there are additional intrinsic parameters, such as the distortion parameters k and p?
Hi @wave-transmitter, the camera intrinsics used during data capture and during NeRF training should in principle be the same, which is what happens in our code base for Replica and ScanNet: the pinhole camera intrinsics used in our code for either Replica or ScanNet stay consistent throughout their respective training processes.
If you capture new data via a new camera device or from another source dataset, then you need to modify/adjust the camera intrinsics accordingly.
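Concretely, resizing the images only rescales the intrinsics; the camera-to-world poses in `traj_w_c.txt` are extrinsics and are not affected. A minimal sketch (the numbers in the usage comment are made up):

```python
def scale_intrinsics(fx, fy, cx, cy, old_hw, new_hw):
    """Rescale pinhole intrinsics after resizing images, e.g. 640x480 -> 320x240.

    Only the intrinsics change; camera-to-world poses stay valid as-is.
    """
    sy = new_hw[0] / old_hw[0]  # height scale factor
    sx = new_hw[1] / old_hw[1]  # width scale factor
    return fx * sx, fy * sy, cx * sx, cy * sy

# Halving the resolution simply halves all four parameters:
# scale_intrinsics(600.0, 600.0, 320.0, 240.0, (480, 640), (240, 320))
# -> (300.0, 300.0, 160.0, 120.0)
```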
In all of the above discussion I only talk about $f_x$, $f_y$, $c_x$ and $c_y$ without distortion parameters, as I assume the images have already been undistorted in advance.
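If your images still contain lens distortion, one option is to undistort them once before training, e.g. with OpenCV; the calibration matrix, distortion coefficients and file names below are placeholders for illustration:

```python
import cv2
import numpy as np

# Placeholder calibration; substitute your own estimated values.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
# OpenCV coefficient order: [k1, k2, p1, p2, k3]
# (k* are radial, p* are tangential terms).
dist = np.array([-0.28, 0.07, 0.001, -0.0005, 0.0])

img = cv2.imread("rgb_0.png")
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("rgb_0_undistorted.png", undistorted)
```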
For your question: if you are interested in jointly learning camera calibration and the NeRF model, then SC-NeRF may be what you need.
Hi @Harry-Zhi,
thanks for the clear explanation and the additional material!
Hey,
once again congrats for the amazing work.
I am trying to train Semantic-NeRF on a custom dataset. I have pre-processed the dataset to match the format of the Replica dataset, and the classes are also familiar, so I have managed to train the model without tuning the data loader (`replica_datasets.py`). Yet the model cannot learn the 3D representation due to a camera-pose incompatibility. Specifically, as far as I can understand from `trainer.py` and the `set_params_replica` function, the RGB images are supposed to be captured via a pinhole camera. In my case the camera model is different, so I want to modify the function so that it processes the camera poses of the `traj_w_c.txt` file correctly. Is it enough to change the fx, fy, etc. parameters? What if there is a more complex camera model with additional parameters such as p? Any tips or ideas on how to implement a camera model other than pinhole are more than welcome!