bmild / nerf

Code release for NeRF (Neural Radiance Fields)
http://tancik.com/nerf
MIT License

Question about NeRF for indoor scenes? Invalid disparity value (NaN or Inf) during training and failed training #70

Closed Harry-Zhi closed 3 years ago

Harry-Zhi commented 3 years ago

Hi,

I am trying to use NeRF to learn an implicit representation of an indoor room. I generated a sequence of around 100 images with the camera facing towards one end of the room (for example, the cameras are located at the left side of the room and look towards the right side), without too much viewpoint variation.

I personally think of this kind of indoor scene as being like LLFF data (the viewed wall region), but with a much larger distance to the camera, since the objects in the middle of the room such as tables and chairs still matter. In LLFF data, by contrast, the scene is fairly close to the camera and there is not much other space between the camera and the scene in which to estimate occupancy.

I set the depth range to [0.1 m, 10 m] without the NDC option and use linspace sampling in the depth range instead of the disparity range.
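Concretely, the sampling I mean looks roughly like the numpy sketch below (just an illustration of the two spacing schemes, not the exact code in the repo; the bounds mirror my 0.1 m / 10 m range):

```python
import numpy as np

near, far = 0.1, 10.0   # my depth bounds in meters (non-NDC)
N_samples = 64

t = np.linspace(0.0, 1.0, N_samples)

# Linear in depth: samples evenly spaced between near and far.
z_depth = near * (1.0 - t) + far * t

# Linear in disparity (1/depth): concentrates samples near the camera.
z_disp = 1.0 / (1.0 / near * (1.0 - t) + 1.0 / far * t)

print(z_depth[:4])   # ~[0.10, 0.26, 0.41, 0.57]
print(z_disp[:4])    # ~[0.100, 0.102, 0.103, 0.105]
```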

During training, I sometimes run into numerical problems with singular disparity values (NaN or Inf), and training fails.

Is this because NeRF is not well suited to this setup, or are there any tips for getting NeRF to work on indoor scenes?

bmild commented 3 years ago

Do you know which line is causing the failure? The disparity calculations are not part of the training pipeline, they're just there for visualization, so if it's a division error or something it could be safely removed or modified to catch the NaNs without changing anything about the converged result. The scenario you described sounds pretty similar to LLFF so I'd be surprised if there's anything inherently problematic about it.
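Assuming the failure does come from the 1/x in the disparity map, something along these lines would catch the NaNs (a rough numpy sketch of the idea, not the exact lines in the repo):

```python
import numpy as np

def safe_disparity(weights, z_vals, eps=1e-10):
    """Disparity map for visualization only; clamp denominators to avoid NaN/Inf."""
    depth_map = np.sum(weights * z_vals, axis=-1)   # expected depth per ray
    acc_map = np.sum(weights, axis=-1)              # accumulated alpha per ray
    # Empty rays (acc ~ 0) would otherwise give 0/0 -> NaN or 1/0 -> Inf.
    disp_map = 1.0 / np.maximum(eps, depth_map / np.maximum(eps, acc_map))
    return disp_map
```

Since this quantity never feeds back into the loss, clamping it like this changes nothing about the converged model.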

Harry-Zhi commented 3 years ago

Hi bmild,

Thank you again for the prompt reply and help. I carefully debugged the code and the problem no longer appears. Maybe it was caused by a bad initialization or some other problem in my implementation. It now works well in my setup.

I agree that the indoor scene in the situation I described looks somewhat similar to the LLFF setup; however, is it still better to choose the non-NDC setup in this case?

I am also wondering about a more general indoor situation, with a camera scanning the whole room along a smooth trajectory (overall the camera covers the whole scene, facing inward and outward from time to time). In that case it is not guaranteed that the entire scene lies behind a single plane, as LLFF data does, so we would have to use the general non-NDC setup (the one used for lego). Is that correct?

Looking forward to your reply.

bmild commented 3 years ago

In a more general situation like you described, it's definitely better to use non-NDC. However, the code is not structured to handle generally captured scenes like the one you describe, since the near/far ray sampling bounds are set globally rather than per-camera. You'd want to go in and change that code to account for this fact when the distance from camera to nearest scene point varies a lot over the input images. You could consider using linear-in-disparity sampling (lindisp flag) but then you have to be careful that the "near" bound is not too close to the camera, or you'd be wasting most of your samples.
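One way to change that (a sketch of the idea, not a drop-in patch to run_nerf.py) is to carry a near/far pair per ray, e.g. derived from each camera's distance to the nearest scene content, instead of two global scalars, and broadcast them in the sampling step:

```python
import numpy as np

def sample_z_vals(near, far, N_samples, lindisp=False):
    """Sample depths per ray given per-ray near/far bounds.

    near, far: arrays of shape [N_rays, 1] (per-camera or per-ray bounds),
               rather than the global scalars used in the released code.
    """
    t = np.linspace(0.0, 1.0, N_samples)                     # [N_samples]
    if lindisp:
        # Linear in disparity: needs near safely > 0 for every ray,
        # otherwise most samples bunch up right next to the camera.
        z_vals = 1.0 / (1.0 / near * (1.0 - t) + 1.0 / far * t)
    else:
        # Linear in depth.
        z_vals = near * (1.0 - t) + far * t
    return z_vals                                             # [N_rays, N_samples]

# Hypothetical bounds estimated per image from the poses / known room geometry.
near = np.array([[0.3], [1.5]])
far = np.array([[6.0], [10.0]])
z = sample_z_vals(near, far, N_samples=64)
```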

Harry-Zhi commented 3 years ago

Thank you so much bmild!