NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more
https://nvlabs.github.io/instant-ngp

Poor performance on my own large dataset #257

Closed qhdqhd closed 2 years ago

qhdqhd commented 2 years ago

My dataset contains about 100 views; the viewpoints form a ring, and all of them look toward the center of the circle. The cameras are arranged like this:

[screenshot: ring of cameras surrounding the scene]

The scene size is about 15 meters.

The first row shows 5 views rendered at the training viewpoints; the second row shows the corresponding 5 training views.

[screenshot: rendered views (top row) compared with training views (bottom row)]

My parameters: aabb_scale is set to 16, and I adjusted the scale and offset of the camera positions following https://github.com/NVlabs/instant-ngp/blob/master/docs/nerf_dataset_tips.md, so the scene is fully covered by the bounding box.
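For context, a minimal sketch of how that scale/offset adjustment could be scripted; the numeric values below are only placeholders for my ~15 m scene, and the "scale", "offset", and "aabb_scale" keys are the ones described in the dataset tips document:

```python
# Minimal sketch, not verbatim from my pipeline: adjust the global scale/offset
# in transforms.json so the ~15 m scene fits inside instant-ngp's bounding box.
# The numeric values are placeholders; tune them for your own scene.
import json

with open("transforms.json") as f:
    transforms = json.load(f)

transforms["aabb_scale"] = 16           # largest supported power-of-two box
transforms["scale"] = 0.05              # placeholder: shrink camera positions toward the unit cube
transforms["offset"] = [0.5, 0.5, 0.5]  # placeholder: recenter the ring of cameras

with open("transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)
```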

The people in the center of the scene render fine, but there is quite a lot of white floating geometry and ghosting around them. I would like to ask why. I can think of three possible reasons:

  1. Is it because the scene has too little texture? There is a lot of black background around it.
  2. Or is it due to the camera parameters? I used 100 different cameras here; they all have 12 mm lenses, so the intrinsic parameters are very close but not exactly the same. Since your project does not support multiple camera intrinsics, only one common set of intrinsics is used.
  3. Or is it the camera shooting positions? My cameras all lie on a single plane, with no views from other directions such as looking down or looking up.

Tom94 commented 2 years ago

Hi there, first of all: cool setup!

(1) shouldn't be a problem as long as the camera poses are precise enough for the Train extrinsics option to fix any slight inaccuracies.

I suspect the problem comes more from (2) and (3).

For (2), we recently (yesterday) pushed support for per-camera metadata. You can customize it via the Python bindings:

testbed.nerf.training.dataset.metadata[image_id].camera_distortion = ...
testbed.nerf.training.dataset.metadata[image_id].focal_length = ...
testbed.nerf.training.dataset.metadata[image_id].principal_point = ...

More specifics are in python_api.cu. The .json-based loader unfortunately only supports a single set of camera parameters per json file.
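As a rough sketch (not verbatim from the repo), those bindings could be applied in a loop like the one below; the helper name, the layout of the `per_image_intrinsics` dict, and the pixel vs. normalized conventions for `focal_length` and `principal_point` are assumptions, so check python_api.cu for the exact expected values:

```python
# Rough sketch: assign per-image intrinsics through the new Python bindings.
# `per_image_intrinsics` is a hypothetical dict built from your own calibration,
# e.g. {image_id: {"fx": ..., "fy": ..., "cx": ..., "cy": ...}}.
# Whether focal_length is given in pixels and principal_point in normalized
# [0, 1] coordinates is an assumption; see python_api.cu for the conventions.
def apply_per_image_intrinsics(testbed, per_image_intrinsics, width, height):
    for image_id, intr in per_image_intrinsics.items():
        meta = testbed.nerf.training.dataset.metadata[image_id]
        meta.focal_length = [intr["fx"], intr["fy"]]
        meta.principal_point = [intr["cx"] / width, intr["cy"] / height]
```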

For (3), this will largely manifest as artifacts when trying to view the scene from the top or bottom (i.e. outside of the convex hull of the training data). If you plan for the viewpoint to stay close to the ring of cameras, you should be fine.

Curious to hear whether you find the Python bindings w.r.t. (2) helpful.

Cheers!