facebookresearch / nonrigid_nerf

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video'.

bounds and downsampling factor for load_llff_data_multi_view #11

Open andrewsonga opened 2 years ago

andrewsonga commented 2 years ago

First of all, thank you for releasing your impactful work! I'm trying to train NRNeRF on multi-view data from 8 synchronized cameras with known intrinsics and extrinsics, and I ran into a couple of questions regarding the bounds and the downsampling factor.

1. Are the parameters min_bound and max_bound defined as the minimum and maximum across all cameras?

I noticed in the README.md that, when specifying calibration.json, a single min_bound and max_bound is shared between all cameras, as opposed to there being one pair per camera (see the sketch after these questions for what I mean).

2. When using load_llff_data_multi_view, if our training images are downsampled from their original resolution by a certain factor, are there any parts of the calibration.json (i.e. camera intrinsics / extrinsics) we have to accordingly adjust to account for the downsampling factor?

I'm asking because downsampling images by a factor is not implemented in load_llff_data_multi_view, whereas load_llff_data appears to use factor in a couple of places (https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L76, https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L103).
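For concreteness, here is a minimal sketch of the layout I'm describing. The key names below (cameras, focal, center, extrinsics) are placeholders of mine rather than the repository's actual schema; the point is only that the bounds appear once at the top level while the intrinsics/extrinsics are stored per camera:

```python
import json

# Hypothetical calibration layout (placeholder keys, not the real schema):
# min_bound/max_bound appear once and are shared, while each camera gets
# its own intrinsics and extrinsics.
calibration = {
    "min_bound": 0.5,                  # shared near bound, world units
    "max_bound": 10.0,                 # shared far bound, world units
    "cameras": {
        "cam_0": {
            "focal": 1200.0,           # focal length in pixels
            "center": [960.0, 540.0],  # principal point (cx, cy) in pixels
            "extrinsics": [[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0],
                           [0.0, 0.0, 1.0, 0.0]],  # 3x4 world-to-camera
        },
        # ... one entry per camera (8 in my setup)
    },
}

with open("calibration.json", "w") as f:
    json.dump(calibration, f, indent=2)
```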

Thank you in advance for reading this long question. I look forward to reading your response.

edgar-tr commented 2 years ago

Thank you for your kind words!

1. Yes, min_bound and max_bound are the same across all cameras. They are in world units. For multi-view, there is no heuristic implemented to determine good values; instead you'd have to come up with your own heuristic (e.g. for a spherical camera setup, the maximum distance between any two cameras might be a good first guess).

2. You can first try to just set factor=4, for example, and the code at https://github.com/facebookresearch/nonrigid_nerf/blob/main/train.py#L1354 will take care of adjusting the calibration (namely the focal and center of the intrinsics). Extrinsics don't need to be adjusted. If that doesn't work, store the correct (downsampled) values for focal and center in calibration.json and use factor=1.

Hope that helps!
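For what it's worth, here is a minimal sketch of what that adjustment amounts to; it mirrors the behavior described above (focal and center scale with the image, extrinsics stay fixed) rather than quoting train.py:

```python
def downsample_intrinsics(focal, center, height, width, factor):
    """Scale pinhole intrinsics when images are downsampled by `factor`.

    Focal length and principal point are measured in pixels, so they
    shrink by the same factor as the image resolution. Extrinsics and
    the bounds are in world units and stay untouched.
    """
    focal = focal / factor
    center = (center[0] / factor, center[1] / factor)
    return focal, center, height // factor, width // factor

# e.g. downsampling a 1920x1080 image by factor=4 gives 480x270,
# and a 1200-pixel focal length becomes 300 pixels.
focal, center, h, w = downsample_intrinsics(1200.0, (960.0, 540.0), 1080, 1920, 4)
```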

andrewsonga commented 2 years ago

Thank you for the swift response! I have just a few more follow-up questions:

1. Do we have to adjust min_bound and max_bound according to the downsampling factor?

2. Do you think using the min_bounds and max_bounds from the poses_bounds.npy file generated by running colmap with known camera poses (https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses) constitutes a good heuristic for multi-view? Concretely, I ran colmap on multi-view images from a single timestep to estimate the 3D points, and used the 1% and 99% percentile depth values to define the min_bounds and max_bounds for each camera; the shared min_bound and max_bound would then be the minimum and maximum, respectively, across all cameras (see the sketch after these questions).

3. Where are min_bound and max_bound used? Are they used as the integration bounds for volume rendering?

4. If so, what is the harm of heuristically setting min_bound to 0 and max_bound to a very large number?
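Regarding question 2, here is a minimal sketch of the heuristic I have in mind, assuming the colmap sparse points are already loaded as an (N, 3) array and the world-to-camera extrinsics as 3x4 matrices (function and variable names are mine):

```python
import numpy as np

def bounds_from_sparse_points(points_world, extrinsics, lo=1.0, hi=99.0):
    """Shared near/far bounds from colmap's sparse reconstruction.

    points_world: (N, 3) triangulated points in world space.
    extrinsics:   iterable of 3x4 world-to-camera matrices, one per camera.
    Takes the 1%/99% depth percentiles per camera, then the extremes
    across all cameras.
    """
    per_cam_near, per_cam_far = [], []
    for w2c in extrinsics:
        R, t = w2c[:, :3], w2c[:, 3]
        depths = (points_world @ R.T + t)[:, 2]  # z in the camera frame
        depths = depths[depths > 0]              # keep points in front
        per_cam_near.append(np.percentile(depths, lo))
        per_cam_far.append(np.percentile(depths, hi))
    return min(per_cam_near), max(per_cam_far)
```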

edgar-tr commented 2 years ago
2. Yes, that's how min_bound and max_bound are also obtained in the monocular setting; it seems like a very reasonable heuristic to me if you can get colmap to run on all your images. If you cannot use all images, make sure that the images from the single time step still cover the full depth of the scene (I'd expect that to usually be the case).

3. Yes, they are the near and far plane distances for volume rendering.

4. Setting the near plane to 0 might lead to artifacts, because the NeRF can place artifacts right in front of the camera that are practically invisible from the other cameras. It's an okay-ish heuristic if there's no better alternative. The far plane distance should not be extremely large either, because the 64 coarse samples are evenly spaced along the ray, so an unreasonable far plane leads to very large distances between samples.
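To make the far-plane point concrete, here is a minimal sketch of evenly spaced coarse sampling in the standard NeRF style (not code from this repository):

```python
import numpy as np

def coarse_sample_spacing(near, far, n_samples=64):
    """Depth gap between adjacent coarse samples when n_samples points
    are spaced evenly between the near and far planes."""
    t_vals = np.linspace(near, far, n_samples)
    return t_vals[1] - t_vals[0]

# With a reasonable far plane the coarse samples are dense...
print(coarse_sample_spacing(0.5, 10.0))    # ~0.15 world units apart
# ...but with an arbitrarily large far plane, thin geometry can fall
# entirely between two adjacent samples and be missed.
print(coarse_sample_spacing(0.5, 1000.0))  # ~15.9 world units apart
```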

andrewsonga commented 2 years ago

Thank you for the detailed response! I followed your instructions carefully, but my renderings are coming out looking very strange and I can't figure out why. The following are the first five renderings for --camera_path spiral:

[Screenshots: five rendered frames from the spiral camera path]

The first frame from each of my multi-view cameras looks like this:

[Screenshots: the first frame from each of the eight cameras]

Are there any modifications I need to make to free_viewpoint_rendering.py in order to make it work for multi-view datasets? For instance, do we have to change load_llff_data to load_llff_data_multi_view in free_viewpoint_rendering.py as well as train.py?

edgar-tr commented 2 years ago

I have never tried running the multi-view code with rendering. The spiral camera path might be too sensitive; you could try the static or input-reconstruction rendering instead. Changing to load_llff_data_multi_view sounds reasonable, but again, I have not tried that part.