Training Depth Priors Network

We used RGB-D input to train the depth completion network. For a different dataset, the training data needs to be loaded in a similar way to scannet_dataset.py. The network is trained on sparse depth that is sampled from the dense depth map and perturbed by Gaussian noise. For the sampling and perturbation, it makes sense to consider the characteristics of the sparse depth data that you plan to optimize NeRF on:
Sampling:
If you plan to optimize NeRF on scenes where you run COLMAP to obtain poses and sparse depth, you can use the COLMAP feature extractor to determine sampling locations for depth completion training that are similar to the sparse depth distribution that you expect at test time (see precompute sampling locations).
You may also uniformly sample the sparse depth for training, if that better matches your sparse depth input for NeRF optimization.
The option missing_depth_percent controls the density of the sampled sparse depth map, e.g. 0.99 means the sparse depth will have 1% valid points.
Depth perturbation:
Depth perturbations should be added to the sampled sparse depth if the sparse depth that you will use for NeRF is less accurate (e.g. from SfM) than the depth used for depth priors training (e.g. RGB-D from a depth sensor).
We determined the error of SfM reconstructions on ScanNet and fit a quadratic function, so that the standard deviation of the added Gaussian noise increases with the distance from the camera (see error_sources.py). It makes sense to adapt these error assumptions to your dataset; a minimal sketch of the sampling and perturbation steps follows below.
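A minimal sketch of these two steps, assuming NumPy and not taken from the repository code: the function name, the missing_depth_percent default and the quadratic noise coefficients are placeholders that would need to be adapted to your data.

```python
import numpy as np

def sample_and_perturb(dense_depth, missing_depth_percent=0.99,
                       noise_coeffs=(0.0, 0.0, 0.01), rng=None):
    """Sketch: dense_depth is an (H, W) array in meters, 0 where invalid."""
    rng = np.random.default_rng() if rng is None else rng
    valid = dense_depth > 0

    # Uniform sampling: keep roughly (1 - missing_depth_percent) of the pixels,
    # e.g. missing_depth_percent=0.99 leaves about 1% valid sparse depth.
    keep = rng.random(dense_depth.shape) > missing_depth_percent
    sparse = np.where(valid & keep, dense_depth, 0.0)

    # Perturbation: Gaussian noise whose standard deviation grows with the
    # distance from the camera, here sigma(d) = a + b*d + c*d**2
    # (placeholder coefficients, not the values fit on ScanNet).
    a, b, c = noise_coeffs
    sigma = a + b * sparse + c * sparse ** 2
    noisy = sparse + rng.normal(0.0, 1.0, size=sparse.shape) * sigma
    return np.where(sparse > 0, np.clip(noisy, 1e-3, None), 0.0)
```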
Of course, if you have a large set of SfM reconstructions with corresponding dense depth maps (e.g. the MegaDepth dataset), you could also train on SfM sparse depth directly, instead of sampling and perturbing RGB-D data.
Optimizing NeRF with Dense Depth Priors

You can provide scenes for NeRF in this format:
The subdirectory rgb contains the RGB images and depth contains the sparse depth maps (e.g. rendered from an SfM reconstruction), as illustrated below. target_depth contains ground truth depth (e.g. sensor depth maps), which is not needed for optimizing NeRF and is only used for evaluation purposes, such as computing depth metrics. If you do not have ground truth depth for the scene, it may be necessary to comment out the respective lines.
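A rough illustration of such a scene directory (file names are placeholders, not prescribed by the repository):

```
<scene>/
    rgb/                   # RGB images
    depth/                 # sparse depth maps, stored as 16-bit .png
    target_depth/          # optional ground truth depth, used for evaluation only
    transforms_train.json
    transforms_test.json
```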
transforms_train.json and transforms_test.json contain the following information:
frames:
file_path: relative path to the RGB file
depth_file_path: relative path to the corresponding sparse depth file
fx, fy, cx, cy: camera intrinsics; the image origin is in the bottom left corner
transform_matrix: transformation from the camera to the world frame; the camera frame has z pointing in the negative viewing direction
near: near plane for rendering
far: far plane for rendering
depth_scaling_factor: conversion factor from distance units to the integer values stored in the depth .png files; e.g. in ScanNet a factor of 1000 converts from meters to the stored integer values
near, far and depth_scaling_factor are scene-specific parameters that need to be the same for the train and test set.
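To make the expected structure concrete, here is a hypothetical snippet, not part of the repository, that writes a minimal transforms_train.json; all values are placeholders and the exact nesting of the scene parameters should match what the data loader expects:

```python
import json

meta = {
    "near": 0.5,                     # scene-specific near plane
    "far": 6.0,                      # scene-specific far plane
    "depth_scaling_factor": 1000.0,  # e.g. ScanNet: meters -> stored integers
    "frames": [
        {
            "file_path": "rgb/0000.jpg",
            "depth_file_path": "depth/0000.png",
            "fx": 577.87, "fy": 577.87, "cx": 319.5, "cy": 239.5,
            # camera-to-world transform; camera z points in the negative viewing direction
            "transform_matrix": [
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0, 1.0],
            ],
        }
    ],
}

with open("transforms_train.json", "w") as f:
    json.dump(meta, f, indent=2)
```

With this convention, converting the integer values of a depth .png back to metric depth is a single division, e.g. with OpenCV (again only a sketch):

```python
import cv2
import numpy as np

def load_depth(path, depth_scaling_factor=1000.0):
    # IMREAD_UNCHANGED keeps the 16-bit integer values stored in the .png
    raw = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    return raw / depth_scaling_factor  # e.g. ScanNet: millimeters -> meters; 0 stays invalid
```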