JihyongOh opened this issue 1 year ago
Hi! Sorry for taking a while to get to this -- I wanted to let you know that I'm working on a longer-form response to your questions (especially the first one, as I think it's relevant to everyone using the repo), and will try to follow-up within the next couple of days. I hope that's okay!
@breuckelen Hello, no problem at all, I completely understand that you might need some time due to my detailed questions. I appreciate your willingness to provide a comprehensive answer, and I'm looking forward to reading it. Take your time and thanks a lot!
As mentioned in the README, the codebase was originally extended from nerf_pl and Neural Light Fields, but at this point its structure deviates fairly significantly from that of nerf_pl. I'll try to break it down (as best I can) below.
At a high level, the optimization procedure, implemented in nlf/__init__.py, requires:

1. A dataset (implemented in datasets/), which produces a set of training rays and their corresponding ground truth colors
2. A model (implemented in nlf/model), which maps a ray to a predicted color
3. Regularizers (implemented in nlf/regularizers), which implement auxiliary losses applied to the model (e.g. total variation, sparsity, etc.)

The dataset, model, regularizers, and additional training hyper-parameters (like learning rates for different model parameters, optimizers, and weight initialization strategy) are constructed from configurations under the conf/ folder, and can be specified via the command line. For example:
```bash
python main.py experiment/dataset=<dataset_config> \
    experiment/training=<training_config> \
    experiment/model=<model_config> \
    experiment.dataset.collection=<scene_name> \
    +experiment/regularizers/tensorf=tv_4000
```
Note that if a specific configuration property z lives in the directory conf/experiment/x/y, then you can override it with experiment.x.y.z=blah. You can also change the default configuration in conf/experiment/local.yaml.
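As an aside, the "# @package" directives in the configs suggest a Hydra/OmegaConf-style setup, so the dot-path override above behaves like a nested dictionary update. A minimal sketch of that semantics (the property names here are hypothetical, not from the repo):

```python
from omegaconf import OmegaConf

# Hypothetical slice of a composed config, mirroring conf/experiment/x/y.yaml
cfg = OmegaConf.create({
    "experiment": {"dataset": {"collection": "scene_a", "num_frames": 50}},
})

# experiment.dataset.collection=scene_b on the command line is equivalent to
# updating the same dot-path programmatically:
OmegaConf.update(cfg, "experiment.dataset.collection", "scene_b")
print(cfg.experiment.dataset.collection)  # scene_b
```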
As mentioned above, the various datasets that produce training rays and colors are implemented in the datasets/ folder. The base datasets for static scenes and dynamic scenes are Base5DDataset and Base6DDataset, respectively, in datasets/base.py. These base classes are pretty bare-bones, and there is unfortunately a lot of re-implemented boilerplate in each specific subclass (for example, compare datasets/technicolor.py and datasets/neural_3d.py). In general, dataset classes operate as follows during training:
1. Read metadata in the read_meta function (e.g. image file names, camera poses, the number of frames in a video sequence).
2. Create per-ray training inputs in the prepare_training_data function. This typically involves creating a ray for every camera, every pixel in that camera, and every time-step, as well as the colors corresponding to these rays.
3. Collate the per-ray inputs in the update_all_data function into a single array, for more efficient loading.
4. Return per-ray inputs and colors from the __getitem__ function at each training step.

Each dataset class can also be used as a validation, testing, or render dataset, by changing the split flag. For validation/testing datasets, the above process is much the same, except that step 3 (collating per-ray inputs) is skipped, and the dataset returns each individual per-ray input in the __getitem__ function. For render datasets, the dataset creates a novel camera trajectory in prepare_render_data, and does not return ground truth per-ray outputs (colors).
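As a rough illustration of that lifecycle (this is not the actual Base5DDataset/Base6DDataset code -- only read_meta, prepare_training_data, update_all_data, __getitem__, and split are names from the description above, and all shapes and fields are made up), a dataset subclass might be organized like this:

```python
import torch
from torch.utils.data import Dataset

class ToyRayDataset(Dataset):
    """Sketch of the training-time dataset lifecycle described above."""

    def __init__(self, num_cameras=3, num_frames=5, height=4, width=4, split="train"):
        self.num_cameras, self.num_frames = num_cameras, num_frames
        self.height, self.width = height, width
        self.split = split

        self.read_meta()
        self.prepare_training_data()
        if split == "train":
            self.update_all_data()  # val/test skip collation; render builds novel poses

    def read_meta(self):
        # Real datasets parse image file names, camera poses, frame counts, etc.
        self.poses = torch.eye(4).repeat(self.num_cameras, 1, 1)  # fake camera-to-world

    def prepare_training_data(self):
        # One ray (origin, direction, time) and color per camera / pixel / time-step.
        self.all_rays, self.all_rgb = [], []
        n = self.height * self.width
        for cam in range(self.num_cameras):
            for t in range(self.num_frames):
                origins = self.poses[cam, :3, 3].expand(n, 3)
                directions = torch.randn(n, 3)                     # stand-in for pixel rays
                times = torch.full((n, 1), t / self.num_frames)
                self.all_rays.append(torch.cat([origins, directions, times], dim=-1))
                self.all_rgb.append(torch.rand(n, 3))              # stand-in for GT colors

    def update_all_data(self):
        # Collate per-ray inputs into single arrays for more efficient loading.
        self.all_rays = torch.cat(self.all_rays, dim=0)
        self.all_rgb = torch.cat(self.all_rgb, dim=0)

    def __len__(self):
        # Number of rays for the training split, number of views otherwise.
        return len(self.all_rays)

    def __getitem__(self, idx):
        # Training: one (ray, color) pair. Validation/test: since collation was
        # skipped, each index corresponds to a full image's worth of rays.
        return {"rays": self.all_rays[idx], "rgb": self.all_rgb[idx]}
```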
The configurations for different datasets can be found in the conf/experiment/dataset
folder. Some important config options are:
- name: this specifies the type of dataset --- in other words, the dataset class used. You can add new dataset classes by modifying the dataset_dict in datasets/__init__.py (see the sketch below).
- collection: this specifies the particular scene used within the dataset.
- root_dir: the path to the scene, which defaults to <data_root_dir>/<data_subdir>/<collection>.
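For example, after writing a dataset class like the sketch above, registering it (the class and key names below are hypothetical) amounts to adding one entry:

```python
# datasets/__init__.py (sketch -- the real dict has many more entries)
from datasets.my_monocular import MyMonocularDataset  # hypothetical module/class

dataset_dict = {
    # ... existing dataset classes ...
    "my_monocular": MyMonocularDataset,  # selected via `name: my_monocular` in a dataset config
}
```

A dataset config with name: my_monocular, plus the usual collection / root_dir options, would then pick up this class.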
A model maps a ray to a predicted color. In this project, every model consists of two components, an embedding and a color model, both described in more detail below. All model configs (in conf/experiment/models) will typically contain the following lines:
```yaml
# @package _group_

type: lightfield

render:
  type: lightfield

param:
  fn: identity

embedding:
  type: ...

color:
  type: ...
```
Like the name variable in the dataset configurations, the type variables for the embedding and color specify the specific models to use from nlf/embedding/__init__.py and nlf/nets/__init__.py. I'll discuss how to define / extend your own embedding and color models below.
In this project, every embedding model is a RayPointEmbedding
(defined in nlf/embedding/embedding.py
), combining a sequence of ray-dependent operations (like mapping a ray to a set of sample points), and point-dependent operations (like adding a set of point offsets to each sample point). It's very simple to compose a sequence of arbitrary operations in a model config file, with the following syntax:
```yaml
embedding:
  type: ray_point
  embeddings:
    op0:
      type: ...
    op1:
      type: ...
    ...
```
Above, op0, op1, etc. can be arbitrary keys, and all of these operations are applied in sequence. The type variable for each operation specifies which class from the embedding_dict in nlf/embedding/__init__.py to use.
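Conceptually, the composed embedding just threads the ray (and then its sample points) through each op in order. A minimal sketch of that idea -- not the actual RayPointEmbedding implementation, with the op modules left abstract:

```python
import torch.nn as nn

class ComposedEmbedding(nn.Module):
    """Applies a sequence of ray-dependent and point-dependent ops."""

    def __init__(self, ops):
        super().__init__()
        # e.g. [ray_prediction_op, ray_intersect_op, point_offset_op], built from
        # the op0, op1, ... entries of the config in the order they appear.
        self.ops = nn.ModuleList(ops)

    def forward(self, rays):
        x = rays  # (batch, 6): ray origins and directions
        for op in self.ops:
            x = op(x)  # early ops map rays to sample points, later ops refine them
        return x       # e.g. (batch, num_samples, 3) sample points
```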
The ray-dependent operations are defined in nlf/embedding/ray.py. We use RayPredictionEmbedding, which maps a ray (origin and direction) to a set of per-ray outputs, like parameters for geometric primitives. With configuration files, it's straightforward to specify which "parameterizations" to apply to the input ray (two-plane, Pluecker, etc.), what positional encoding to apply to the ray, the type of MLP to use, and the shape and name of each per-ray output. For example, the donerf model config looks like this:
```yaml
ray_prediction_0:
  type: ray_prediction

  # Parameterization
  params:
    ray:
      start: 0
      end: 6
      param:
        n_dims: 6
        fn: pluecker
        direction_multiplier: 1.0
        moment_multiplier: 1.0
      pe:
        type: windowed
        freq_multiplier: 2.0
        n_freqs: 1
        wait_iters: 0
        max_freq_epoch: 0
        exclude_identity: False

  # Net
  net:
    type: base
    group: embedding_impl
    depth: 6
    hidden_channels: 256
    skips: [3]

  # Outputs
  z_channels: 32
  outputs:
    z_vals:
      channels: 4

    sigma:
      channels: 1
      activation:
        type: ease_value
        start_value: 1.0
        window_epochs: 3
        wait_epochs: 0
        activation:
          type: sigmoid
          shift: 4.0

    point_sigma:
      channels: 1
      activation:
        type: ease_value
        start_value: 1.0
        window_epochs: 3
        wait_epochs: 1
        activation:
          type: sigmoid
          shift: 4.0

    point_offset:
      channels: 3
      activation:
        type: tanh
        outer_fac: 0.125

    color_scale:
      channels: 3
      activation:
        type: ease_value
        start_value: 0.0
        window_epochs: 0
        wait_epochs: 0
        activation:
          type: identity
          shift: 0.0
          inner_fac: 1.0
          outer_fac: 1.0

    color_shift:
      channels: 3
      activation:
        type: ease_value
        start_value: 0.0
        window_epochs: 0
        wait_epochs: 0
        activation:
          type: identity
          shift: 0.0
          inner_fac: 1.0
          outer_fac: 1.0
```
which applies (1) a Pluecker parameterization to the ray and (2) positional encoding with 1 frequency to the Pluecker-parameterized ray, which is fed through (3) a 6-layer, 256-hidden-unit MLP that (4) outputs geometric primitive parameters (z_vals), point offsets (point_offset), and a few other quantities for 32 different sample points (z_channels: 32).
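To make that flow concrete, here is a rough functional sketch of steps (1)-(4). This is not the repo's RayPredictionEmbedding: the Pluecker parameterization and positional encoding follow their standard definitions, the MLP omits the skip connection, and the way z_channels interacts with the output heads is simplified:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pluecker(origins, dirs):
    # Pluecker coordinates of a ray: normalized direction d and moment m = o x d.
    d = F.normalize(dirs, dim=-1)
    m = torch.cross(origins, d, dim=-1)
    return torch.cat([d, m], dim=-1)  # (batch, 6)

def positional_encoding(x, n_freqs=1):
    # Identity plus sin/cos features at n_freqs octaves.
    feats = [x]
    for i in range(n_freqs):
        feats += [torch.sin(2.0 ** i * x), torch.cos(2.0 ** i * x)]
    return torch.cat(feats, dim=-1)

class RayPredictionSketch(nn.Module):
    def __init__(self, z_channels=32, hidden=256, depth=6, n_freqs=1):
        super().__init__()
        in_ch = 6 * (1 + 2 * n_freqs)  # Pluecker dims after positional encoding
        layers, ch = [], in_ch
        for _ in range(depth):
            layers += [nn.Linear(ch, hidden), nn.ReLU()]
            ch = hidden
        self.trunk = nn.Sequential(*layers)
        self.z_channels = z_channels
        # Per-sample-point output heads, loosely mirroring the `outputs` block above.
        self.z_vals = nn.Linear(hidden, z_channels * 4)        # primitive parameters
        self.point_offset = nn.Linear(hidden, z_channels * 3)  # per-point offsets

    def forward(self, origins, dirs):
        h = self.trunk(positional_encoding(pluecker(origins, dirs)))
        z_vals = self.z_vals(h).view(-1, self.z_channels, 4)
        # tanh with outer_fac = 0.125, as in the point_offset output above
        offsets = 0.125 * torch.tanh(self.point_offset(h)).view(-1, self.z_channels, 3)
        return z_vals, offsets
```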
We also use RayIntersectEmbedding
, which intersects a ray with a set of geometric primitives, producing sample points for that ray. Various intersect methods are defined in the nlf/intersect/
folder. We use axis-aligned z planes and spheres in our work, but we also define intersect methods for voxel grids, non-axis-aligned planes, and a few others. You can also extend these or define your own.
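For intuition, intersecting a ray o + t·d with an axis-aligned plane z = z_k just solves o_z + t·d_z = z_k for t. A tiny sketch of that (not the repo's intersect implementation):

```python
import torch

def intersect_z_planes(origins, dirs, z_planes, eps=1e-8):
    """Intersect rays o + t*d with axis-aligned planes z = z_k.

    origins, dirs: (batch, 3); z_planes: (num_planes,)
    Returns sample points of shape (batch, num_planes, 3).
    """
    # o_z + t * d_z = z_k  =>  t = (z_k - o_z) / d_z
    t = (z_planes[None, :] - origins[:, None, 2]) / (dirs[:, None, 2] + eps)
    return origins[:, None, :] + t[..., None] * dirs[:, None, :]

# e.g. 16 planes spaced between (illustrative) depths 2 and 6:
# points = intersect_z_planes(origins, dirs, torch.linspace(2.0, 6.0, 16))
```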
The main point-dependent operation that we use is PointOffsetEmbedding in nlf/embedding/point.py, which simply adds point offsets to each generated sample point, modulated by a set of per-sample-point weights. For dynamic scenes, we also use the AdvectPoints embedding, which advects each sample point into the nearest keyframe using per-sample-point flows that are output by the RayPredictionEmbedding.
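The point-offset step itself is essentially a weighted residual on the sample points; a minimal sketch of the idea (assumed shapes, not the actual PointOffsetEmbedding):

```python
def apply_point_offsets(points, offsets, weights):
    # points, offsets: (batch, num_samples, 3); weights: (batch, num_samples, 1)
    # Each sample point is nudged by its predicted offset, modulated by a
    # per-sample-point weight (e.g. something like the point_sigma output above).
    return points + weights * offsets
```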
Color models typically map a set of sample points (generated via an embedding model) to a color using volume rendering on some underlying volumetric scene representation. We implement a few TensoRF-based models for static and dynamic scenes in nlf/nets
, but in principle it should be easy to add your own. I am currently messing around with Instant-NGP, and some other models from nerfacc. Feel free to DM me if you're interested in using these implementations, which I haven't yet integrated into the public repo.
Because the implementation for each color model is pretty much standalone (they're not designed to be composable at the moment and are implemented independently of one another), I won't go into too much detail here. If you have any questions about our models (e.g. the keyframe-based TensoRF model that we use), or about how to implement your own, feel free to follow-up.
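For reference, the volume-rendering step that a color model performs over its sample points is standard alpha compositing. A generic sketch (not the TensoRF-based implementations in nlf/nets, which also handle the feature decoding):

```python
import torch

def composite(rgb, sigma, z_vals):
    """Standard volume rendering over per-ray samples.

    rgb:    (batch, num_samples, 3) colors at sample points
    sigma:  (batch, num_samples)    densities at sample points
    z_vals: (batch, num_samples)    sorted distances along each ray
    """
    deltas = z_vals[:, 1:] - z_vals[:, :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[:, :1])], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                        # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)              # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)                    # (batch, 3) rendered color
```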
The regularizer classes, implemented in nlf/regularizers, are a way for us to add auxiliary losses (like total variation or sparsity -- or, in the monocular case, perhaps monocular depth and flow losses) to the model, apart from the typical color loss. Defining a regularizer is pretty straightforward: just extend the base regularizer class in nlf/regularizers/base.py, implement the _loss(...) function, and add your class to the regularizer_dict in nlf/regularizers/__init__.py.
In order to make the regularizer accessible via the command line, you should create a new folder in conf/experiment/regularizers for your specific regularizer type. You can put any number of configurations for this regularizer within this folder, and then use the regularizer by adding the following to the command line:
+experiment/regularizers/<folder_name>=<config_name>
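Putting those steps together, a new regularizer might look roughly like the sketch below. The exact BaseRegularizer constructor and _loss signature may differ from what I've assumed here, and the loss itself is just a placeholder:

```python
# nlf/regularizers/my_flow.py (sketch; only BaseRegularizer and _loss are names from the post)
from nlf.regularizers.base import BaseRegularizer

class MonocularFlowRegularizer(BaseRegularizer):
    """Hypothetical auxiliary loss, e.g. penalizing predicted scene-flow magnitude."""

    def _loss(self, batch, outputs):  # signature assumed
        # `batch` can carry extra per-ray inputs appended by the dataset (e.g. GT flow),
        # and `outputs` the model's predictions for those rays.
        pred_flow = outputs["flow"]   # hypothetical output key
        return pred_flow.abs().mean()

# nlf/regularizers/__init__.py (sketch)
# regularizer_dict = {
#     ...,
#     "monocular_flow": MonocularFlowRegularizer,
# }
```

With a matching config folder, e.g. conf/experiment/regularizers/monocular_flow/default.yaml (hypothetical), it would then be enabled with +experiment/regularizers/monocular_flow=default.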
Ideally, you won't have to modify the core training loop in nlf/__init__.py
too much. It's designed to be pretty general-purpose. However, it still might be useful to understand a couple of things about how it works:
- Training configurations live in the conf/experiment/training folder, where you can specify how to sample from your dataloader (e.g. with or without replacement), the number of epochs, optimizer, learning rate, decay rate, etc.
- Different optimizer settings (e.g. learning rates) for different model components are assigned via the opt_group property of a class (see the sketch after this list).
- Regularizers are enabled by adding +experiment/regularizers/<x>=<config_name> to the command line for each regularizer x.
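I haven't spelled out the exact opt_group mechanics here, but the underlying pattern is just standard PyTorch parameter groups; a rough sketch of the idea, with made-up attribute names and learning rates:

```python
import torch
import torch.nn as nn

# Suppose each model component advertises an opt_group name (illustrative only).
embedding_net = nn.Linear(6, 256)
embedding_net.opt_group = "embedding"
color_net = nn.Linear(256, 3)
color_net.opt_group = "color"

lrs = {"embedding": 5e-3, "color": 1e-2}  # per-group learning rates (made up)

param_groups = [
    {"params": module.parameters(), "lr": lrs[module.opt_group]}
    for module in (embedding_net, color_net)
]
optimizer = torch.optim.Adam(param_groups)
```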
If you're interested in using HyperReel for monocular sequences, this would probably require:
1. Writing a new dataset class for your monocular data, extending Base6DDataset.
2. Possibly writing a new model config in conf/experiment/models, though you should be able to use any existing dynamic model configuration as a starting point.
3. Writing new regularizers extending BaseRegularizer for the monocular setting. Note that if you require something like per-ray optical flow for regularization (from an image at time t to time t+1), then you can make these flows accessible from your dataset class, by appending them to the other per-ray inputs (origins, directions, times). As an example, consider the donerf dataset in datasets/donerf.py, which makes ground truth depth accessible (although we do not use it).

I realize that this is a lot of info, and I apologize for the fact that it's a little disorganized at the moment. In terms of extending HyperReel, I recommend following the blueprint in the section above, and referencing other parts of this post as necessary. And of course, please follow up if you have any additional questions. I'll do my best to answer them in a timely manner.
Let me also try to answer your questions (2) and (4) here:
(2) You can change the val_set
parameter in conf/experiment/dataset/neural_3d.yaml
to specify which cameras to use for validation (all others will be used for training). You can also specify the number of frames to use here.
(4) The no_holdout scripts are used to train models with every view --- we do not use these models for quantitative results, but we do use them for some of the demo videos, where it doesn't necessarily make sense to hold out views (you want to use all of the data available to you for the best qualitative view synthesis results).
@breuckelen Apologies for the delayed response, as I was dealing with a personal matter. Huge thank you for your detailed guidance and explanations! This will greatly assist me in my research and studies. If I encounter any difficulties later on, I'll be sure to ask additional questions. :)
@JihyongOh I wonder if you have successfully run the code on your monocular video dataset? How is the performance?
Hi! Thanks for sharing such awesome work and nicely organized code! I want to study this large-scale codebase as a baseline framework in detail, and then explore ideas for casually captured monocular videos. I have a few questions as follows:
This large codebase seems to be implemented with PyTorch Lightning. Was it developed from scratch? If so, could you provide some tips/guidelines or links to help me understand the overall flow of the code in detail? Any brief outline, convention, or debugging advice for navigating this large-scale codebase would greatly help my studies.
If I want to test HyperReel on the Neural 3D Video dataset with a monocular setting (e.g., only using camera1 among 20 cameras for 50 frames or all 300 frames at once), how can I modify a config or a YAML file associated with "scripts/run_one_n3d.sh"?
If my own monocular video (forward-facing dataset) is provided as extracted frames (.png, not an .mp4 video) with poses_bounds.npy, how can I handle this dataset in this code structure for training HyperReel (any suggestion for which YAML/config file to refer to)? Do I have to convert the monocular video into .mp4 format?
What does "hold_out" mean? (hold_out vs. no_hold_out)
Thank you very much!