Neural RGB-D Surface Reconstruction
Dejan Azinović, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, Justus Thies
CVPR 2022
This repository contains the code for the paper Neural RGB-D Surface Reconstruction, a novel approach for 3D reconstruction that combines implicit surface representations with neural radiance fields.
You can create a conda environment called neural_rgbd using:
conda env create -f environment.yaml
conda activate neural_rgbd
Make sure to clone the external Marching Cubes dependency and install it in the same environment:
cd external/NumpyMarchingCubes
python setup.py install
You can run an optimization using:
python optimize.py --config configs/<config_file>.txt
The data needs to be in the following format:
<scene_name> # args.datadir in the config file
├── depth               # raw (real data) or ground truth (synthetic data) depth images (optional)
    ├── depth0.png
    ├── depth1.png
    ├── depth2.png
    ...
├── depth_filtered      # filtered depth images
    ├── depth0.png
    ├── depth1.png
    ├── depth2.png
    ...
├── depth_with_noise    # depth images with synthetic noise and artifacts (optional)
    ├── depth0.png
    ├── depth1.png
    ├── depth2.png
    ...
├── images              # RGB images
    ├── img0.png
    ├── img1.png
    ├── img2.png
    ...
├── focal.txt           # focal length
├── poses.txt           # ground truth poses (optional)
├── trainval_poses.txt  # camera poses used for optimization
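Before starting an optimization, it can help to confirm that a scene folder matches the layout above. The following is a minimal sketch and not part of the repository: the function name `check_scene_layout` is hypothetical, and the check that the number of RGB frames equals the number of filtered depth frames is an assumption about what the dataloader expects.

```python
from pathlib import Path


def check_scene_layout(scene_dir):
    """Return a list of problems found in a scene folder (empty if none)."""
    scene = Path(scene_dir)
    problems = []

    # Entries not marked "(optional)" in the layout above.
    for folder in ("depth_filtered", "images"):
        if not (scene / folder).is_dir():
            problems.append(f"missing folder: {folder}")
    for name in ("focal.txt", "trainval_poses.txt"):
        if not (scene / name).is_file():
            problems.append(f"missing file: {name}")

    # Assumption: every RGB frame has a matching filtered depth frame.
    if (scene / "images").is_dir() and (scene / "depth_filtered").is_dir():
        n_rgb = len(list((scene / "images").glob("img*.png")))
        n_depth = len(list((scene / "depth_filtered").glob("depth*.png")))
        if n_rgb != n_depth:
            problems.append(f"frame count mismatch: {n_rgb} RGB vs {n_depth} depth")

    return problems
```

Calling `check_scene_layout` on the folder that `args.datadir` points to returns an empty list when nothing is missing.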
The dataloader is hard-coded to load depth maps from the depth_filtered folder. These depth maps were generated from the raw ones (or from depth_with_noise in the case of synthetic data) using the same bilateral filter that BundleFusion uses. The method also works with the raw depth maps, but the results are slightly degraded.
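To illustrate what bilateral filtering of a depth map does, here is a generic sketch in NumPy. It is not BundleFusion's implementation and will not reproduce the provided depth_filtered images exactly. The function name and the parameters `radius`, `sigma_space`, and `sigma_depth` are hypothetical, and the sketch assumes that a depth value of 0 marks an invalid pixel.

```python
import numpy as np


def bilateral_filter_depth(depth, radius=2, sigma_space=2.0, sigma_depth=0.1):
    """Generic bilateral filter for a 2D depth map (0 = invalid pixel).

    sigma_depth is given in the same unit as the depth values.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # Zero padding: padded pixels count as invalid and are ignored below.
    padded = np.pad(depth, radius, mode="constant", constant_values=0.0)

    acc = np.zeros_like(depth)
    weight_sum = np.zeros_like(depth)
    valid_center = depth > 0

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Spatial weight: falls off with pixel distance.
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            # Range weight: falls off with depth difference (keeps edges sharp).
            w_range = np.exp(-((shifted - depth) ** 2) / (2.0 * sigma_depth ** 2))
            weight = w_space * w_range * (shifted > 0) * valid_center
            acc += weight * shifted
            weight_sum += weight

    # Invalid pixels have zero total weight and stay 0.
    out = np.zeros_like(depth)
    np.divide(acc, weight_sum, out=out, where=weight_sum > 0)
    return out
```

The range weight is what distinguishes this from a Gaussian blur: neighbors across a depth discontinuity receive almost no weight, so object boundaries are not smeared.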
The file focal.txt contains a single floating point value representing the focal length of the camera in pixels.
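As a small sketch of how this value might be consumed, the snippet below reads focal.txt and builds a pinhole intrinsics matrix. The helper names are hypothetical. Placing the principal point at the image center and using the same focal length for both axes are assumptions of this sketch, not something the file format specifies.

```python
def read_focal(path):
    """Read the single focal length value (in pixels) from focal.txt."""
    with open(path) as f:
        return float(f.read().strip())


def intrinsics_from_focal(focal, width, height):
    """Build a 3x3 pinhole intrinsics matrix as nested lists.

    Assumption: square pixels and a principal point at the image center.
    """
    return [
        [focal, 0.0, width / 2.0],
        [0.0, focal, height / 2.0],
        [0.0, 0.0, 1.0],
    ]
```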
The files poses.txt and trainval_poses.txt contain the camera matrices in the format 4N x 4, where N is the number of cameras in the trajectory. Like the NeRF paper, we use the OpenGL convention for the camera's coordinate system. If you run this code on ScanNet data, make sure to transform the poses to the OpenGL system, since ScanNet uses a different convention.
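The sketch below shows one way to read such a pose file and to convert poses into the OpenGL camera convention. The function names are hypothetical. It assumes the file is whitespace-delimited text, that each 4 x 4 block is a camera-to-world matrix, and that the source poses use the common computer vision camera convention (x right, y down, z forward); verify these assumptions against your data before relying on it.

```python
import numpy as np


def load_poses(path):
    """Load a 4N x 4 pose file and return an array of shape (N, 4, 4)."""
    flat = np.loadtxt(path)
    if flat.ndim != 2 or flat.shape[1] != 4 or flat.shape[0] % 4 != 0:
        raise ValueError(f"expected a 4N x 4 matrix, got shape {flat.shape}")
    # Each consecutive group of 4 rows is one camera matrix.
    return flat.reshape(-1, 4, 4)


def opencv_to_opengl(c2w):
    """Convert camera-to-world poses from (x right, y down, z forward)
    to the OpenGL camera convention (x right, y up, z backward).

    Right-multiplying flips the camera's own y and z axes and leaves
    the camera position unchanged.
    """
    flip = np.diag([1.0, -1.0, -1.0, 1.0])
    return np.asarray(c2w) @ flip
```

Because the flip matrix is its own inverse, applying `opencv_to_opengl` twice returns the original poses, which makes for a quick sanity check.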
You can also write your own dataloader. You can use the existing load_scannet.py as a template and update load_dataset.py.
The dataset used in the paper is available via the following link: neural_rgbd_data.zip (7.25 GB). The ICL data is not included here, but can be downloaded from the original author's webpage.
The scene files have been provided by various artists for free on BlendSwap. Please refer to the table below for license information and links to the .blend files.
License | Scene name |
---|---|
CC-BY | Breakfast room |
CC-0 | Complete kitchen |
CC-BY | Green room |
CC-BY | Grey-white room |
CC-BY | Kitchen |
CC-0 | Morning apartment |
CC-BY | Staircase |
CC-BY | Thin geometry |
CC-BY | Whiteroom |
We also provide culled ground truth meshes and our method's meshes for evaluation purposes: meshes.zip (514 MB).
If you use this code in your research, please consider citing:
@InProceedings{Azinovic_2022_CVPR,
author = {Azinovi\'c, Dejan and Martin-Brualla, Ricardo and Goldman, Dan B and Nie{\ss}ner, Matthias and Thies, Justus},
title = {Neural RGB-D Surface Reconstruction},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {6290-6301}
}
The code is largely based on the original NeRF code by Mildenhall et al. https://github.com/bmild/nerf
The Marching Cubes implementation was adapted from the SPSG code by Dai et al. https://github.com/angeladai/spsg