
# VolRecon

Official code of the paper "VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction" (CVPR 2023)

Project | arXiv



Abstract: The success of Neural Radiance Fields (NeRF) in novel view synthesis has inspired researchers to propose neural implicit scene reconstruction. However, most existing neural implicit reconstruction methods optimize per-scene parameters and therefore lack generalizability to new scenes. We introduce VolRecon, a novel generalizable implicit reconstruction method with the Signed Ray Distance Function (SRDF). To reconstruct the scene with fine details and little noise, VolRecon combines projection features, aggregated from multi-view features, with volume features interpolated from a coarse global feature volume. Using a ray transformer, we compute SRDF values of sampled points on a ray and then render color and depth. On the DTU dataset, VolRecon outperforms SparseNeuS by about 30% in sparse view reconstruction and achieves accuracy comparable to MVSNet in full view reconstruction. Furthermore, our approach exhibits good generalization performance on the large-scale ETH3D benchmark.

If you find this project useful for your research, please cite:

```
@inproceedings{ren2022volrecon,
      title={VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction},
      author={Yufan Ren and Fangjinhua Wang and Tong Zhang and Marc Pollefeys and Sabine Süsstrunk},
      booktitle={CVPR},
      year={2023}
}
```

## Installation

### Requirements

```
conda create --name volrecon python=3.8 pip
conda activate volrecon

pip install -r requirements.txt
```

## Reproducing Sparse View Reconstruction on DTU

The camera file `cam.txt` stores the camera parameters, which include the extrinsic matrix, the intrinsic matrix, the minimum depth, and the depth interval:

```
extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
E30 E31 E32 E33

intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22

DEPTH_MIN DEPTH_INTERVAL
```
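
For reference, here is a minimal Python sketch (not the repository's own loader) that parses a `cam.txt` laid out exactly as above, blank lines included; the function name `read_cam_file` is ours:

```python
import numpy as np

def read_cam_file(path):
    # Layout: "extrinsic" + 4 rows, blank line, "intrinsic" + 3 rows,
    # blank line, then "DEPTH_MIN DEPTH_INTERVAL".
    with open(path) as f:
        lines = [line.strip() for line in f]
    extrinsic = np.array(" ".join(lines[1:5]).split(), dtype=np.float32).reshape(4, 4)
    intrinsic = np.array(" ".join(lines[7:10]).split(), dtype=np.float32).reshape(3, 3)
    depth_min, depth_interval = (float(v) for v in lines[11].split())
    return extrinsic, intrinsic, depth_min, depth_interval
```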

`pair.txt` stores the view selection result. For each reference image, the 10 best source views and their scores are stored in the file:

```
TOTAL_IMAGE_NUM
IMAGE_ID0                       # index of reference image 0
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 0
IMAGE_ID1                       # index of reference image 1
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 1
...
```
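
A matching sketch for reading `pair.txt` in the layout above (again illustrative, with a hypothetical `read_pair_file`):

```python
def read_pair_file(path):
    # Returns a list of (ref_id, src_ids, scores) triples.
    pairs = []
    with open(path) as f:
        num_images = int(f.readline())
        for _ in range(num_images):
            ref_id = int(f.readline())
            tokens = f.readline().split()  # "10 ID0 SCORE0 ID1 SCORE1 ..."
            src_ids = [int(t) for t in tokens[1::2]]
            scores = [float(t) for t in tokens[2::2]]
            pairs.append((ref_id, src_ids, scores))
    return pairs
```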

## Evaluation on Custom Dataset

We provide some helpful scripts for evaluation on custom datasets, each consisting of a set of images. As discussed in the limitation section, our method is not suitable for very large-scale scenes because of the coarse global feature volume. The main steps are as follows. First, prepare a camera file for each view that stores the intrinsics and depth range in the format below:

```
intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22

DEPTH_MIN DEPTH_MAX
```
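
As an illustration, a hypothetical helper (`write_cam_file`, not part of the repository) that writes such a file from a 3x3 intrinsic matrix and a depth range:

```python
import numpy as np

def write_cam_file(path, K, depth_min, depth_max):
    # Writes the "intrinsic" block followed by the depth range,
    # matching the format above.
    with open(path, "w") as f:
        f.write("intrinsic\n")
        for row in K:
            f.write(" ".join(f"{v:.6f}" for v in row) + "\n")
        f.write(f"\n{depth_min} {depth_max}\n")

# Example: a 640x480 pinhole camera with an 800 px focal length.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
write_cam_file("cam_000.txt", K, 0.5, 5.0)
```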

* The file ``code/dataset/general_fit.py`` is the dataset loader. The parameter ``self.offset_dist`` is the distance offset w.r.t. the reference view, used to generate a virtual viewpoint for rendering; it can be adjusted (25 mm by default). See the sketch after this list.
* Use ``script/eval_general.sh`` for image and depth rendering. 
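
The sketch below illustrates one plausible way such a virtual viewpoint could be derived, by shifting the camera center of a camera-to-world pose along one of its axes; the actual axis and sign convention in ``general_fit.py`` may differ:

```python
import numpy as np

def virtual_viewpoint(cam2world, offset_dist=25.0):
    """cam2world: 4x4 camera-to-world matrix; offset_dist is in scene
    units (25 corresponds to 25 mm under DTU's millimeter convention)."""
    pose = cam2world.copy()
    # Shift the camera center along the camera's own x-axis.
    pose[:3, 3] += pose[:3, 0] * offset_dist
    return pose
```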

## Training on DTU

* Download the pre-processed [DTU training set](https://drive.google.com/file/d/1eDjh-_bxKKnEuz5h-HXS7EDJn59clx6V/view) and [Depths_raw](https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/cascade-stereo/CasMVSNet/dtu_data/dtu_train_hr/Depths_raw.zip) (both provided by MVSNet), then organize the dataset as follows:

```
root_directory
├── Cameras
├── Rectified
└── Depths_raw
```


* In ``train_dtu.sh``, set `DATASET` to the root directory of the dataset and `LOG_DIR` to the directory for storing checkpoints (see the sketch after this list).

* Train the model by running `bash train_dtu.sh` on GPU.
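
As a hedged illustration, the variable block at the top of ``train_dtu.sh`` might look like the following; the paths are placeholders, and the script's remaining training flags are left as shipped:

```bash
# Placeholder paths -- adjust to your machine.
DATASET="/path/to/dtu_training"   # root directory containing Cameras/, Rectified/, Depths_raw/
LOG_DIR="./checkpoints/dtu"       # directory where training checkpoints are written
```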

## Acknowledgement

Part of the code is based on [SparseNeuS](https://github.com/xxlong0/SparseNeuS) and [IBRNet](https://github.com/googleinterns/IBRNet).