Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

PyTorch implementation of the paper "Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields", CVPR 2024.

Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao*
CVPR 2024
project page | paper | poster | model

Introduction

Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry, sub-optimal descriptors, and decoding strategies. We address these issues point by point. First, we find the variance-based cost volume exhibits failure patterns as the features of pixels corresponding to the same point can be inconsistent across different views due to occlusions or reflections. We introduce an Adaptive Cost Aggregation (ACA) approach to amplify the contribution of consistent pixel pairs and suppress inconsistent ones. Unlike previous methods that solely fuse 2D features into descriptors, our approach introduces a Spatial-View Aggregator (SVA) to incorporate 3D context into descriptors through spatial and inter-view interaction. When decoding the descriptors, we observe the two existing decoding strategies excel in different areas, which are complementary. A Consistency-Aware Fusion (CAF) strategy is proposed to leverage the advantages of both. We incorporate the above ACA, SVA, and CAF into a coarse-to-fine framework, termed Geometry-aware Reconstruction and Fusion-refined Rendering (GeFu). GeFu attains state-of-the-art performance across multiple datasets.
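
To make the failure pattern described above concrete, here is a minimal sketch in plain Python. It is not the GeFu implementation: it computes a variance-based matching cost for a single 3D point from per-view feature vectors, then a weighted variant in which each view carries a weight. The function name `variance_cost` and the hand-picked weights are hypothetical stand-ins used only for illustration; they are not the adaptive weighting that ACA produces.

```python
from typing import List, Optional, Sequence


def variance_cost(features: Sequence[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> List[float]:
    """Per-channel (weighted) variance of one point's features across views.

    features: one feature vector per source view, all of the same length.
    weights:  optional non-negative per-view weights; uniform if omitted.
    """
    num_views = len(features)
    if weights is None:
        weights = [1.0] * num_views
    total = sum(weights)
    norm = [w / total for w in weights]  # normalized so the weights sum to 1
    cost = []
    for c in range(len(features[0])):
        mean = sum(norm[v] * features[v][c] for v in range(num_views))
        var = sum(norm[v] * (features[v][c] - mean) ** 2 for v in range(num_views))
        cost.append(var)
    return cost


# Three views observe the same surface point. The third view is occluded,
# so its feature disagrees with the other two.
feats = [[1.0, 2.0], [1.0, 2.0], [5.0, -2.0]]

plain = variance_cost(feats)                         # every view counts equally
reweighted = variance_cost(feats, [1.0, 1.0, 0.0])   # hypothetical weights: occluded view suppressed
```

With uniform weights, the occluded view inflates the cost even though the two consistent views agree exactly. Suppressing that view brings the cost to zero, which is the behaviour the abstract attributes to amplifying consistent pixel pairs and suppressing inconsistent ones.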

Installation

Clone this repository:

git clone https://github.com/TQTQliu/GeFu.git
cd GeFu

Set up the Python environment:

conda create -n gefu python=3.8
conda activate gefu
pip install -r requirements.txt
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Datasets

1. DTU

Training data. Download the DTU training data and the raw depth maps (Depths_raw), then unzip and organize them as:

mvs_training
    └── dtu
        ├── Cameras
        ├── Depths
        ├── Depths_raw
        └── Rectified
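
Before training, it can help to confirm that the unzipped data matches the layout above. The following is a small sketch of such a check; the helper name `missing_dtu_dirs` is hypothetical and not part of this repository, and it only checks that the four sub-directories exist, not their contents.

```python
from pathlib import Path
from typing import List

# Sub-directories listed in the layout above.
EXPECTED_SUBDIRS = ("Cameras", "Depths", "Depths_raw", "Rectified")


def missing_dtu_dirs(data_root: str) -> List[str]:
    """Return the expected sub-directories that are absent under <data_root>/dtu."""
    dtu_dir = Path(data_root) / "dtu"
    return [name for name in EXPECTED_SUBDIRS if not (dtu_dir / name).is_dir()]
```

Calling `missing_dtu_dirs("mvs_training")` returns an empty list when all four folders are in place, and otherwise names the ones that still need to be downloaded or moved.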

2. NeRF Synthetic (Blender) and Real Forward-facing (LLFF)

Download the NeRF Synthetic and Real Forward-facing datasets and unzip them.

Usage

Train generalizable model

To train a generalizable model from scratch on DTU, specify data_root in configs/gefu/dtu_pretrain.yaml first and then run:

python train_net.py --cfg_file configs/gefu/dtu_pretrain.yaml

Our code also supports multi-GPU training. The released pretrained model was trained with 4 GPUs:

python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/gefu/dtu_pretrain.yaml distributed True gpus 0,1,2,3
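
In the command above, the process count (`--nproc_per_node`) and the `gpus` list have to agree. The sketch below derives both from a single list of GPU ids so they cannot drift apart. The helper `build_launch_command` is hypothetical, and it assumes the `gpus` override accepts any comma-separated id list in the same way as the four-GPU example.

```python
import shlex
from typing import List, Sequence


def build_launch_command(gpu_ids: Sequence[int],
                         cfg_file: str = "configs/gefu/dtu_pretrain.yaml") -> List[str]:
    """Build the distributed training command with one process per listed GPU."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        "train_net.py", "--cfg_file", cfg_file,
        "distributed", "True",
        "gpus", ",".join(str(g) for g in gpu_ids),
    ]


# Print the command for a two-GPU run so it can be inspected or pasted into a shell.
command = build_launch_command([0, 1])
print(shlex.join(command))
```

The returned list can also be passed directly to `subprocess.run(command, check=True)` from the repository root.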

Per-scene optimization

Here we take scan1 from DTU as an example:

cd ./trained_model/gefu
mkdir dtu_ft_scan1
cp dtu_pretrain/latest.pth dtu_ft_scan1
cd ../..
python train_net.py --cfg_file configs/gefu/dtu/scan1.yaml
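
When fine-tuning several scenes, the directory preparation above can be scripted. This is a sketch that mirrors the `mkdir` and `cp` steps. The helper `prepare_finetune_dir` is hypothetical, and it assumes other scenes follow the same `dtu_ft_<scene>` naming as the scan1 example, which should be checked against the corresponding config file.

```python
import shutil
from pathlib import Path


def prepare_finetune_dir(scene: str,
                         model_root: str = "trained_model/gefu",
                         pretrain_name: str = "dtu_pretrain") -> Path:
    """Copy the pretrained checkpoint into a per-scene directory.

    For scene="scan1" this creates <model_root>/dtu_ft_scan1/latest.pth.
    The dtu_ft_<scene> naming is an assumption generalized from the example.
    """
    root = Path(model_root)
    source = root / pretrain_name / "latest.pth"
    if not source.is_file():
        raise FileNotFoundError(f"pretrained checkpoint not found: {source}")
    target_dir = root / f"dtu_ft_{scene}"
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the destination path of the copied file
    return Path(shutil.copy2(source, target_dir / "latest.pth"))
```

Run it from the repository root, for example `prepare_finetune_dir("scan1")`, and then start training with the matching per-scene config as shown above.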

We provide the finetuned models for each scene here.

Evaluation

Evaluate the pretrained model on DTU

Download the pretrained model and save it as trained_model/gefu/dtu_pretrain/latest.pth.

Use the following command to evaluate the pretrained model on DTU:

python run.py --type evaluate --cfg_file configs/gefu/dtu_pretrain.yaml gefu.eval_depth True

The rendered images are saved in result/gefu/dtu_pretrain. To also save rendered videos, append save_video True to the end of the command.

Evaluate the pretrained model on Real Forward-facing

python run.py --type evaluate --cfg_file configs/gefu/llff_eval.yaml

Evaluate the pretrained model on NeRF Synthetic

python run.py --type evaluate --cfg_file configs/gefu/nerf_eval.yaml

Citation

If you find our work useful for your research, please cite our paper.

@InProceedings{Liu_2024_CVPR,
    author    = {Liu, Tianqi and Ye, Xinyi and Shi, Min and Huang, Zihao and Pan, Zhiyu and Peng, Zhan and Cao, Zhiguo},
    title     = {Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {7654-7663}
}

Relevant Works

PixelNeRF: Neural Radiance Fields from One or Few Images, CVPR 2021
IBRNet: Learning Multi-View Image-Based Rendering, CVPR 2021
MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo, ICCV 2021
Neural Rays for Occlusion-aware Image-based Rendering, CVPR 2022
ENeRF: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video, SIGGRAPH Asia 2022
Is Attention All NeRF Needs?, ICLR 2023
Explicit Correspondence Matching for Generalizable Neural Radiance Fields, arXiv 2023

Acknowledgement

The project is mainly based on ENeRF. Many thanks for their excellent contributions! When using our code, please also pay attention to the license of ENeRF.
