Wanggcong / SparseNeRF

[ICCV 2023] SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, Ziwei Liu
S-Lab, Nanyang Technological University
ICCV 2023
### :baby_chick: Project | YouTube | arXiv

:baby_chick: Update:

:baby_chick: Features:

  • :white_check_mark: Applies to general scenes. Depth maps come from pre-trained monocular depth estimation models or depth sensors; they are coarse but easy to obtain.
  • :white_check_mark: Needs only 1 GPU for training and testing. Training a scene takes about 2 hours.
  • :white_check_mark: Combines with other methods: FreeNeRF w/ SparseNeRF achieves better results, which suggests that SparseNeRF can be integrated into other methods.
  • :white_check_mark: FAQ: A list of frequently asked questions.
  • :white_check_mark: Use your dataset: A tutorial on how to use your own dataset.
  • :white_check_mark: Tutorial: A detailed explanation of how to implement SparseNeRF, with slides, figures, and a pseudo-algorithm table. If you cannot open the link, you can download it from the tutorial folder.
  • :white_check_mark: Poster: An overview of the method. Also see Project | YouTube | arXiv.

:baby_chick: TL;DR: We present SparseNeRF, a simple yet effective method that synthesizes novel views given a few images. SparseNeRF distills robust local depth ranking priors from real-world inaccurate depth observations, such as pre-trained monocular depth estimation models or consumer-level depth sensors.

:baby_chick: Abstract: Neural Radiance Field (NeRF) significantly degrades when only a limited number of views are available. To complement the lack of 3D information, depth-based models, such as DSNeRF and MonoSDF, explicitly assume the availability of accurate depth maps of multiple views. They linearly scale the accurate depth maps as supervision to guide the predicted depth of few-shot NeRFs. However, accurate depth maps are difficult and expensive to capture due to wide-range depth distances in the wild.

In this work, we present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations. The inaccurate depth observations are either from pre-trained depth models or coarse depth maps of consumer-level depth sensors. Since coarse depth maps are not strictly scaled to the ground-truth depth maps, we propose a simple yet effective constraint, a local depth ranking method, on NeRFs such that the expected depth ranking of the NeRF is consistent with that of the coarse depth maps in local patches. To preserve the spatial continuity of the estimated depth of NeRF, we further propose a spatial continuity constraint to encourage the consistency of the expected depth continuity of NeRF with coarse depth maps. Surprisingly, with simple depth ranking constraints, SparseNeRF outperforms all state-of-the-art few-shot NeRF methods (including depth-based models) on standard LLFF and DTU datasets. Moreover, we collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro. Extensive experiments on NVS-RGBD dataset also validate the superiority and generalizability of SparseNeRF.
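The local depth ranking constraint described above can be sketched as a hinge loss over pixel pairs sampled from the same local patch. This is a minimal illustration in plain Python, not the repository's JAX implementation: the function name `depth_ranking_loss`, the `margin` parameter and its default value, and the averaging over pairs are all assumptions. It also assumes the coarse values are depths (larger means farther), not disparities.

```python
def depth_ranking_loss(pred_a, pred_b, coarse_a, coarse_b, margin=1e-4):
    """Hinge-style ranking loss over pixel pairs from the same local patch.

    pred_a, pred_b:     NeRF expected depths for the two pixels of each pair.
    coarse_a, coarse_b: coarse depths (monocular model or sensor) for the same pixels.
    margin:             hypothetical tolerance; the default value is an assumption.
    """
    total = 0.0
    for pa, pb, ca, cb in zip(pred_a, pred_b, coarse_a, coarse_b):
        # Order each pair so that `near` is the pixel the coarse map ranks closer.
        near, far = (pa, pb) if ca <= cb else (pb, pa)
        # Penalize only when the NeRF depths contradict the coarse ordering.
        total += max(near - far + margin, 0.0)
    # Average over pairs; an empty batch yields 0.0.
    return total / max(len(pred_a), 1)
```

Only the ordering of the coarse depths enters the loss, so rescaling or shifting the coarse depth map leaves the result unchanged. This is why the constraint tolerates coarse maps that are not strictly scaled to the ground truth.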

:baby_chick: Framework Overview: SparseNeRF consists of two streams: NeRF and depth prior distillation. For NeRF, we use Mip-NeRF as the backbone and train it with a NeRF reconstruction loss. For depth prior distillation, we distill depth priors from a pre-trained depth model. Specifically, we propose a local depth ranking regularization and a spatial continuity regularization to distill robust depth priors from coarse depth maps.
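As a rough sketch of how the two streams could combine into one training objective, the snippet below adds a spatial continuity term and a weighted sum. It is illustrative only: `spatial_continuity_loss`, `total_loss`, the thresholds `coarse_tol` and `margin`, and the weights `w_rank` and `w_cont` are hypothetical names and values, and the exact formulation in the released code may differ.

```python
def spatial_continuity_loss(pred_a, pred_b, coarse_a, coarse_b,
                            coarse_tol=0.05, margin=1e-4):
    """Encourage neighboring pixels to have similar NeRF depth wherever the
    coarse depth map is itself continuous (assumed formulation)."""
    total, count = 0.0, 0
    for pa, pb, ca, cb in zip(pred_a, pred_b, coarse_a, coarse_b):
        if abs(ca - cb) > coarse_tol:
            # The coarse map shows a depth edge here; do not force smoothness.
            continue
        total += max(abs(pa - pb) - margin, 0.0)
        count += 1
    return total / count if count else 0.0


def total_loss(reconstruction, ranking, continuity, w_rank=1.0, w_cont=1.0):
    """Weighted sum of the reconstruction loss and the two depth regularizers.
    The weights are placeholders, not values taken from the paper or configs."""
    return reconstruction + w_rank * ranking + w_cont * continuity
```

Pairs that straddle a depth edge in the coarse map are skipped, so the regularizer smooths surfaces without blurring object boundaries.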

1. Prerequisites

2. Installation

We recommend using a conda virtual environment to run the code.

conda create -n sparsenerf python=3.6.13
conda activate sparsenerf
pip install -r requirements.txt

Download the jax+cuda wheel (jaxlib-0.1.68+cuda101-cp36) from this link, then install it and remove the downloaded file:

pip install jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl
rm jaxlib-0.1.68+cuda101-cp36-none-manylinux2010_x86_64.whl

Install PyTorch and related packages for the pre-trained depth models:

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
pip install timm
pip install opencv-python

Install ffmpeg for composing videos

pip install imageio-ffmpeg
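After installation, a quick import check can confirm that the environment is complete before starting a two-hour training run. The script below is a convenience sketch and not part of the repository. The module list is an assumption derived from the install commands above (for example, `opencv-python` is imported as `cv2` and `imageio-ffmpeg` as `imageio_ffmpeg`), and `jax` is assumed to come from `requirements.txt`.

```python
import importlib.util

# Import names for the packages installed above. This list is an assumption
# about what the code needs at runtime; adjust it to match requirements.txt.
REQUIRED_MODULES = [
    "jax", "jaxlib", "torch", "torchvision", "timm", "cv2", "imageio_ffmpeg",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    print("All modules found." if not missing else "Missing: " + ", ".join(missing))
```

Run it inside the activated `sparsenerf` environment. It only checks that each module can be located and does not verify versions or CUDA support.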

3. Dataset

3.1 Download DTU dataset

3.2 Download LLFF dataset

3.3 Download NVS-RGBD dataset

4. Training

4.1 Training on LLFF

Please set the variables in scripts/ and configs/llff3.gin, and run:

sh scripts/

4.2 Training on DTU

Please set the variables in the corresponding script and config, and run:

sh scripts/

4.3 Training on NVS-RGBD

Similar to 4.1 and 4.2. The depth maps are from depth sensors.

sh scripts/
sh scripts/
sh scripts/

5. Test

5.1 Evaluation on LLFF

Please set the variables (the same as those used for training) in the corresponding evaluation script (for DTU, eval_dtu3), and run:

sh scripts/

5.2 Evaluation on DTU

sh scripts/

5.3 Evaluation on NVS-RGBD

sh scripts/
sh scripts/
sh scripts/

6. (Optional) Render videos

Please set the variables (the same as those used for training) in the corresponding rendering script, and run:

6.1 Render videos on LLFF

sh scripts/

6.2 Render videos on DTU

sh scripts/

6.3 Render videos on NVS-RGBD

sh scripts/
sh scripts/
sh scripts/

7. (Optional) Compose videos

Please set the variables in the corresponding compose script (or other scripts), and run:

7.1 Compose videos on LLFF


7.2 Compose videos on DTU


7.3 Compose videos on NVS-RGBD


8. (Optional) TensorBoard for visualizing training

tensorboard --logdir=./out/xxx/ --port=6006

If it raises errors, see Q2 of the FAQ.

9. Citation

If you find this useful for your research, please cite our paper.

@inproceedings{wang2023sparsenerf,
   author    = {Wang, Guangcong and Chen, Zhaoxi and Loy, Chen Change and Liu, Ziwei},
   title     = {SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis},
   booktitle = {IEEE/CVF International Conference on Computer Vision (ICCV)},
   year      = {2023},
}


