PyTorch implementation of:
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision,
Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
We introduce EmerNeRF, a self-supervised approach that utilizes neural fields for spatial-temporal decomposition, motion estimation, and the lifting of foundation features. EmerNeRF can decompose a scene into dynamic objects and a static background and estimate their motion in a self-supervised way. Enriched with lifted and "denoised" 2D features in 4D space-time, EmerNeRF opens up new potential for scene understanding. Additionally, we release the NeRF On-The-Road (NOTR) dataset split to support future research.
Our code is developed on Ubuntu 22.04 using Python 3.9 and PyTorch 2.0. Please note that the code has only been tested with these specified versions. We recommend using conda for the installation of dependencies. The installation process might take more than 30 minutes.
Create the `emernerf` conda environment and install all dependencies:

```shell
conda create -n emernerf python=3.9 -y
conda activate emernerf
# this will take a while: more than 10 minutes
pip install -r requirements.txt
```
Install `nerfacc` and `tiny-cuda-nn` manually:

```shell
pip install git+https://github.com/nerfstudio-project/nerfacc.git@8340e19daad4bafe24125150a8c56161838086fa
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```
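After installation, a quick way to confirm that the CUDA extensions built correctly is to import them and instantiate a small fused MLP. This is a minimal sanity-check sketch (not part of the repository), assuming a CUDA-capable GPU is visible:

```python
# Minimal sanity check (not part of the repo): verify that PyTorch,
# nerfacc, and tiny-cuda-nn all import and that CUDA is usable.
import torch
import nerfacc  # noqa: F401 -- a successful import confirms the wheel built
import tinycudann as tcnn

assert torch.cuda.is_available(), "EmerNeRF requires a CUDA-capable GPU"
print(f"PyTorch {torch.__version__} with CUDA {torch.version.cuda}")

# Instantiating a fully-fused MLP exercises tiny-cuda-nn's CUDA extension.
mlp = tcnn.Network(
    n_input_dims=3,
    n_output_dims=1,
    network_config={
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2,
    },
)
print("tiny-cuda-nn OK:", mlp)
```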
- **NOTR (Waymo) Dataset**: see NOTR Dataset Preparation for detailed instructions on preparing the NOTR dataset.
- **NuScenes Dataset**: for preparing the NuScenes dataset and viewing example results, refer to NuScenes Dataset Preparation.

If you want to set up a custom dataset, use these two datasets as templates, and take a look at the `datasets/base` directory to familiarize yourself with the dataset preparation process in our codebase.
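For orientation only, here is a hypothetical sketch of the per-frame data a custom driving dataset needs to supply; the class and field names below are illustrative, not the actual interface defined under `datasets/base`:

```python
# Illustrative only: `FrameRecord` and its fields are hypothetical and do
# not match the actual abstractions in datasets/base. They summarize the
# kind of per-frame data a custom driving dataset must provide.
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class FrameRecord:
    image: np.ndarray                          # (H, W, 3) uint8 camera image
    intrinsics: np.ndarray                     # (3, 3) camera intrinsic matrix
    cam_to_world: np.ndarray                   # (4, 4) camera-to-world pose
    timestep: int                              # frame index within the scene
    cam_id: int                                # which camera captured the image
    lidar_points: Optional[np.ndarray] = None  # (N, 3) LiDAR returns, if any
    sky_mask: Optional[np.ndarray] = None      # (H, W) bool sky mask, if any
```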
We provide detailed comments for each configuration option in `configs/default_config.yaml`.
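Every option in the YAML file can also be overridden from the command line with dotted `key=value` arguments, as in the training commands below. A minimal sketch of this pattern, assuming OmegaConf-style config loading (the repository's own argument handling may differ in details):

```python
# A sketch of how dotted CLI overrides merge onto a YAML config with
# OmegaConf; the repo's own loading logic may differ in details.
from omegaconf import OmegaConf

base_cfg = OmegaConf.load("configs/default_config.yaml")
overrides = OmegaConf.from_dotlist(
    ["data.scene_idx=23", "data.pixel_source.num_cams=3", "optim.num_iters=25000"]
)
cfg = OmegaConf.merge(base_cfg, overrides)
print(cfg.data.pixel_source.num_cams)  # -> 3
```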
Sample training scripts can be found in `sample_scripts/`. To inspect your data before training, run:
```shell
# Adjust hyper-parameters as needed.
# --render_data_video_only renders a video of the data instead of training.
# load_size=[160,240] downsamples the images to enhance the visibility of
# LiDAR points; num_cams can be 1, 3, or 5.
python train_emernerf.py \
    --config_file configs/default_config.yaml \
    --output_root $output_root \
    --project $project \
    --run_name ${scene_idx} \
    --render_data_video_only \
    data.scene_idx=$scene_idx \
    data.pixel_source.load_size=[160,240] \
    data.pixel_source.num_cams=3 \
    data.start_timestep=0 \
    data.end_timestep=-1
```
This script produces a video showcasing LiDAR points colored by their range values, the estimated 3D scene flows, and the corresponding feature maps (if `load_features=True`).
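The range coloring in these videos follows a simple recipe: map each point's distance from the sensor through a perceptual colormap. A minimal sketch of the idea, not taken from the repository (the `points` array is a stand-in):

```python
# Color LiDAR points by range with a matplotlib colormap -- a sketch of
# the coloring idea, not the repository's rendering code.
import numpy as np
from matplotlib import colormaps

points = np.random.randn(10_000, 3) * 20.0  # stand-in for real LiDAR returns
ranges = np.linalg.norm(points, axis=1)     # distance of each point from the sensor

# Normalize ranges to [0, 1] and map through the "turbo" colormap.
normed = (ranges - ranges.min()) / (np.ptp(ranges) + 1e-8)
colors = colormaps["turbo"](normed)[:, :3]  # (N, 3) RGB values in [0, 1]
```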
To train EmerNeRF with scene flow estimation and feature lifting enabled:

```shell
python train_emernerf.py \
    --config_file configs/default_flow.yaml \
    --output_root $output_root \
    --project $project \
    --run_name ${scene_idx}_flow \
    data.scene_idx=$scene_idx \
    data.start_timestep=$start_timestep \
    data.end_timestep=$end_timestep \
    data.pixel_source.load_features=True \
    data.pixel_source.feature_model_type=dinov2_vitb14 \
    nerf.model.head.enable_feature_head=True \
    nerf.model.head.enable_learnable_pe=True \
    logging.saveckpt_freq=$num_iters \
    optim.num_iters=$num_iters
```
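For reference, `dinov2_vitb14` refers to the DINOv2 ViT-B/14 backbone, which is loadable from `torch.hub`. A minimal sketch of extracting the kind of 2D patch features that feature lifting consumes (the random input here is a stand-in for a normalized camera image, and this is not the repository's extraction pipeline):

```python
# Extract DINOv2 ViT-B/14 patch features via torch.hub -- a sketch of the
# 2D foundation features that get lifted into 4D space-time.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()
image = torch.randn(1, 3, 224, 224)  # stand-in for a normalized camera image
with torch.no_grad():
    out = model.forward_features(image)
patch_tokens = out["x_norm_patchtokens"]
print(patch_tokens.shape)  # torch.Size([1, 256, 768]): 16x16 patches, 768-dim
```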
For more examples, refer to the `sample_scripts/` folder.
To visualize the voxel features of a trained model, add `--visualize_voxel` and specify `resume_from=$YOUR_PRETRAINED_MODEL`. This produces an HTML file which you can open in a browser for interactive voxel feature visualization.
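The underlying idea, sketched here with hypothetical data and plotly rather than the repository's own visualizer, is to reduce per-voxel features to RGB (e.g., via PCA) and write an interactive 3D scatter plot to HTML:

```python
# A sketch (hypothetical data, not the repo's visualizer): PCA-project
# per-voxel features to RGB and export an interactive HTML scatter plot.
import numpy as np
import plotly.graph_objects as go

voxel_xyz = np.random.rand(5000, 3) * 100.0  # stand-in for occupied voxel centers
voxel_feats = np.random.randn(5000, 64)      # stand-in for learned voxel features

# Project features onto 3 principal components and normalize to [0, 1] RGB.
centered = voxel_feats - voxel_feats.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
rgb = centered @ vt[:3].T
rgb = (rgb - rgb.min(axis=0)) / (np.ptp(rgb, axis=0) + 1e-8)
colors = [f"rgb({int(r*255)},{int(g*255)},{int(b*255)})" for r, g, b in rgb]

fig = go.Figure(
    go.Scatter3d(
        x=voxel_xyz[:, 0], y=voxel_xyz[:, 1], z=voxel_xyz[:, 2],
        mode="markers",
        marker=dict(size=2, color=colors),
    )
)
fig.write_html("voxel_features.html")  # open this file in a browser
```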
Consider citing our paper if you find this code or our paper useful for your research:
```bibtex
@article{yang2023emernerf,
    title={EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision},
    author={Jiawei Yang and Boris Ivanovic and Or Litany and Xinshuo Weng and Seung Wook Kim and Boyi Li and Tong Che and Danfei Xu and Sanja Fidler and Marco Pavone and Yue Wang},
    journal={arXiv preprint arXiv:2311.02077},
    year={2023}
}
```