Numair Khan¹, Eric Penner¹, Douglas Lanman¹, Lei Xiao¹
¹Reality Labs Research
CVPR 2023
The code has been tested in the following setup
We recommend running the code in a virtual environment such as Conda. After cloning the repo, run the following commands from the base directory:
$ conda env create -f environment.yml
$ conda activate tcod
Then, run the following script to download the model checkpoints and set up the depth estimation backbones. We use DPT and RAFT-Stereo as the monocular and stereo depth estimation backbones, respectively.
$ sh ./initialize.sh
A small subset of ScanNet data is included in the test_data folder for demoing and testing purposes. To execute the code on it, run:
$ python ./run.py --demo
By default, this script will generate a visual comparison of the results (RGB | Input Depth | Our Result) in a folder called output. You can use the --save_numpy flag to save the actual floating point depth values to separate files. A different output location can be specified by setting the --outdir argument.
To run the method with monocular depth estimation on a ScanNet scene use:
$ python ./run.py --scannet --indir=/ScanNet/scans/sceneXXXX_XX --outdir=/PATH_TO_OUTPUT_DIR
We assume the data is extracted into the standard ScanNet directories. If this is not the case, modify the paths used in datasets/scannet.py.
To run the method with monocular depth estimation on COLMAP data use:
$ python ./run.py --colmap --indir=/PATH_TO_COLMAP_DIR/ --outdir=/PATH_TO_OUTPUT_DIR
Again, we assume all data exists in standard COLMAP directory format. To ensure this, we recommend using the provided COLMAP execution script:
$ sh ./utils/colmap.sh PATH_TO_VIDEO_FRAMES PATH_TO_COLMAP_DIR
Depending on the image sequence, the parameters used at different stages of COLMAP may need to be adjusted to generate a good reconstruction.
To run the method with stereo depth estimation on the MPI-Sintel dataset use:
$ python ./run.py --mpisintel --scene SCENE_NAME --indir=/PATH_TO_MPI_SINTEL_BASE_DIR/ --outdir=/PATH_TO_OUTPUT_DIR
Here, SCENE_NAME is one of the 23 scenes in the dataset, derived from the respective folder names (alley_1, alley_2, etc.). We assume the user has downloaded the depth/camera motion and the stereo/disparity training data from the dataset and extracted them into a single folder at PATH_TO_MPI_SINTEL_BASE_DIR.
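To process several MPI-Sintel scenes in one go, the command above can be assembled per scene from a small Python helper. This is only a sketch: it uses the flags shown above (--mpisintel, --scene, --indir, --outdir), while the helper name build_sintel_command and the choice of one output subfolder per scene are assumptions made for illustration, not part of the repository.

```python
import shlex


def build_sintel_command(scene, indir, outdir):
    """Return the argument list for one MPI-Sintel run of run.py.

    Hypothetical helper: only the flags documented in this README are used.
    """
    return [
        "python", "./run.py",
        "--mpisintel",
        "--scene", scene,
        f"--indir={indir}",
        f"--outdir={outdir}",
    ]


if __name__ == "__main__":
    # Scene names follow the dataset folder names, e.g. alley_1, alley_2.
    scenes = ["alley_1", "alley_2"]
    for scene in scenes:
        # Assumption: write each scene to its own subfolder of the output dir.
        cmd = build_sintel_command(
            scene,
            "/PATH_TO_MPI_SINTEL_BASE_DIR/",
            f"/PATH_TO_OUTPUT_DIR/{scene}",
        )
        # Print the command for inspection; pass cmd to
        # subprocess.run(cmd, check=True) to actually execute it.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before launching long runs.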
To test the method with custom data (stereo or monocular), you will need to implement a data loader based on the template provided in datasets/custom.py and add it in run.py. Please refer to the loaders provided for the above-mentioned datasets as examples. In brief, the data loader is expected to return an unprocessed depth map, the RGB image, and the camera pose and intrinsics.
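As a rough illustration of the four items a loader has to provide per frame (unprocessed depth, RGB image, camera pose, intrinsics), here is a toy, self-contained sketch. It does not reproduce the interface of datasets/custom.py: the names Frame and ToyLoader, the use of nested lists instead of arrays, the 4x4 pose and 3x3 intrinsics layouts, and the synthetic data are all assumptions for illustration. Follow the template and the provided loaders for the actual method names and conventions.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Frame:
    """One sample: the four items a loader is expected to return."""
    rgb: List[List[List[float]]]  # H x W x 3 colour image
    depth: Matrix                 # H x W unprocessed depth map
    pose: Matrix                  # 4 x 4 camera pose (convention assumed)
    intrinsics: Matrix            # 3 x 3 pinhole intrinsics


class ToyLoader:
    """Hypothetical loader that synthesises constant frames."""

    def __init__(self, num_frames, height, width, focal):
        self.num_frames = num_frames
        self.height = height
        self.width = width
        self.focal = focal

    def __len__(self):
        return self.num_frames

    def __getitem__(self, idx):
        if not 0 <= idx < self.num_frames:
            raise IndexError(idx)
        h, w, f = self.height, self.width, self.focal
        # Grey image and a flat depth map one unit from the camera.
        rgb = [[[0.5, 0.5, 0.5] for _ in range(w)] for _ in range(h)]
        depth = [[1.0 for _ in range(w)] for _ in range(h)]
        # Identity pose: the camera sits at the world origin.
        pose = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
        # Principal point assumed to be at the image centre.
        intrinsics = [[f, 0.0, w / 2.0],
                      [0.0, f, h / 2.0],
                      [0.0, 0.0, 1.0]]
        return Frame(rgb, depth, pose, intrinsics)


if __name__ == "__main__":
    loader = ToyLoader(num_frames=3, height=2, width=4, focal=100.0)
    for frame in loader:
        print(len(frame.depth), len(frame.depth[0]), frame.intrinsics[0])
```

A real loader would read these values from disk and should match the pose and intrinsics conventions used by the existing dataset loaders.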
Important points to note are
Some common causes of large errors in the output can be:
depth = f * B / disparity, where f is the horizontal focal length and B is the stereo baseline.
Will be added as required.
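For stereo data, the relation depth = f * B / disparity noted above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the repository's code: the function name disparity_to_depth, the min_disparity threshold, and the choice to mark invalid pixels with 0.0 are assumptions. In practice an array library would normally be used for the same computation.

```python
def disparity_to_depth(disparity, focal_px, baseline, min_disparity=1e-6):
    """Convert a 2D disparity map (list of rows) to depth.

    focal_px: horizontal focal length f, in pixels.
    baseline: stereo baseline B; the depth comes out in the same unit.
    Pixels whose disparity is not above min_disparity get depth 0.0
    (assumed "invalid" marker) instead of dividing by zero.
    """
    depth = []
    for row in disparity:
        depth.append([
            focal_px * baseline / d if d > min_disparity else 0.0
            for d in row
        ])
    return depth


if __name__ == "__main__":
    disparity = [[10.0, 20.0], [0.0, 5.0]]
    print(disparity_to_depth(disparity, focal_px=100.0, baseline=0.5))
```

Keeping f in pixels and B in metric units yields metric depth, which is the unit to keep consistent with the camera poses.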
If you find our work useful for your research, please cite the following paper:
@article{khan2023tcod,
title={Temporally Consistent Online Depth Estimation Using Point-Based Fusion},
author={Khan, Numair and Penner, Eric and Lanman, Douglas and Xiao, Lei},
journal={Computer Vision and Pattern Recognition (CVPR)},
year={2023},
}
Our source code is CC-BY-NC licensed, as found in the LICENSE file.