
ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

Project Page | Paper

Official PyTorch implementation for the ECCV 2024 paper: "ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion".


⚙️ Installation

You can install the dependencies with:

git clone https://github.com/Sungmin-Woo/ProDepth.git
cd ProDepth/
conda create -n prodepth python=3.9.13
conda activate prodepth
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
pip install numpy==1.23.4 matplotlib==3.5.3 opencv-python==4.7.0.72 tqdm scikit-image timm==0.9.7 tensorboardX==1.4

We ran our experiments with PyTorch 1.7.1, CUDA 11.0, Python 3.9.13, and Ubuntu 18.04.
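Before moving on, you can sanity-check the environment with a short Python snippet (ours, not part of the repo); the expected version strings follow the installation commands above:

import torch
import torchvision

# Expect 1.7.1 / 0.8.2 / CUDA 11.0 per the installation commands above.
print("PyTorch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))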

💾 Data Preparation

🔹 KITTI

You can download the entire raw KITTI dataset by running:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/

Then unzip with:

cd kitti_data
unzip "*.zip"
cd ..

You should be able to obtain the following directory structure:

data_dir/kitti_data/
  |-- 2011_09_26
    |-- calib_cam_to_cam.txt
    |-- calib_imu_to_velo.txt
    |-- calib_velo_to_cam.txt
    |-- 2011_09_26_drive_0001_sync
      |-- image_00
      |-- image_01
      |-- image_02
      |-- image_03
      |-- oxts
      |-- velodyne_points
    |-- 2011_09_26_drive_0002_sync
    |-- ...
  |-- 2011_09_28
  |-- 2011_09_29
  |-- 2011_09_30
  |-- 2011_10_03

You can also place the KITTI dataset wherever you like and point towards it with the --data_path flag during training and evaluation.

Please refer to Monodepth2 for detailed instructions.
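If you want to verify the download before training, a small script along these lines (our own sketch, not part of the repo; the expected folder names come from the structure above) checks that every drive contains the expected subdirectories:

import os

DATA_PATH = "kitti_data"  # or wherever you pointed --data_path
EXPECTED = {"image_00", "image_01", "image_02", "image_03", "oxts", "velodyne_points"}

for date in sorted(os.listdir(DATA_PATH)):
    date_dir = os.path.join(DATA_PATH, date)
    if not os.path.isdir(date_dir):
        continue
    for drive in sorted(os.listdir(date_dir)):
        drive_dir = os.path.join(date_dir, drive)
        if not os.path.isdir(drive_dir):
            continue  # skips the calib_*.txt files
        missing = EXPECTED - set(os.listdir(drive_dir))
        if missing:
            print(f"{drive}: missing {sorted(missing)}")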

🔹 Cityscapes

Download the following packages from the Cityscapes official website into the CS_RAW folder: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip.

Preprocess the Cityscapes dataset using the prepare_train_data.py script (from SfMLearner) with the following commands:

cd CS_RAW
unzip leftImg8bit_sequence_trainvaltest.zip
unzip camera_trainvaltest.zip
cd ..

python prepare_train_data.py \
    --img_height 512 \
    --img_width 1024 \
    --dataset_dir CS_RAW \
    --dataset_name cityscapes \
    --dump_root CS \
    --seq_length 3 \
    --num_threads 8

You should be able to obtain the following directory structures for ./CS_RAW and ./CS:

data_dir/CS_RAW/
  |--camera
    |--test
    |--train
    |--val
  |--leftImg8bit_sequence
    |--test
    |--train
    |--val
  |--license.txt
  |--ReadMe

data_dir/CS/
  |--aachen
  |--bochum

You can also place the Cityscapes dataset wherever you like and point towards it with the --data_path flag during training and evaluation.
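As a quick sanity check on the preprocessing output, you can count the dumped frames per city (a sketch of ours; the .jpg extension matches SfMLearner's usual dump format, so adjust it if your dump differs):

import os

DUMP_ROOT = "CS"  # the --dump_root used above

for city in sorted(os.listdir(DUMP_ROOT)):
    city_dir = os.path.join(DUMP_ROOT, city)
    if os.path.isdir(city_dir):
        n_frames = sum(f.endswith((".jpg", ".png")) for f in os.listdir(city_dir))
        print(f"{city}: {n_frames} frames")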

📦 Pretrained Weights

You can download weights for some pretrained models here:

🔹 KITTI

| Model | Input size | KITTI AbsRel | Link |
| --- | --- | --- | --- |
| ProDepth | 640 x 192 | 0.086 | Download 🔗 |

🔹 Cityscapes

| Model | Input size | Cityscapes AbsRel | Link |
| --- | --- | --- | --- |
| ProDepth | 512 x 192 | 0.095 | Download 🔗 |
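To inspect a downloaded checkpoint before running the evaluation scripts, you can load it with plain torch.load and list what it stores. This is a minimal sketch; the file name encoder.pth and the Monodepth2-style state-dict layout are assumptions, so adapt it to the files inside the archive you downloaded:

import torch

# "encoder.pth" is an assumed file name; check the downloaded archive.
state_dict = torch.load("encoder.pth", map_location="cpu")

# Monodepth2-style checkpoints are plain state dicts, occasionally with
# extra metadata entries such as the training resolution.
for key, value in state_dict.items():
    print(key, tuple(value.shape) if hasattr(value, "shape") else value)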

⏳ Training

🔹 Single GPU Training

To train with a single GPU on the Cityscapes dataset, change $GPU_NUM and $BS in train_cs_prodepth.sh to 1 and 24 respectively, and run:

bash ./train_cs_prodepth.sh <model_name> <port_num>

🔹 Multi-GPU Training

For instance, to train with 4 GPUs on the Cityscapes dataset, change $GPU_NUM and $BS in train_cs_prodepth.sh to 4 and 6 respectively (so the effective batch size stays at $GPU_NUM x $BS = 24), and run:

CUDA_VISIBLE_DEVICES=<your_desired_GPU> bash ./train_cs_prodepth.sh <model_name> <port_num>

📊 Ground Truth Data Preparation and Evaluation

🔹 KITTI

To prepare the ground truth depth maps, run:

python export_gt_depth.py --data_path kitti_data --split eigen
python export_gt_depth.py --data_path kitti_data --split eigen_benchmark

This assumes you have placed the KITTI dataset in the default location of ./kitti_data/; otherwise, pass its location with --data_path.

To evaluate a model on KITTI, run:

bash ./test_kitti_prodepth.sh <model_name>

🔹 Cityscapes

Download the Cityscapes depth ground truth (provided by ManyDepth) for evaluation:

cd splits/cityscapes/
wget https://storage.googleapis.com/niantic-lon-static/research/manydepth/gt_depths_cityscapes.zip
unzip gt_depths_cityscapes.zip

To evaluate a model on Cityscapes, run:

bash ./test_cs_prodepth.sh <model_name>
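For reference, the AbsRel number reported for the pretrained weights above is the standard mean absolute relative error between predicted and ground-truth depth. Below is a minimal NumPy version of the metric (our own sketch, not the repo's evaluation code), including the per-image median scaling conventionally used in self-supervised evaluation; the 80 m depth cap follows the common KITTI protocol:

import numpy as np

def abs_rel(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Mean absolute relative error over valid ground-truth pixels."""
    mask = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[mask], gt[mask]
    # Median scaling: align the scale-ambiguous prediction to the ground truth.
    pred *= np.median(gt) / np.median(pred)
    pred = np.clip(pred, min_depth, max_depth)
    return float(np.mean(np.abs(pred - gt) / gt))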

Acknowledgements

Our work builds in part on the following open-source projects: Monodepth2, ManyDepth, DynamicDepth, DynamoDepth, and Lite-Mono.

We appreciate their contributions to the depth learning community.

📄 Citation

If you find our work useful or interesting, please cite our paper:

@article{woo2024prodepth,
  title={ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion},
  author={Woo, Sungmin and Lee, Wonjoon and Kim, Woo Jin and Lee, Dogyoon and Lee, Sangyoun},
  journal={arXiv preprint arXiv:2407.09303},
  year={2024}
}