Official implementation of "SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning"(T-IV 2024).
[[paper]()] [[arxiv]()] [website]
An illustration of our proposed SCIPaD framework. Unlike the traditional PoseNet architecture, it comprises three main components: (1) a confidence-aware feature flow estimator, (2) a spatial clue aggregator, and (3) a hierarchical positional embedding injector.
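If it helps to see the data flow in code, below is a deliberately simplified, hypothetical PyTorch sketch of how a pose head built from these three components could be wired together. All module names, channel sizes, and internals are placeholders for illustration only and do not reproduce the actual SCIPaD implementation.

```python
# Hypothetical sketch only -- names and wiring are placeholders, not the SCIPaD code.
import torch
import torch.nn as nn

class ConfidenceAwareFeatureFlowEstimator(nn.Module):
    """Predicts a 2-channel feature flow plus a 1-channel confidence map (placeholder)."""
    def __init__(self, in_ch):
        super().__init__()
        self.head = nn.Conv2d(in_ch, 3, 3, padding=1)
    def forward(self, x):
        out = self.head(x)
        return out[:, :2], torch.sigmoid(out[:, 2:])   # flow, confidence

class SpatialClueAggregator(nn.Module):
    """Fuses confidence-weighted flow clues with the paired image features (placeholder)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + 3, out_ch, 3, padding=1)
    def forward(self, x, flow, conf):
        return self.fuse(torch.cat([x, flow * conf, conf], dim=1))

class HierarchicalPositionalEmbeddingInjector(nn.Module):
    """Concatenates normalized pixel coordinates and projects back to the feature width (placeholder)."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Conv2d(ch + 2, ch, 1)
    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.proj(torch.cat([x, xs, ys], dim=1))

class ToyPoseHead(nn.Module):
    """Composes the three placeholder components and regresses a 6-DoF pose."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.flow = ConfidenceAwareFeatureFlowEstimator(2 * feat_ch)
        self.agg = SpatialClueAggregator(2 * feat_ch, feat_ch)
        self.inject = HierarchicalPositionalEmbeddingInjector(feat_ch)
        self.regress = nn.Conv2d(feat_ch, 6, 1)
    def forward(self, feat_t, feat_t1):
        pair = torch.cat([feat_t, feat_t1], dim=1)
        flow, conf = self.flow(pair)
        fused = self.inject(self.agg(pair, flow, conf))
        pose = self.regress(fused).mean(dim=(2, 3))    # (B, 6)
        return pose[:, :3], pose[:, 3:]                # axis-angle, translation
```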
Create a conda environment:
conda create -n scipad python==3.9
conda activate scipad
Install PyTorch, torchvision, and CUDA:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
Install the dependencies:
pip install -r requirements.txt
Note that we ran our experiments with PyTorch 2.0.1, CUDA 11.7, Python 3.9, and Ubuntu 20.04. Newer PyTorch versions should also work, but the resulting metrics may differ slightly (by roughly 0.01%) from those reported in the paper.
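To verify that the environment is set up as expected before training, a quick sanity check:

```python
# Quick environment sanity check (run inside the `scipad` conda environment).
import torch
import torchvision

print("PyTorch:", torch.__version__)            # expected: 2.0.1
print("torchvision:", torchvision.__version__)  # expected: 0.15.2
print("CUDA build:", torch.version.cuda)        # expected: 11.7
print("CUDA available:", torch.cuda.is_available())
```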
You can download the pretrained model weights from Google Drive or Baidu Cloud Drive:
Methods | WxH | abs rel | sq rel | RMSE | RMSE log | $\delta < 1.25$ | $\delta < 1.25^2$ | $\delta < 1.25^3$ |
---|---|---|---|---|---|---|---|---|
KITTI Raw | 640x192 | 0.090 | 0.650 | 4.056 | 0.166 | 0.918 | 0.970 | 0.985 |
Methods | WxH | Seq09 $e_t$ (%) | Seq09 $e_r$ (°/100m) | Seq09 ATE (m) | Seq10 $e_t$ (%) | Seq10 $e_r$ (°/100m) | Seq10 ATE (m) |
---|---|---|---|---|---|---|---|
KITTI Odom | 640x192 | 7.43 | 2.46 | 26.15 | 9.82 | 3.87 | 15.51 |
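For reference, the depth metrics in the first table follow the standard KITTI/Eigen formulation; a minimal NumPy sketch is given below (the repository's evaluate_depth.py may differ in details such as the evaluation crop, depth clamping, and median scaling).

```python
import numpy as np

def compute_depth_errors(gt, pred):
    """Standard KITTI/Eigen depth metrics; gt and pred are 1-D arrays of valid depths in meters."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```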
Create a `./checkpoints/` folder and place the pretrained models inside it.
wget -i splits/kitti_archives_to_download.txt -P kitti_data/
cd kitti_data
unzip "*.zip"
cd ..
find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
We also need pre-computed segmentation images provided by TriDepth for training (not needed for evaluation). Download them from here and organize the dataset as follows:
kitti_raw
├── 2011_09_26
│ ├── 2011_09_26_drive_0001_sync
│ ├── ...
│ ├── calib_cam_to_cam.txt
│ ├── calib_imu_to_velo.txt
│ └── calib_velo_to_cam.txt
├── ...
├── 2011_10_03
│ ├── ...
└── segmentation
├── 2011_09_26
├── ...
└── 2011_10_03
kitti_odom
├── poses
│ ├── 00.txt
│ ├── ...
│ └── 10.txt
└── sequences
├── 00
├── ...
└── 21
On KITTI Raw:
python train.py --config configs/kitti_raw.yaml
On KITTI Odometry:
python train.py --config configs/kitti_odom.yaml
This project uses the yacs configuration library. You can customize the configuration structure in `./utils/config/defaults.py` and the configuration values in `./configs/*.yaml`.
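As a rough illustration of how these pieces fit together, a typical yacs setup looks like the sketch below; the key names and default values here are illustrative and may not match the ones actually defined in `./utils/config/defaults.py`.

```python
# Illustrative only -- the actual configuration schema lives in ./utils/config/defaults.py.
from yacs.config import CfgNode as CN

_C = CN()
_C.load_weights_folder = ""   # hypothetical defaults
_C.eval = CN()
_C.eval.batch_size = 12
_C.eval.split = "eigen"

def get_cfg(yaml_path, cli_opts=()):
    """Merge defaults <- YAML file <- command-line overrides (later sources win)."""
    cfg = _C.clone()
    cfg.merge_from_file(yaml_path)       # e.g. configs/kitti_raw.yaml
    cfg.merge_from_list(list(cli_opts))  # e.g. ["eval.batch_size", "1", "eval.split", "eigen_benchmark"]
    cfg.freeze()
    return cfg
```

This is the same override pattern used by the training and evaluation commands in this README, where key/value pairs such as `eval.batch_size 1` are appended after `--config`.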
First, download the KITTI ground truth and improved ground truth from here, and put them into `./splits/eigen` and `./splits/eigen_benchmark`, respectively. You can also obtain them by following the instructions provided in Monodepth2.
python evaluate_depth.py --config configs/kitti_raw.yaml load_weights_folder checkpoints/KITTI eval.batch_size 1
python evaluate_depth.py --config configs/kitti_raw.yaml \
load_weights_folder checkpoints/KITTI \
eval.batch_size 1 \
eval.split eigen_benchmark
python evaluate_pose.py --config configs/kitti_odom.yaml \
load_weights_folder checkpoints/KITTI_Odom \
eval.split odom_09
python evaluate_pose.py --config configs/kitti_odom.yaml \
load_weights_folder checkpoints/KITTI_Odom/ \
eval.split odom_10
python ./utils/kitti_odom_eval/eval_odom.py --result=checkpoints/KITTI_Odom/ --align='7dof'
You can refer to `./eval.sh` for more information.
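Here, `--align='7dof'` aligns the predicted trajectory to the ground truth with a similarity transform (rotation, translation, and scale) before the ATE is computed. Below is a minimal sketch of such an alignment (the Umeyama method) and the resulting ATE, assuming both trajectories are given as Nx3 arrays of camera positions in meters; the repository's `eval_odom.py` may differ in implementation details.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) mapping src onto dst; both are (N, 3) arrays."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]           # cross-covariance between the two point sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against a reflection
        S[2, 2] = -1
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def ate_rmse(pred_xyz, gt_xyz):
    """ATE after 7-DoF alignment: RMSE of the per-frame position error in meters."""
    s, R, t = umeyama_alignment(pred_xyz, gt_xyz)
    aligned = s * (R @ pred_xyz.T).T + t
    return np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1)))
```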
If you find our work useful in your research, please consider citing our paper:
@article{feng2024scipad,
title={SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning},
author={Feng, Yi and Guo, Zizhan and Chen, Qijun and Fan, Rui},
journal={IEEE Transactions on Intelligent Vehicles},
year={2024},
publisher={IEEE}
}