SCIPaD


Official implementation of "SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning" (T-IV 2024).

[[paper]()] [[arxiv]()] [website]

[Figure: framework architecture]

An illustration of our proposed SCIPaD framework. In contrast to the traditional PoseNet architecture, it comprises three main components: (1) a confidence-aware feature flow estimator, (2) a spatial clue aggregator, and (3) a hierarchical positional embedding injector.
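To make the data flow concrete, below is a structural sketch of how these three components could compose into a pose branch. Module names, arguments, and tensor shapes are assumptions made for illustration only; the actual interfaces are defined by the code in this repository and the paper.

```python
# Illustrative skeleton only -- not the repository's actual implementation.
# Submodule interfaces and the forward signature are assumptions.
import torch.nn as nn

class PoseBranchSketch(nn.Module):
    """Rough composition of a SCIPaD-style pose branch: confidence-weighted
    feature flow, aggregation of spatial clues, and positional embeddings
    injected hierarchically before pose regression."""

    def __init__(self, flow_estimator, clue_aggregator, embedding_injector, pose_head):
        super().__init__()
        self.flow_estimator = flow_estimator          # (1) confidence-aware feature flow estimator
        self.clue_aggregator = clue_aggregator        # (2) spatial clue aggregator
        self.embedding_injector = embedding_injector  # (3) hierarchical positional embedding injector
        self.pose_head = pose_head                    # regresses the relative camera pose

    def forward(self, feats_t, feats_t1, depth, intrinsics):
        # feature correspondences between adjacent frames plus per-location confidence
        flow, confidence = self.flow_estimator(feats_t, feats_t1)
        # fuse geometric cues (depth, camera intrinsics) with the confidence-weighted flow
        clues = self.clue_aggregator(flow, confidence, depth, intrinsics)
        # inject positional information into the image features at multiple levels
        fused = self.embedding_injector(feats_t, clues)
        return self.pose_head(fused)  # 6-DoF relative pose
```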

⚙️ Setup

Create a conda environment:

conda create -n scipad python==3.9
conda activate scipad

Install PyTorch, torchvision, and CUDA:

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Install the dependencies:

pip install -r requirements.txt

Note that our experiments were run with PyTorch 2.0.1, CUDA 11.7, Python 3.9, and Ubuntu 20.04. Newer versions of PyTorch should also work, but the resulting metrics may differ slightly (~0.01%) from those reported in the paper.
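As a quick sanity check after installation, you can confirm that the expected PyTorch build and CUDA runtime are visible (a minimal snippet using only standard PyTorch calls):

```python
# Quick environment check: prints the installed PyTorch/CUDA versions
# and whether a GPU is visible to PyTorch.
import torch

print("PyTorch:", torch.__version__)        # expected: 2.0.1
print("CUDA runtime:", torch.version.cuda)  # expected: 11.7
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```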

🖼️ Pretrained Models

You can download the pretrained model weights from Google Drive or Baidu Cloud Drive:

| Methods | WxH | abs rel | sq rel | RMSE | RMSE log | $\delta < 1.25$ | $\delta < 1.25^2$ | $\delta < 1.25^3$ |
|---|---|---|---|---|---|---|---|---|
| KITTI Raw | 640x192 | 0.090 | 0.650 | 4.056 | 0.166 | 0.918 | 0.970 | 0.985 |

| Methods | WxH | Seq09 $e_t$ | Seq09 $e_r$ | Seq09 ATE | Seq10 $e_t$ | Seq10 $e_r$ | Seq10 ATE |
|---|---|---|---|---|---|---|---|
| KITTI Odom | 640x192 | 7.43 | 2.46 | 26.15 | 9.82 | 3.87 | 15.51 |

Create a ./checkpoints/ folder and place the pretrained models inside it.
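If you want to confirm the downloaded weights are readable before running evaluation, a minimal check like the following works. It only assumes that the checkpoint folder (e.g. checkpoints/KITTI, the path used in the evaluation commands below) contains PyTorch .pth files:

```python
# Lists the .pth files in a checkpoint folder and loads each one on CPU
# to verify it is readable. The folder name matches the evaluation
# commands below; the file names are whatever the release provides.
from pathlib import Path
import torch

ckpt_dir = Path("checkpoints/KITTI")
for pth in sorted(ckpt_dir.glob("*.pth")):
    state = torch.load(pth, map_location="cpu")
    n_entries = len(state) if isinstance(state, dict) else 1
    print(f"{pth.name}: {n_entries} entries")
```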

💾 Dataset Preparation

Download the KITTI raw dataset and convert the PNG images to JPEG:

wget -i splits/kitti_archives_to_download.txt -P kitti_data/
cd kitti_data
unzip "*.zip"
cd ..
find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'
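The conversion step above relies on GNU parallel and ImageMagick. If either is unavailable, a roughly equivalent Python fallback using Pillow is sketched below; the quality and 4:2:0 chroma subsampling are chosen to approximate the `convert` flags, and this substitute is our assumption, not part of the original pipeline:

```python
# Convert every PNG under kitti_data/ to JPEG and remove the original,
# approximating: convert -quality 92 -sampling-factor 2x2,1x1,1x1
# (subsampling=2 selects 4:2:0 chroma subsampling in Pillow).
from pathlib import Path
from PIL import Image

for png in Path("kitti_data").rglob("*.png"):
    jpg = png.with_suffix(".jpg")
    Image.open(png).convert("RGB").save(jpg, quality=92, subsampling=2)
    png.unlink()
```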

We also need pre-computed segmentation images provided by TriDepth for training (not needed for evaluation). Download them from here and organize the dataset as follows:

kitti_raw
├── 2011_09_26
│   ├── 2011_09_26_drive_0001_sync
│   ├── ...
│   ├── calib_cam_to_cam.txt
│   ├── calib_imu_to_velo.txt
│   └── calib_velo_to_cam.txt
├── ...
├── 2011_10_03
│   ├── ...
└── segmentation
    ├── 2011_09_26
    ├── ...
    └── 2011_10_03
kitti_odom
├── poses
│   ├── 00.txt
│   ├── ...
│   └── 10.txt
└── sequences
    ├── 00
    ├── ...
    └── 21
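As a quick sanity check of the layout above (the paths are the ones shown in the trees; adjust them if you store the data elsewhere), you can verify that every KITTI Raw date folder has a matching segmentation folder:

```python
# Checks that each 2011_* date folder under kitti_raw/ has a matching
# folder under kitti_raw/segmentation/, as in the layout shown above.
from pathlib import Path

root = Path("kitti_raw")
dates = sorted(d.name for d in root.iterdir()
               if d.is_dir() and d.name.startswith("2011_"))
for date in dates:
    seg = root / "segmentation" / date
    status = "ok" if seg.is_dir() else "MISSING segmentation"
    print(f"{date}: {status}")
```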

⏳ Training

On KITTI Raw:

python train.py --config configs/kitti_raw.yaml

On KITTI Odometry:

python train.py --config configs/kitti_odom.yaml

🔧 Other training options

This project uses the yacs configuration library. You can customize the configuration structure in ./utils/config/defaults.py and the configuration values in ./configs/*.yaml.
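For reference, this is roughly how a yacs-based setup merges defaults, a YAML file, and trailing KEY VALUE command-line overrides (the same override style used by the evaluation commands below). The option names in this sketch are placeholders; the project's actual schema lives in ./utils/config/defaults.py:

```python
# Minimal yacs usage sketch: defaults -> YAML file -> KEY VALUE overrides.
# Option names here are illustrative placeholders, not the project's schema.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.load_weights_folder = ""
cfg.eval = CN()
cfg.eval.batch_size = 16
cfg.eval.split = "eigen"

# A YAML file may only set keys that already exist in the defaults:
# cfg.merge_from_file("configs/kitti_raw.yaml")

# Trailing "KEY VALUE" pairs on the command line are applied the same way:
cfg.merge_from_list(["eval.batch_size", 1,
                     "load_weights_folder", "checkpoints/KITTI"])
cfg.freeze()
print(cfg)
```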

📊 KITTI evaluation

First, download the KITTI ground truth and the improved ground truth from here, and place them in ./splits/eigen and ./splits/eigen_benchmark, respectively. You can also obtain them by following the instructions provided in Monodepth2.

Evaluate depth on the Eigen split:

python evaluate_depth.py --config configs/kitti_raw.yaml \
  load_weights_folder checkpoints/KITTI \
  eval.batch_size 1

Evaluate depth against the improved ground truth (eigen_benchmark split):

python evaluate_depth.py --config configs/kitti_raw.yaml \
  load_weights_folder checkpoints/KITTI \
  eval.batch_size 1 \
  eval.split eigen_benchmark

Evaluate pose on sequences 09 and 10:

python evaluate_pose.py --config configs/kitti_odom.yaml \
  load_weights_folder checkpoints/KITTI_Odom \
  eval.split odom_09
python evaluate_pose.py --config configs/kitti_odom.yaml \
  load_weights_folder checkpoints/KITTI_Odom \
  eval.split odom_10

Compute the odometry metrics with 7-DoF alignment:

python ./utils/kitti_odom_eval/eval_odom.py --result=checkpoints/KITTI_Odom/ --align='7dof'

You can refer to ./eval.sh for more information.

Citation

If you find our work useful in your research, please consider citing our paper:

@article{feng2024scipad,
  title={SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning},
  author={Feng, Yi and Guo, Zizhan and Chen, Qijun and Fan, Rui},
  journal={IEEE Transactions on Intelligent Vehicles},
  year={2024},
  publisher={IEEE}
}