MoCha-Stereo (Matcha Algorithm)

[CVPR2024] The official implementation of "MoCha-Stereo: Motif Channel Attention Network for Stereo Matching".

https://github.com/ZYangChen/MoCha-Stereo/assets/108012397/2ed414fe-d182-499b-895c-b5375ef51425

V1 Version

MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen†, Wei Long†, He Yao†, Yongjun Zhang✱, Bingshu Wang, Yongbin Qin, Jia Wu
CVPR 2024
Correspondence: ziyangchen2000@gmail.com; zyj6667@126.com✱

@inproceedings{chen2024mocha,
  title={MoCha-Stereo: Motif Channel Attention Network for Stereo Matching},
  author={Chen, Ziyang and Long, Wei and Yao, He and Zhang, Yongjun and Wang, Bingshu and Qin, Yongbin and Wu, Jia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27768--27777},
  year={2024}
}

Requirements

Python = 3.8

CUDA = 11.3

conda create -n mocha python=3.8
conda activate mocha
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
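After installation, a quick sanity check can confirm that the interpreter and the PyTorch build match the versions listed above. The snippet below is an illustrative sketch and not part of this repository; the helper name `describe_environment` is hypothetical, and the PyTorch fields stay at their defaults when PyTorch cannot be imported.

```python
import sys


def describe_environment():
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),  # this README targets 3.8
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return info
    info["torch"] = torch.__version__        # the pip command above installs 1.12.0+cu113
    info["cuda_build"] = torch.version.cuda  # CUDA version the wheel was built against
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a machine with a GPU, the installed wheel and the local driver are a likely place to look first.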

The following libraries are also required:

tqdm
tensorboard
opt_einsum
einops
scipy
imageio
opencv-python-headless
scikit-image
timm==0.6.5
six
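To confirm that the libraries listed above are present, something like the following sketch can be run inside the `mocha` environment. It is not part of this repository: the names `REQUIRED` and `check_requirements` are hypothetical, and the package names are assumed to match their PyPI distribution names. Only `timm` is pinned, as in the list above.

```python
from importlib import metadata

# Package list copied from the requirements above; None means "any version".
REQUIRED = {
    "tqdm": None,
    "tensorboard": None,
    "opt_einsum": None,
    "einops": None,
    "scipy": None,
    "imageio": None,
    "opencv-python-headless": None,
    "scikit-image": None,
    "timm": "0.6.5",
    "six": None,
}


def check_requirements(required=REQUIRED):
    """Return {package: problem} for every missing or mis-versioned package."""
    problems = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and installed != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements().items():
        print(f"{name}: {problem}")
```

An empty result means every listed package was found and `timm` matches the pinned version.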

Dataset

To evaluate or train MoCha-Stereo, you will need to download the required datasets:

Sceneflow (Includes FlyingThings3D, Driving, Monkaa)
Middlebury
ETH3D
KITTI

By default, stereo_datasets.py searches for the datasets in the locations below. You can create symbolic links in the datasets folder that point to wherever the datasets were downloaded.

├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2015
            ├── testing
            ├── training
        ├── KITTI_2012
            ├── testing
            ├── training
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
        ├── two_view_testing
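Before training or evaluating, it can help to verify that the layout above is in place. The sketch below is not part of this repository; `EXPECTED_LAYOUT` and `missing_dataset_dirs` are hypothetical names, and the directory list is simply transcribed from the tree above. `Path.is_dir()` follows symbolic links, so linked datasets count as present.

```python
from pathlib import Path

# Sub-directories expected under the datasets root, taken from the tree above.
EXPECTED_LAYOUT = [
    "FlyingThings3D/frames_cleanpass",
    "FlyingThings3D/frames_finalpass",
    "FlyingThings3D/disparity",
    "Monkaa/frames_cleanpass",
    "Monkaa/frames_finalpass",
    "Monkaa/disparity",
    "Driving/frames_cleanpass",
    "Driving/frames_finalpass",
    "Driving/disparity",
    "KITTI/KITTI_2015/testing",
    "KITTI/KITTI_2015/training",
    "KITTI/KITTI_2012/testing",
    "KITTI/KITTI_2012/training",
    "Middlebury/MiddEval3",
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "ETH3D/two_view_testing",
]


def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_LAYOUT if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dataset_dirs():
        print(f"missing: datasets/{rel}")
```

Datasets you do not plan to use can be left out; the check only reports which of the listed folders are absent.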

Training

python train_stereo.py --batch_size 8 --mixed_precision

Evaluation

To evaluate a trained model on a validation set (e.g. Middlebury full resolution), run:

python evaluate_stereo.py --restore_ckpt models/mocha-stereo.pth --dataset middlebury_F

Pretrained weights are available here.

FAQ

Q1: Where can I find the weights for "tf_efficientnetv2_l"?

A1: Please refer to issues #6 ("Question about the tf_efficientnetv2_l checkpoint"), #8 ("Pretrained weights"), and #9 ("code error").

Todo List

[CVPR2024] V1 version
- [X] Paper
- [X] Code of MoCha-Stereo
V2 version
- [ ] Preprint manuscript
- [ ] Code of MoCha-V2

Acknowledgements

This project borrows code from IGEV, RAFT-Stereo, and GwcNet. We thank the original authors for their excellent work!
We are grateful to Prof. Wenting Li, Prof. Huamin Qu, Dr. Junda Cheng, Mr./Mrs. "DLUTTengYH", and the anonymous reviewers for their comments on "MoCha-Stereo: Motif Channel Attention Network for Stereo Matching" (V1 version of MoCha-Stereo).
This project is supported by the Science and Technology Planning Project of Guizhou Province, Department of Science and Technology of Guizhou Province, China (Project No. [2023]159).
This project is supported by the Natural Science Research Project of Guizhou Provincial Department of Education, China (QianJiaoJi[2022]029, QianJiaoHeKY[2021]022).
