
MVSTER

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo, ECCV 2022. arXiv

This repository contains the official implementation of the paper: "MVSTER: Epipolar Transformer for Efficient Multi-View Stereo".

Introduction

MVSTER is a learning-based MVS method that achieves competitive reconstruction performance with significantly higher efficiency. It leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently. Specifically, the epipolar Transformer utilizes a detachable monocular depth estimator to enhance 2D semantics and uses cross-attention to construct data-dependent 3D associations along epipolar lines. Additionally, MVSTER is built in a cascade structure, where entropy-regularized optimal transport is leveraged to propagate finer depth estimations in each stage.
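For intuition, below is a minimal PyTorch sketch of cross-attention over source-view features sampled along epipolar lines at a set of depth hypotheses. The function name, tensor shapes, and plain dot-product attention are illustrative assumptions for this sketch only, not the repository's actual module (which, per the paper, also injects monocular depth cues and runs inside a cascade).

```python
import torch
import torch.nn.functional as F

def epipolar_cross_attention(ref_feat, src_samples):
    # Hypothetical shapes for illustration (not the repo's real interface):
    #   ref_feat:    [B, C, H, W]     reference-view feature map (queries)
    #   src_samples: [B, C, D, H, W]  source-view features warped to the
    #                reference view at D depth hypotheses, i.e. the samples
    #                along each pixel's epipolar line (keys/values)
    B, C, D, H, W = src_samples.shape
    q = ref_feat.unsqueeze(2)                           # [B, C, 1, H, W]
    # Scaled dot-product similarity between the query and each sample.
    attn = (q * src_samples).sum(dim=1) / C ** 0.5      # [B, D, H, W]
    attn = F.softmax(attn, dim=1)                       # over depth bins
    # Data-dependent aggregation along the epipolar line.
    out = (attn.unsqueeze(1) * src_samples).sum(dim=2)  # [B, C, H, W]
    return out, attn

# Toy usage: 32-channel features, 48 depth hypotheses.
ref = torch.randn(2, 32, 64, 80)
src = torch.randn(2, 32, 48, 64, 80)
out, attn = epipolar_cross_attention(ref, src)
print(out.shape, attn.shape)  # [2, 32, 64, 80], [2, 48, 64, 80]
```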

Installation

MVSTER is tested on:

Training

The DTU training data is expected to be organized as follows:

├── Cameras
├── Depths
├── Depths_raw
├── Rectified
├── Rectified_raw (Optional)

In scripts/train_dtu.sh, set DTU_TRAINING to the root directory of the dataset organized as above.

Then launch multi-GPU training of MVSTER via scripts/train_dtu.sh.

Testing

Metric

Results on DTU (single RTX 3090)

| Method | Acc. (mm) | Comp. (mm) | Overall (mm) | Inf. Time |
|---|---|---|---|---|
| MVSTER (mid size) | 0.350 | 0.276 | 0.313 | 0.09 s |
| MVSTER (raw size) | 0.340 | 0.266 | 0.303 | 0.17 s |
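Acc./Comp./Overall follow the standard DTU point-cloud protocol: accuracy is the mean distance from the reconstructed points to the ground-truth scan, completeness is the mean distance from the ground-truth points to the reconstruction, and overall is their average (all in mm, lower is better). The sketch below only illustrates these definitions with nearest-neighbor distances; the official evaluation is DTU's MATLAB toolkit, which additionally applies observability masks and outlier thresholds.

```python
import numpy as np
from scipy.spatial import cKDTree

def dtu_style_metrics(pred_pts, gt_pts):
    # pred_pts: [N, 3] reconstructed point cloud
    # gt_pts:   [M, 3] ground-truth (structured-light) point cloud
    # Accuracy: how close reconstructed points lie to the GT surface.
    acc = cKDTree(gt_pts).query(pred_pts)[0].mean()
    # Completeness: how well the reconstruction covers the GT surface.
    comp = cKDTree(pred_pts).query(gt_pts)[0].mean()
    # Overall: the mean of the two errors.
    return acc, comp, (acc + comp) / 2.0

# Toy usage with random points (real evaluation uses fused point clouds).
pred = np.random.rand(1000, 3)
gt = np.random.rand(1200, 3)
print(dtu_style_metrics(pred, gt))
```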

Point cloud results on DTU, Tanks and Temples, ETH3D

If you find this project useful for your research, please cite:

@article{wang2022mvster,
      title={MVSTER: Epipolar Transformer for Efficient Multi-View Stereo},
      author={Wang, Xiaofeng and Zhu, Zheng and Qin, Fangbo and Ye, Yun and Huang, Guan and Chi, Xu and He, Yijia and Wang, Xingang},
      journal={arXiv preprint arXiv:2204.07346},
      year={2022}
}

Acknowledgements

Our work is partially based on these open-source works: MVSNet, MVSNet-pytorch, cascade-stereo, PatchmatchNet.

We appreciate their contributions to the MVS community.