FangjinhuaWang / PatchmatchNet

Official code of PatchmatchNet (CVPR 2021 Oral)
MIT License

PatchmatchNet (CVPR2021 Oral)

Official source code of the paper 'PatchmatchNet: Learned Multi-View Patchmatch Stereo'.

Updates

Introduction

PatchmatchNet is a novel cascade formulation of learning-based Patchmatch which aims at decreasing memory consumption and computation time for high-resolution multi-view stereo. If you find this project useful for your research, please cite:

@misc{wang2020patchmatchnet,
      title={PatchmatchNet: Learned Multi-View Patchmatch Stereo}, 
      author={Fangjinhua Wang and Silvano Galliani and Christoph Vogel and Pablo Speciale and Marc Pollefeys},
      journal={CVPR},
      year={2021}
}

Installation

Requirements

pip install -r requirements.txt

Reproducing Results

The camera file cam.txt stores the camera parameters: the extrinsic matrix, the intrinsic matrix, and the minimum and maximum depth:

extrinsic
E00 E01 E02 E03
E10 E11 E12 E13
E20 E21 E22 E23
E30 E31 E32 E33

intrinsic
K00 K01 K02
K10 K11 K12
K20 K21 K22

DEPTH_MIN DEPTH_MAX 
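
As an illustration of the layout above, here is a minimal sketch of a reader for one cam.txt file. The function name parse_cam_text is hypothetical and not part of this repository. It assumes the keywords extrinsic and intrinsic appear literally, and that the two depth values are the first two numbers after the intrinsic matrix.

```python
def parse_cam_text(text):
    """Parse the text of a cam.txt file (hypothetical helper, not the repo's loader).

    Returns (extrinsic, intrinsic, depth_min, depth_max), where the matrices
    are nested lists of floats in row-major order.
    """
    tokens = text.split()  # whitespace and blank lines are not significant here
    e = tokens.index("extrinsic")
    k = tokens.index("intrinsic")

    # 16 numbers after "extrinsic" form the 4x4 matrix E.
    e_vals = [float(t) for t in tokens[e + 1:e + 17]]
    extrinsic = [e_vals[4 * r:4 * r + 4] for r in range(4)]

    # 9 numbers after "intrinsic" form the 3x3 matrix K.
    k_vals = [float(t) for t in tokens[k + 1:k + 10]]
    intrinsic = [k_vals[3 * r:3 * r + 3] for r in range(3)]

    # Assumption: the next two numbers are DEPTH_MIN and DEPTH_MAX.
    depth_min = float(tokens[k + 10])
    depth_max = float(tokens[k + 11])
    return extrinsic, intrinsic, depth_min, depth_max
```

Reading a file would then look like parse_cam_text(open(path).read()), where path is whichever camera file you want to inspect.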

pair.txt stores the view selection result. For each reference image, N (10 or more) best source views are stored in the file:

TOTAL_IMAGE_NUM
IMAGE_ID0                       # index of reference image 0 
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 0 
IMAGE_ID1                       # index of reference image 1
10 ID0 SCORE0 ID1 SCORE1 ...    # 10 best source images for reference image 1 
...
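
The pair.txt layout above can be read with a few lines of Python. The sketch below is only an illustration: parse_pair_text is a hypothetical name, and stripping everything after a # is an assumption made so that annotated examples like the one above also parse.

```python
def parse_pair_text(text):
    """Parse the text of a pair.txt file (hypothetical helper).

    Returns a dict mapping each reference image index to a list of
    (source_image_index, score) tuples, in the order stored in the file.
    """
    lines = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if line:
            lines.append(line)

    total_images = int(lines[0])
    pairs = {}
    for i in range(total_images):
        ref_id = int(lines[1 + 2 * i])        # line with the reference image index
        fields = lines[2 + 2 * i].split()     # "N ID0 SCORE0 ID1 SCORE1 ..."
        num_src = int(fields[0])
        pairs[ref_id] = [
            (int(fields[1 + 2 * j]), float(fields[2 + 2 * j]))
            for j in range(num_src)
        ]
    return pairs
```

Because the source views are kept in file order, taking the first few entries of pairs[ref_id] selects the highest-ranked views if the file lists them best first.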

In evaluations/dtu/BaseEvalMain_web.m, set dataPath to the path of SampleSet/MVS Data/, plyPath to the directory that stores the reconstructed point clouds, and resultsPath to the directory where the evaluation results should be written. Then run evaluations/dtu/BaseEvalMain_web.m in MATLAB.

The results look like:

Acc. (mm) Comp. (mm) Overall (mm)
0.427 0.277 0.352
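
On DTU, the overall score is conventionally reported as the mean of accuracy and completeness, and the numbers above are consistent with that. A small sketch of the arithmetic, with overall_score as a hypothetical helper name:

```python
def overall_score(acc_mm, comp_mm):
    """Mean of accuracy and completeness, both in millimetres (lower is better)."""
    return (acc_mm + comp_mm) / 2


print(f"{overall_score(0.427, 0.277):.3f}")  # prints 0.352
```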

Evaluation on Custom Dataset

Training

Download the pre-processed DTU training set. The dataset is already organized as follows:

root_directory
├── Cameras_1
│    ├── train
│    │    ├── 00000000_cam.txt
│    │    ├── 00000001_cam.txt
│    │    └── ...
│    └── pair.txt
├── Depths_raw
│    ├── scan1
│    │    ├── depth_map_0000.pfm
│    │    ├── depth_visual_0000.png
│    │    ├── depth_map_0001.pfm
│    │    ├── depth_visual_0001.png
│    │    └── ...
│    ├── scan2
│    └── ...
└── Rectified
     ├── scan1_train
     │    ├── rect_001_0_r5000.png
     │    ├── rect_001_1_r5000.png
     │    ├── ...
     │    ├── rect_001_6_r5000.png
     │    ├── rect_002_0_r5000.png
     │    ├── rect_002_1_r5000.png
     │    ├── ...
     │    ├── rect_002_6_r5000.png
     │    └── ...
     ├── scan2_train
     └── ...

To use this dataset directly, see the Legacy Training section below. For the current version of training, the dataset must first be converted to a format compatible with MVSDataset in ./datasets/mvs.py, using the script convert_dtu_dataset.py as follows:

python convert_dtu_dataset.py --input_folder <original_dataset> --output_folder <converted_dataset> --scan_list ./lists/dtu/all.txt

The converted dataset will now be in a format similar to the evaluation datasets:

root_directory
├── scan1 (scene_name1)
├── scan2 (scene_name2) 
│     ├── cams (camera parameters)
│     │   ├── 00000000_cam.txt   
│     │   ├── 00000001_cam.txt   
│     │   └── ...                
│     ├── depth_gt (ground truth depth maps)
│     │   ├── 00000000.pfm   
│     │   ├── 00000001.pfm   
│     │   └── ...                
│     ├── images (images at 7 light indexes) 
│     │   ├── 0 (light index 0)
│     │   │   ├── 00000000.jpg       
│     │   │   ├── 00000001.jpg
│     │   │   └── ...
│     │   ├── 1 (light index 1)
│     │   └── ...                
│     ├── masks (depth map masks) 
│     │   ├── 00000000.png       
│     │   ├── 00000001.png       
│     │   └── ...                
│     └── pair.txt
└── ...
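
To make the converted layout concrete, the sketch below builds the file paths that belong to one view of one scan. view_paths is a hypothetical helper written only from the tree above, and the 8-digit zero-padded file stem is inferred from the example filenames. It is not how MVSDataset itself is implemented.

```python
from pathlib import Path


def view_paths(root, scan, view_id, light_idx):
    """Return the expected files for one view (hypothetical helper).

    root: dataset root directory, scan: scene folder name (e.g. "scan2"),
    view_id: integer view index, light_idx: light index of the image folder.
    """
    scan_dir = Path(root) / scan
    stem = f"{view_id:08d}"  # assumption: 8-digit zero-padded index, as in the tree
    return {
        "cam": scan_dir / "cams" / f"{stem}_cam.txt",
        "depth_gt": scan_dir / "depth_gt" / f"{stem}.pfm",
        "image": scan_dir / "images" / str(light_idx) / f"{stem}.jpg",
        "mask": scan_dir / "masks" / f"{stem}.png",
        "pair": scan_dir / "pair.txt",
    }
```

A quick sanity check after conversion is to call view_paths for a handful of views and confirm that every returned path exists.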

Legacy Training

To train directly on the original DTU dataset, call the legacy training script train_dtu.py (which uses the legacy MVSDataset from datasets/dtu_yao.py) from the train.sh script.

Note:

--patchmatch_iteration sets the number of Patchmatch iterations on each stage (e.g., the default 1,2,2 means 1 iteration on stage 1, 2 iterations on stage 2 and 2 iterations on stage 3).

--propagate_neighbors sets the number of neighbors used for adaptive propagation on each stage (e.g., the default 0,8,16 means no propagation on stage 1, 8 neighbors on stage 2 and 16 neighbors on stage 3).

As explained in our paper, we do not include adaptive propagation in the last iteration of Patchmatch on stage 1, because of the requirement of photometric consistency filtering. In our default setting (also used for our pretrained model), stage 1 runs only 1 iteration, so the number of propagation neighbors on stage 1 is set to 0. If you want to train the model with more iterations on stage 1, change the corresponding number in --propagate_neighbors so that adaptive propagation is included in every Patchmatch iteration on stage 1 except for the last one.
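
One way to read the two options together is as a per-iteration schedule of propagation neighbors. The sketch below is an interpretation of the note, not the training script's own argument handling: propagation_schedule is a hypothetical helper, and passing the values as comma-separated strings is an assumption taken from how the defaults are written here.

```python
def propagation_schedule(patchmatch_iteration, propagate_neighbors):
    """Map each stage to the number of propagation neighbors per iteration.

    Both arguments are comma-separated strings such as "1,2,2" and "0,8,16"
    (assumed format). On stage 1 the last iteration never uses adaptive
    propagation, following the note above.
    """
    iterations = [int(v) for v in patchmatch_iteration.split(",")]
    neighbors = [int(v) for v in propagate_neighbors.split(",")]
    if len(iterations) != len(neighbors):
        raise ValueError("both options need one value per stage")

    schedule = {}
    for stage, (n_iter, n_neigh) in enumerate(zip(iterations, neighbors), start=1):
        per_iteration = [n_neigh] * n_iter
        if stage == 1 and per_iteration:
            per_iteration[-1] = 0  # no adaptive propagation in the final stage-1 iteration
        schedule[stage] = per_iteration
    return schedule
```

With the defaults, propagation_schedule("1,2,2", "0,8,16") gives {1: [0], 2: [8, 8], 3: [16, 16]}. With two iterations on stage 1 and 16 neighbors, stage 1 becomes [16, 0].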

Acknowledgements

This project is done in collaboration with "Microsoft Mixed Reality & AI Zurich Lab".

Thanks to Yao Yao for open-sourcing his excellent work MVSNet. Thanks to Xiaoyang Guo for open-sourcing MVSNet-pytorch, his PyTorch implementation of MVSNet.