This repository is the implementation of IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration.
Existing state-of-the-art point descriptors rely on structure information only and omit texture information. However, texture information is crucial for humans to distinguish parts of a scene. Moreover, current learning-based point descriptors are all black boxes: it is unclear how the original points contribute to the final descriptors. In this paper, we propose a new multimodal fusion method that generates point cloud registration descriptors by considering both structure and texture information. Specifically, a novel attention-fusion module is designed to extract weighted texture information for descriptor extraction. In addition, we propose an interpretable module that explains our neural network by visually showing how the original points contribute to the final descriptors. We use the descriptors’ channel value as the loss to backpropagate to the target layer and consider the gradient as the significance of each point to the final descriptors. This paper moves one step further toward explainable deep learning in the registration task. Comprehensive experiments on 3DMatch, 3DLoMatch and KITTI demonstrate that the multimodal fusion descriptors achieve state-of-the-art accuracy and improve the descriptors’ distinctiveness. We also demonstrate the capability of our interpretable module in explaining the registration descriptor extraction.
FMR Table | RR Table |
---|---|
Feature-match recall (FMR) and registration recall (RR) in log scale on the 3DMatch benchmark.
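As a rough illustration of the first metric, the sketch below computes feature-match recall the way it is commonly defined for 3DMatch: a point cloud pair counts as matched when the fraction of its putative correspondences that fall within a distance threshold under the ground-truth transform exceeds an inlier-ratio threshold. The thresholds (0.1 m and 0.05) are the values conventionally used on this benchmark and are assumptions here, as are the function names; this is not the code in `evaluation_3dmatch.py`.

```python
import math


def inlier_ratio(src_pts, tgt_pts, transform, dist_thresh=0.1):
    """Fraction of correspondences (src_pts[i], tgt_pts[i]) that lie within
    dist_thresh after applying the 4x4 ground-truth transform to the source.

    dist_thresh=0.1 (metres) is an assumed, conventional value.
    """
    if not src_pts:
        return 0.0
    inliers = 0
    for (x, y, z), tgt in zip(src_pts, tgt_pts):
        # Apply the rotation and translation stored in the first three rows.
        moved = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in transform[:3]]
        if math.dist(moved, tgt) < dist_thresh:
            inliers += 1
    return inliers / len(src_pts)


def feature_match_recall(ratios, ratio_thresh=0.05):
    """Fraction of pairs whose inlier ratio exceeds ratio_thresh.

    ratio_thresh=0.05 is an assumed, conventional value.
    """
    if not ratios:
        return 0.0
    return sum(r > ratio_thresh for r in ratios) / len(ratios)
```

Calling `inlier_ratio` once per test pair and passing the resulting list to `feature_match_recall` gives a single recall value for the benchmark.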
The network architecture of the proposed IMFNet. The input is a point cloud and an image, and the output is the point descriptors. Inside the attention-fusion module, W is the weight matrix and FI is the point texture feature. The fusion feature (Ffe) of the point structure feature (Fpe) and the point texture feature (FI) is then fed to the decoder module to obtain the output descriptors. Finally, the descriptors are interpreted by DAM.
The Overall Framework |
---|
Please refer to our paper for more details.
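To make the roles of W, FI, Fpe and Ffe more concrete, here is a minimal pure-Python sketch of one way such a fusion could work. It assumes scaled dot-product attention between point features and image features, and an element-wise sum for the final fusion. Both choices, along with the function names and the shapes, are illustrative assumptions; the actual module is defined in the paper and in this repository's model code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(f_pe, f_img):
    """Fuse point structure features with image features.

    f_pe:  N x C list of point structure features (Fpe).
    f_img: M x C list of image features (hypothetical flattened pixels).
    Returns (w, f_i, f_fe): the N x M weight matrix W, the N x C point
    texture features FI, and the N x C fusion features Ffe.
    """
    channels = len(f_pe[0])
    scale = math.sqrt(channels)
    # W[i][j]: weight of image feature j for point i
    # (assumption: scaled dot-product similarity followed by softmax).
    w = [
        softmax([sum(p * q for p, q in zip(pt, px)) / scale for px in f_img])
        for pt in f_pe
    ]
    # FI: per-point weighted sum of the image features.
    f_i = [
        [sum(wij * px[k] for wij, px in zip(w_row, f_img)) for k in range(channels)]
        for w_row in w
    ]
    # Ffe: element-wise sum of structure and texture features (assumption).
    f_fe = [[a + b for a, b in zip(pt, tx)] for pt, tx in zip(f_pe, f_i)]
    return w, f_i, f_fe
```

Each row of W sums to one, so every point receives a convex combination of the image features as its texture feature.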
Our DAM can visualize the distribution of point contributions to descriptor extraction.
IMFNet | FCGF |
---|---|
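The idea behind this kind of visualization is to treat one channel of a descriptor as the loss and read the gradient at a target layer as each point's significance. The sketch below illustrates that idea on a toy problem only. It approximates the gradient with central finite differences rather than backpropagation, uses a hypothetical `toy_descriptor` in place of the network layers after the target layer, and reduces each point's gradient to its L2 norm. None of these choices come from `dam.py`.

```python
import math

# Hypothetical per-point pooling weights for the toy descriptor below.
POINT_WEIGHTS = [0.1, 0.7, 0.2]


def toy_descriptor(features):
    """Stand-in for the layers after the target layer (hypothetical):
    weighted pooling over points followed by tanh, one value per channel."""
    channels = len(features[0])
    return [
        math.tanh(sum(w * f[k] for w, f in zip(POINT_WEIGHTS, features)))
        for k in range(channels)
    ]


def point_significance(descriptor_fn, features, channel, eps=1e-5):
    """Per-point significance: L2 norm of d(descriptor[channel]) / d(feature),
    with the gradient approximated by central finite differences."""
    work = [list(f) for f in features]  # copy so the input is not modified
    scores = []
    for feat in work:
        grad_sq = 0.0
        for k in range(len(feat)):
            original = feat[k]
            feat[k] = original + eps
            plus = descriptor_fn(work)[channel]
            feat[k] = original - eps
            minus = descriptor_fn(work)[channel]
            feat[k] = original  # restore before moving on
            grad = (plus - minus) / (2 * eps)
            grad_sq += grad * grad
        scores.append(math.sqrt(grad_sq))
    return scores
```

In this toy setup the point with the largest pooling weight receives the largest score, which is the kind of per-point ranking a contribution heat map would colour.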
For 3DMatch and 3DLoMatch, images are selected for each point cloud based on the content they cover, which yields a dataset of paired images and point clouds named 3DImageMatch. Our experiments are conducted on this dataset. The dataset construction and training details are given in the supplementary material. Download the 3DImageMatch/Kitti. The code is p2gl.
Please concatenate the files:
# 3DImageMatch
cat x00 x01 ... x17 > 3DImageMatch.zip
# Kitti
cat Kitti01 ... Kitti10 > Kitti.zip
Then, unzip the zip files.
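On systems without `cat`, the same two steps can be sketched in Python with the standard library. The function names below are hypothetical, and the sketch assumes the split files sort correctly by name (as zero-padded names do); it is an alternative to the shell commands above, not part of this repository.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(parts, archive_path):
    """Concatenate split files in name order, like `cat parts > archive`."""
    archive_path = Path(archive_path)
    with archive_path.open("wb") as out:
        for part in sorted(Path(p) for p in parts):
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return archive_path


def unzip(archive_path, dest_dir):
    """Check the archive for corrupt members, then extract it into dest_dir."""
    with zipfile.ZipFile(archive_path) as zf:
        bad_member = zf.testzip()  # name of the first corrupt file, or None
        if bad_member is not None:
            raise ValueError(f"corrupt member in archive: {bad_member}")
        zf.extractall(dest_dir)
```

For example, `join_parts(Path(".").glob("x[0-9][0-9]"), "3DImageMatch.zip")` followed by `unzip("3DImageMatch.zip", "3DImageMatch")` would mirror the commands above, assuming the downloaded parts sit in the current directory.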
Train on 3DMatch
python train.py train_3DMatch.py
Train on Kitti
python train.py train_Kitti.py
To benchmark the trained weights, download the pretrained file here. We also provide key points (5000) and some other results here.
Evaluating on 3DMatch or 3DLoMatch
# Generating Descriptors
python generate_desc.py --source <Testing Set Path> --target <Output Path> --model <CheckPoint Path>
# Evaluating 3DMatch
python evaluation_3dmatch.py --pcloud_root <Testing Set Path> --out_root <Output Path> --desc_types ['IMFNet'] --desc_roots ['<Descriptors Path>'] --benchmarks "3DMatch"
# Evaluating 3DLoMatch
python evaluation_3dmatch.py --pcloud_root <Testing Set Path> --out_root <Output Path> --desc_types ['IMFNet'] --desc_roots ['<Descriptors Path>'] --benchmarks "3DLoMatch"
Evaluating on Kitti
# Evaluating Kitti
python evaluation_kitti.py --save_dir <Output Path> --kitti_root <Testing Set Path>
Visualizing the target descriptor
python dam.py --target <target point index>
Please cite the following paper if you use our code:
@article{huang2021imfnet,
title={IMFNet: Interpretable Multimodal Fusion for Point Cloud Registration},
author={Xiaoshui Huang and Wentao Qu and Yifan Zuo and Yuming Fang and Xiaowei Zhao},
journal={IEEE Robotics and Automation Letters},
year={2022}
}