CSIPlab / MMSFormer

We propose a novel fusion strategy that can effectively fuse information from different modality combinations. We also propose a new model named Multi-Modal Segmentation TransFormer (MMSFormer) that incorporates the proposed fusion strategy to perform multimodal material and semantic segmentation tasks.
https://csiplab.github.io/MMSFormer/
Apache License 2.0
## MMSFormer: Multimodal Transformer for Material and Semantic Segmentation
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multimodal-transformer-for-material/semantic-segmentation-on-mcubes)](https://paperswithcode.com/sota/semantic-segmentation-on-mcubes?p=multimodal-transformer-for-material) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multimodal-transformer-for-material/semantic-segmentation-on-fmb-dataset)](https://paperswithcode.com/sota/semantic-segmentation-on-fmb-dataset?p=multimodal-transformer-for-material) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multimodal-transformer-for-material/thermal-image-segmentation-on-pst900)](https://paperswithcode.com/sota/thermal-image-segmentation-on-pst900?p=multimodal-transformer-for-material)

Introduction

Leveraging information across diverse modalities is known to enhance performance on multimodal segmentation tasks. However, effectively fusing information from different modalities remains challenging due to the unique characteristics of each modality. In this paper, we propose a novel fusion strategy that can effectively fuse information from different modality combinations. We also propose a new model named Multi-Modal Segmentation TransFormer (MMSFormer) that incorporates the proposed fusion strategy to perform multimodal material and semantic segmentation tasks. MMSFormer outperforms current state-of-the-art models on three different datasets. Starting from a single input modality, performance improves progressively as additional modalities are incorporated, which shows that the fusion block effectively combines useful information from diverse input modalities. Ablation studies show that the different modules in the fusion block are crucial for overall model performance. The ablation studies also show which input modalities improve the identification of which types of materials.

For more details, please check our arXiv paper.

Updates

MMSFormer model

![MMSFormer](figs/MMSFormer-V2.png) **Figure:** Overall architecture of the MMSFormer model and the proposed fusion block.

Environment

First, create and activate the environment using the following commands:

conda env create -f environment.yaml
conda activate mmsformer

Data preparation

Download the dataset:

Then, place the datasets under the `data` directory as follows:

data/
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp_sin
│   ├── polL_aolp_cos
│   ├── polL_dolp
│   ├── NIR_warped
│   ├── NIR_warped_mask
│   ├── GT
│   ├── SSGT4MS
│   ├── list_folder
│   └── SS
├── FMB
│   ├── test
│   │   ├── color
│   │   ├── Infrared
│   │   ├── Label
│   │   └── Visible
│   ├── train
│   │   ├── color
│   │   ├── Infrared
│   │   ├── Label
│   │   └── Visible
├── PST
│   ├── test
│   │   ├── rgb
│   │   ├── thermal
│   │   └── labels
│   ├── train
│   │   ├── rgb
│   │   ├── thermal
│   │   └── labels
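
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `find_missing_dirs` and the `EXPECTED_LAYOUT` mapping are illustrative, and the folder names are copied from the tree above.

```python
from pathlib import Path

# Expected sub-directories per dataset, copied from the tree above.
EXPECTED_LAYOUT = {
    "MCubeS": [
        "polL_color", "polL_aolp_sin", "polL_aolp_cos", "polL_dolp",
        "NIR_warped", "NIR_warped_mask", "GT", "SSGT4MS", "list_folder", "SS",
    ],
    "FMB": [f"{split}/{name}" for split in ("test", "train")
            for name in ("color", "Infrared", "Label", "Visible")],
    "PST": [f"{split}/{name}" for split in ("test", "train")
            for name in ("rgb", "thermal", "labels")],
}


def find_missing_dirs(data_root, datasets=None):
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    names = datasets if datasets is not None else list(EXPECTED_LAYOUT)
    missing = []
    for dataset in names:
        for sub in EXPECTED_LAYOUT[dataset]:
            path = root / dataset / sub
            if not path.is_dir():
                # Report paths relative to the data root, with forward slashes.
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    for entry in find_missing_dirs("data"):
        print("missing:", entry)
```

Pass a subset such as `find_missing_dirs("data", ["PST"])` if only one dataset has been downloaded.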

Model Zoo

MCubeS

| Model-Modal | mIoU | weight |
|---|---|---|
| MCubeS-RGB | 50.44 | GoogleDrive |
| MCubeS-RGB-A | 51.30 | GoogleDrive |
| MCubeS-RGB-A-D | 52.03 | GoogleDrive |
| MCubeS-RGB-A-D-N | 53.11 | GoogleDrive |

FMB

| Model-Modal | mIoU | weight |
|---|---|---|
| FMB-RGB | 57.17 | GoogleDrive |
| FMB-RGB-Infrared | 61.68 | GoogleDrive |

PST900

| Model-Modal | mIoU | weight |
|---|---|---|
| PST-RGB-T | 87.45 | GoogleDrive |
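
The tables above report mIoU (mean intersection over union, in percent). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition computed from a confusion matrix. The function name `mean_iou` is illustrative, and the repository's own evaluation code may differ in details such as how ignored labels are handled.

```python
def mean_iou(confusion):
    """Mean IoU (in percent) from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j.
    """
    ious = []
    for c in range(len(confusion)):
        tp = confusion[c][c]                          # correctly labeled pixels
        fn = sum(confusion[c]) - tp                   # class c pixels predicted as another class
        fp = sum(row[c] for row in confusion) - tp    # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)
```

Each class contributes equally to the mean regardless of how many pixels it covers, so rare classes weigh as much as common ones.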

Training

Before training, please download the pre-trained SegFormer weights and place them in the following directory structure:

checkpoints/pretrained/segformer
├── mit_b0.pth
├── mit_b1.pth
├── mit_b2.pth
├── mit_b3.pth
└── mit_b4.pth

To train the MMSFormer model, update the relevant configuration file in `configs/` with the correct paths and hyper-parameters for your setup. Then run the command for the dataset you want to train on:

cd path/to/MMSFormer
conda activate mmsformer

python -m tools.train_mm --cfg configs/mcubes_rgbadn.yaml

python -m tools.train_mm --cfg configs/fmb_rgbt.yaml

python -m tools.train_mm --cfg configs/pst_rgbt.yaml
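
To queue several runs back to back, the commands above can be wrapped in a small launcher. This is a sketch under assumptions, not a script shipped with the repository: `build_command` and `run_all` are hypothetical helpers, and the launcher is assumed to be started from the repository root with the `mmsformer` environment active. Only the module name and `--cfg` flag shown above are used.

```python
import subprocess
import sys

# Config paths taken from the commands above.
CONFIGS = [
    "configs/mcubes_rgbadn.yaml",
    "configs/fmb_rgbt.yaml",
    "configs/pst_rgbt.yaml",
]


def build_command(config, module="tools.train_mm"):
    """Build the command line shown above for one config file.

    sys.executable is the interpreter running this script, which is the
    environment's `python` when the conda environment is active.
    """
    return [sys.executable, "-m", module, "--cfg", config]


def run_all(configs=CONFIGS, module="tools.train_mm", dry_run=True):
    """Run (or, with dry_run=True, only print) one command per config, in order."""
    for config in configs:
        command = build_command(config, module)
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop if a run exits with a non-zero status.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)  # switch to dry_run=False to actually launch the runs
```

Passing `module="tools.val_mm"` builds the corresponding evaluation commands instead.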

Evaluation

To evaluate MMSFormer models, please download the respective model weights (GoogleDrive) and save them in any folder you like.

Then, update the EVAL section of the appropriate configuration file in configs/ and run:

cd path/to/MMSFormer
conda activate mmsformer

python -m tools.val_mm --cfg configs/mcubes_rgbadn.yaml

python -m tools.val_mm --cfg configs/fmb_rgbt.yaml

python -m tools.val_mm --cfg configs/pst_rgbt.yaml

License

This repository is released under the Apache-2.0 license. For commercial use, please contact the authors.

Citations

If you use the MMSFormer model, please cite the following work:

Acknowledgements

Our codebase is based on the following public GitHub repositories. We thank their authors:

Note: This is a research-level repository and may contain bugs. Please contact the authors with any questions.