Fourier7754 / AsymFormer


AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation (CVPR 2024 - USM Workshop ) [Paper] [Pre-trained Model] [TensorRT Model]


This repository contains the official implementation of AsymFormer, a novel network for real-time RGB-D semantic segmentation.

Results

AsymFormer achieves competitive results on the NYUv2 and SUNRGBD datasets, and notably also delivers impressive real-time inference speeds.

Installation

To run this project, we suggest using Ubuntu 20.04, PyTorch 2.0.1, and CUDA 12.0 or higher.

Other packages required for evaluation and TensorRT FP16 quantized inference:

pip install timm
pip install scikit-image
pip install opencv-python-headless==4.5.5.64
pip install thop
pip install onnx
pip install onnxruntime
pip install tensorrt==8.6.0
pip install pycuda
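
If you prefer a single install step, the commands above can be collected into a requirements.txt (only the versions pinned above are fixed; the rest are left unpinned):

```
timm
scikit-image
opencv-python-headless==4.5.5.64
thop
onnx
onnxruntime
tensorrt==8.6.0
pycuda
```

and installed with `pip install -r requirements.txt`.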

Data Preparation

We use the same data source as ACNet. The processed NYUv2 data (.npy) can be downloaded from Google Drive.

We found some mistakes in that earlier NYUv2 data, so we re-generated the training data from the original NYUv2 MATLAB .mat file: Google Drive.

SUNRGBD Dataset: Google Drive

Train

To train AsymFormer on the NYUv2 dataset, download the processed PNG-format dataset from Google Drive and unzip it into the current folder. Afterwards, the folder structure should look like this:

├── data
│   ├── images
│   ├── depths
│   ├── labels
│   ├── train.txt
│   └── test.txt
├── utils
│   ├── __init__.py
│   └── utils.py
├── src
│   └── model files
├── NYUv2_dataloader.py
├── train.py
└── eval.py
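
Before launching training, you can sanity-check that the unzipped data matches this layout. The sketch below is illustrative only (the directory and file names are taken from the tree above; the check itself is not part of this repo):

```python
# Minimal layout check (assumed paths from the tree above): build a dummy copy
# of the expected NYUv2 folder tree in a temp dir, then verify every path
# that train.py is expected to read is present.
import tempfile
from pathlib import Path

EXPECTED_DIRS = ["data/images", "data/depths", "data/labels"]
EXPECTED_FILES = ["data/train.txt", "data/test.txt"]

def check_layout(root: Path) -> list:
    """Return a list of expected paths missing under `root` (empty = OK)."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing

# Demo on a throwaway directory: create the tree, then validate it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for d in EXPECTED_DIRS:
        (root / d).mkdir(parents=True)
    for f in EXPECTED_FILES:
        (root / f).touch()
    print(check_layout(root))  # → []
```

Running `check_layout(Path("."))` from the repo root before `python train.py` reports any missing pieces of the dataset.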

Then run the train.py script.

python train.py

Note: training with batch size 8 requires 19 GB of GPU VRAM. We will soon release a mixed-precision training script, which will require about 12 GB of VRAM; however, mixed-precision training will only work on Linux.

Eval

Run the eval.py script to evaluate AsymFormer on NYUv2 Dataset.

python eval.py

To evaluate with the multi-scale inference strategy, run the MS5_eval.py script:

python MS5_eval.py
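
For intuition, multi-scale evaluation typically runs the network on several rescaled copies of the input (optionally with horizontal flips), resizes the logits back to the original resolution, and averages them. The sketch below illustrates the idea with a stand-in model and a dependency-free nearest-neighbour resize; the actual scales and interpolation used by MS5_eval.py may differ:

```python
# Hedged sketch of multi-scale + horizontal-flip inference averaging.
# `dummy_model` is a placeholder for AsymFormer, not the real network.
import numpy as np

NUM_CLASSES = 4  # illustrative; NYUv2 uses 40 classes

def dummy_model(img):
    """Stand-in network: (H, W, 3) image -> (H, W, NUM_CLASSES) logits."""
    h, w, _ = img.shape
    return np.zeros((h, w, NUM_CLASSES))

def resize_nn(x, h, w):
    """Nearest-neighbour resize via index sampling (no external deps)."""
    rows = (np.arange(h) * x.shape[0] / h).astype(int)
    cols = (np.arange(w) * x.shape[1] / w).astype(int)
    return x[rows][:, cols]

def ms_flip_inference(img, scales=(0.75, 1.0, 1.25)):
    h, w, _ = img.shape
    acc = np.zeros((h, w, NUM_CLASSES))
    for s in scales:
        scaled = resize_nn(img, int(h * s), int(w * s))
        for flip in (False, True):
            inp = scaled[:, ::-1] if flip else scaled   # flip width axis
            logits = dummy_model(inp)
            if flip:
                logits = logits[:, ::-1]                # un-flip the logits
            acc += resize_nn(logits, h, w)              # back to input size
    return acc / (2 * len(scales))                      # average all passes

pred = ms_flip_inference(np.random.rand(48, 64, 3)).argmax(-1)
print(pred.shape)  # → (48, 64)
```

Replacing `dummy_model` with a real forward pass (and the resize with bilinear interpolation) gives the usual multi-scale evaluation pipeline.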

Model Exporting and Quantization

Currently, we provide an ONNX model and a TensorRT FP16 model for evaluation and inference.

FP16 Inference (RTX3090 Platform)

The TensorRT inference notebook can be found in Folder. You can use it to test AsymFormer in your local environment.

Optimize the AsymFormer for your own platform

You can generate your own TensorRT engine from the ONNX model. We provide the original ONNX model and a corresponding notebook to help you generate the TensorRT model.

Training

The source code of AsymFormer will be released soon.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgements

If you find this repository useful in your research, please consider citing:

@misc{du2023asymformer,
      title={AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation}, 
      author={Siqi Du and Weixi Wang and Renzhong Guo and Shengjun Tang},
      year={2023},
      eprint={2309.14065},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any inquiries, please contact siqi.du1014@outlook.com. Home page of the author: Siqi.DU's ResearchGate