SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation (arXiv 2022 / CVPR 2023)
https://praeclarumjj3.github.io/oneformer
MIT License

OneFormer: One Transformer to Rule Universal Image Segmentation

Framework: PyTorch · Open in Colab · Hugging Face Space · Hugging Face transformers · YouTube · MIT License

Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi

Equal Contribution

[Project Page] [arXiv] [pdf] [BibTeX]

This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.

Features

[Figure: OneFormer, a single transformer model for semantic, instance, and panoptic segmentation]

Contents

  1. News
  2. Installation Instructions
  3. Dataset Preparation
  4. Execution Instructions
  5. Results
  6. Citation

News

Installation Instructions

Dataset Preparation

Execution Instructions

Training

Evaluation

Demo
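
The demo walkthrough is not reproduced here; as a stand-in, the sketch below runs panoptic inference through the Hugging Face transformers port of OneFormer. The checkpoint name (shi-labs/oneformer_ade20k_swin_large) and the image URL are illustrative assumptions, not a substitute for the repository's own demo instructions.

```python
# Minimal OneFormer inference sketch via the Hugging Face transformers port.
# The checkpoint name and image URL below are illustrative assumptions.
import requests
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

ckpt = "shi-labs/oneformer_ade20k_swin_large"
processor = OneFormerProcessor.from_pretrained(ckpt)
model = OneFormerForUniversalSegmentation.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The task token conditions the model: "semantic", "instance", or "panoptic".
inputs = processor(images=image, task_inputs=["panoptic"], return_tensors="pt")
outputs = model(**inputs)

# Merge the predicted masks into an (H, W) panoptic segmentation map plus per-segment metadata.
result = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(result["segmentation"].shape, len(result["segments_info"]))
```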

Results

ADE20K

| Method | Backbone | Crop Size | PQ | AP | mIoU (single-scale) | mIoU (multi-scale + flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 640×640 | 49.8 | 35.9 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | Swin-L | 896×896 | 51.1 | 37.6 | 57.4 | 58.3 | 219M | config | model |
| OneFormer | Swin-L | 1280×1280 | 51.4 | 37.8 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | ConvNeXt-L | 640×640 | 50.0 | 36.2 | 56.6 | 57.4 | 220M | config | model |
| OneFormer | DiNAT-L | 640×640 | 50.5 | 36.0 | 58.3 | 58.4 | 223M | config | model |
| OneFormer | DiNAT-L | 896×896 | 51.2 | 36.8 | 58.1 | 58.6 | 223M | config | model |
| OneFormer | DiNAT-L | 1280×1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | config | model |
| OneFormer (COCO-Pretrained) | DiNAT-L | 1280×1280 | 53.4 | 40.2 | 58.4 | 58.8 | 223M | config | model \| pretrained |
| OneFormer | ConvNeXt-XL | 640×640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | config | model |

Cityscapes

| Method | Backbone | PQ | AP | mIoU (single-scale) | mIoU (multi-scale + flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 67.2 | 45.6 | 83.0 | 84.4 | 219M | config | model |
| OneFormer | ConvNeXt-L | 68.5 | 46.5 | 83.0 | 84.0 | 220M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-L | 70.1 | 48.7 | 84.6 | 85.2 | 220M | config | model \| pretrained |
| OneFormer | DiNAT-L | 67.6 | 45.6 | 83.1 | 84.0 | 223M | config | model |
| OneFormer | ConvNeXt-XL | 68.4 | 46.7 | 83.6 | 84.6 | 372M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-XL | 69.7 | 48.9 | 84.5 | 85.8 | 372M | config | model \| pretrained |

COCO

| Method | Backbone | PQ | PQ (things) | PQ (stuff) | AP | mIoU | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | config | model |
| OneFormer | DiNAT-L | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | config | model |

Mapillary Vistas

| Method | Backbone | PQ | mIoU (single-scale) | mIoU (multi-scale + flip) | #params | config | Checkpoint |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| OneFormer | Swin-L | 46.7 | 62.9 | 64.1 | 219M | config | model |
| OneFormer | ConvNeXt-L | 47.9 | 63.2 | 63.8 | 220M | config | model |
| OneFormer | DiNAT-L | 47.8 | 64.0 | 64.9 | 223M | config | model |
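
Every row in the tables above reports PQ, AP, and mIoU from one jointly trained checkpoint; only the task token supplied at inference changes. The sketch below illustrates that switch via the Hugging Face transformers port, under the same assumed checkpoint name as in the demo sketch earlier.

```python
# One checkpoint, three tasks: only the task token changes between runs.
# Checkpoint name and image URL are illustrative assumptions.
import requests
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

ckpt = "shi-labs/oneformer_ade20k_swin_large"
processor = OneFormerProcessor.from_pretrained(ckpt)
model = OneFormerForUniversalSegmentation.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
sizes = [image.size[::-1]]  # (height, width) expected by the post-processors

for task in ("semantic", "instance", "panoptic"):
    inputs = processor(images=image, task_inputs=[task], return_tensors="pt")
    outputs = model(**inputs)
    if task == "semantic":
        seg = processor.post_process_semantic_segmentation(outputs, target_sizes=sizes)[0]
    elif task == "instance":
        seg = processor.post_process_instance_segmentation(outputs, target_sizes=sizes)[0]
    else:
        seg = processor.post_process_panoptic_segmentation(outputs, target_sizes=sizes)[0]
    print(task, type(seg))
```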

Citation

If you find OneFormer useful in your research, please consider starring ⭐ the project on GitHub and citing 📚 the paper!

@inproceedings{jain2023oneformer,
  title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
  author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
  booktitle={CVPR},
  year={2023}
}

Acknowledgement

We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.