MendelXu / SAN

Open-vocabulary Semantic Segmentation
https://mendelxu.github.io/SAN/
MIT License

[CVPR2023-Highlight] Side Adapter Network for Open-Vocabulary Semantic Segmentation

[PAMI] SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation


This is the official implementation of our conference paper, "Side Adapter Network for Open-Vocabulary Semantic Segmentation" (main branch), and our journal paper, "SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation" (video branch).

Introduction

This paper presents a new framework for open-vocabulary semantic segmentation with a pre-trained vision-language model, named Side Adapter Network (SAN). Our approach models the semantic segmentation task as a region recognition problem. A side network is attached to a frozen CLIP model with two branches: one predicts mask proposals, and the other predicts attention bias, which is applied in the CLIP model to recognize the class of each mask. This decoupled design benefits CLIP in recognizing the class of mask proposals. Since the attached side network can reuse CLIP features, it can be very light. In addition, the entire network can be trained end-to-end, allowing the side network to be adapted to the frozen CLIP model, which makes the predicted mask proposals CLIP-aware. Our approach is fast and accurate, and adds only a few trainable parameters. We evaluate our approach on multiple semantic segmentation benchmarks. Our method significantly outperforms other counterparts, with up to 18 times fewer trainable parameters and 19 times faster inference.
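
To make the "region recognition" framing concrete, the toy sketch below combines a set of mask proposals with per-mask class probabilities into a per-pixel label map. The weighted-sum combination is an assumption (a common choice in mask-classification models), not something this README spells out, and the function name and all numbers are hypothetical. In SAN the class probabilities would come from the frozen CLIP model guided by the attention bias; here they are hard-coded.

```python
# Illustrative sketch only: hypothetical helper and toy numbers, not the repository's code.

def combine_masks_and_classes(mask_probs, class_probs):
    """Turn N mask proposals into one predicted class index per pixel.

    mask_probs:  N lists of per-pixel mask probabilities (image flattened to 1-D).
    class_probs: N lists of per-class probabilities, one list per proposal.
    """
    num_masks = len(mask_probs)
    num_pixels = len(mask_probs[0])
    num_classes = len(class_probs[0])
    labels = []
    for p in range(num_pixels):
        # Score of class c at pixel p: sum over proposals of P(class | mask) * P(mask covers pixel).
        scores = [
            sum(class_probs[n][c] * mask_probs[n][p] for n in range(num_masks))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


# Two proposals over a 4-pixel image, three candidate classes.
masks = [
    [0.9, 0.8, 0.1, 0.0],  # proposal 0 covers the left pixels
    [0.1, 0.2, 0.9, 1.0],  # proposal 1 covers the right pixels
]
classes = [
    [0.7, 0.2, 0.1],  # proposal 0 is most likely class 0
    [0.1, 0.1, 0.8],  # proposal 1 is most likely class 2
]
print(combine_masks_and_classes(masks, classes))  # [0, 0, 2, 2]
```

Because recognition happens per proposal rather than per pixel, the class list can change at inference time without touching the mask branch, which is what makes the setting open-vocabulary.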

Demo

Installation

  1. Clone the repository
    git clone https://github.com/MendelXu/SAN.git
  2. Navigate to the project directory
    cd SAN
  3. Install the dependencies
    bash install.sh

    Hint: You can run the job in Docker instead of installing the dependencies locally. Run with the pre-built image:

    docker run -it --gpus all --shm-size 8G mendelxu/pytorch:d2_nvcr_2008 /bin/bash

    or build your own image with the provided Dockerfile, docker/Dcokerfile.

Data Preparation

See SimSeg for reference. The data should be organized like:

datasets/
    coco/
        ...
        train2017/
        val2017/
        stuffthingmaps_detectron2/
    VOC2012/
        ...
        images_detectron2/
        annotations_detectron2/
    pcontext/
        ...
        val/
    pcontext_full/
        ...
        val/
    ADEChallengeData2016/
        ...
        images/
        annotations_detectron2/
    ADE20K_2021_17_01/
        ...
        images/
        annotations_detectron2/        
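
Before training or evaluation, it can help to confirm that the folders listed above are in place. The following is a minimal sketch of such a check, not a script shipped with the repository: the function name `missing_dataset_dirs` is hypothetical, the default root `datasets` is assumed from the tree above, and entries elided with `...` are not checked.

```python
from pathlib import Path

# Sub-directories named explicitly in the layout above.
EXPECTED_DIRS = [
    "coco/train2017",
    "coco/val2017",
    "coco/stuffthingmaps_detectron2",
    "VOC2012/images_detectron2",
    "VOC2012/annotations_detectron2",
    "pcontext/val",
    "pcontext_full/val",
    "ADEChallengeData2016/images",
    "ADEChallengeData2016/annotations_detectron2",
    "ADE20K_2021_17_01/images",
    "ADE20K_2021_17_01/annotations_detectron2",
]


def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    for d in missing:
        print(f"missing: datasets/{d}")
    if not missing:
        print("all listed dataset directories found")
```

Run it from the project directory; any path it reports still needs to be prepared.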

Hint: In the code, these datasets are registered under the following names (`*` stands for the split):

coco_2017_*_stuff_sem_seg: COCO Stuff-171
voc_sem_seg_*: Pascal VOC-20
pcontext_sem_seg_*: Pascal Context-59
ade20k_sem_seg_*: ADE-150
pcontext_full_sem_seg_*: Pascal Context-459
ade20k_full_sem_seg_*: ADE-847
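
When reading configs or logs, it can be handy to translate a registered name back to its benchmark. The sketch below does this with shell-style wildcard matching; the patterns and labels are copied from the list above, while the helper name `benchmark_for` and the concrete split names in the example (such as `val`) are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Name patterns and benchmark labels copied from the list above.
DATASET_PATTERNS = {
    "coco_2017_*_stuff_sem_seg": "COCO Stuff-171",
    "voc_sem_seg_*": "Pascal VOC-20",
    "pcontext_sem_seg_*": "Pascal Context-59",
    "ade20k_sem_seg_*": "ADE-150",
    "pcontext_full_sem_seg_*": "Pascal Context-459",
    "ade20k_full_sem_seg_*": "ADE-847",
}


def benchmark_for(registered_name):
    """Return the benchmark label for a registered dataset name, or None if unknown."""
    for pattern, label in DATASET_PATTERNS.items():
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if fnmatchcase(registered_name, pattern):
            return label
    return None


# Hypothetical split name "val", used only to exercise the lookup.
print(benchmark_for("ade20k_full_sem_seg_val"))  # ADE-847
```

Note that the `_full_` variants do not collide with the shorter patterns, since each pattern requires its exact prefix before the wildcard.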

Usage

FAQ

If it takes too long to get a response from the author on GitHub, please e-mail me directly at [shea.mendel] [AT] [gmail.com].

License

Distributed under the MIT License. See LICENSE for more information.

Cite

If you find this work helpful, please cite our paper.

@inproceedings{xu2023side,
  title={Side Adapter Network for Open-Vocabulary Semantic Segmentation},
  author={Mengde Xu and Zheng Zhang and Fangyun Wei and Han Hu and Xiang Bai},
  booktitle={CVPR},
  year={2023}
}