This is the official PyTorch implementation of our paper:
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu
Computer Vision and Pattern Recognition Conference (CVPR), 2023
[arXiv] [Project] [Hugging Face demo]
Please see the installation guide.
Please see the dataset preparation guide.
Please see the getting started instructions.
Please see the OpenCLIP training instructions.
The majority of OVSeg is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: CLIP and ZSSEG are licensed under the MIT license; MaskFormer is licensed under CC-BY-NC; OpenCLIP is licensed under the license in its repository.
If you use OVSeg in your research or wish to refer to the baseline results published in the paper, please use the following BibTeX entry.
@inproceedings{liang2023open,
title={Open-vocabulary semantic segmentation with mask-adapted clip},
author={Liang, Feng and Wu, Bichen and Dai, Xiaoliang and Li, Kunpeng and Zhao, Yinan and Zhang, Hang and Zhang, Peizhao and Vajda, Peter and Marculescu, Diana},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7061--7070},
year={2023}
}