Qinying-Liu / Awesome-Open-Vocabulary-Semantic-Segmentation

A curated list of publications on open-vocabulary semantic segmentation and related areas (e.g., zero-shot semantic segmentation).

Awesome-Open-Vocabulary-Semantic-Segmentation

If you find this project helpful, please consider giving it a star ⭐.

Open-Vocabulary Semantic Segmentation

Fully-Supervised Open-Vocabulary Semantic Segmentation

The model is trained on fully-supervised semantic segmentation datasets with pixel-level annotations (e.g., COCO Stuff dataset).

  1. [LSeg] | ICLR'22 | Language-driven Semantic Segmentation | [pdf] | [code]
  2. [OpenSeg] | ECCV'22 | Scaling Open-vocabulary Image Segmentation with Image-level Labels | [pdf] | [code]
  3. [Xu et al.] | ECCV'22 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | [pdf] | [code]
  4. [SegCLIP] | ICML'23 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  5. [MaskCLIP] | ICML'23 | Open-Vocabulary Universal Image Segmentation with MaskCLIP | [pdf] | [code]
  6. [OVSeg] | CVPR'23 | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | [pdf] | [code]
  7. [X-Decoder] | CVPR'23 | Generalized Decoding for Pixel, Image, and Language | [pdf] | [code]
  8. [SAN] | CVPR'23(Highlight) | Side Adapter Network for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  9. [SAN] | TPAMI'23 | SAN: Side Adapter Network for Open-vocabulary Semantic Segmentation | [pdf] | [code]
  10. [ODISE] | CVPR'23 | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | [pdf] | [code]
  11. [FreeSeg] | CVPR'23 | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | [pdf] | [code]
  12. [OpenSeeD] | ICCV'23 | A Simple Framework for Open-Vocabulary Segmentation and Detection | [pdf] | [code]
  13. [GKC] | ICCV'23 | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation | [pdf]
  14. [OPSNet] | ICCV'23 | Open-vocabulary Panoptic Segmentation with Embedding Modulation | [pdf] | [code]
  15. [MasQCLIP] | ICCV'23 | MasQCLIP for Open-Vocabulary Universal Image Segmentation | [pdf]
  16. [DeOP] | ICCV'23 | Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network | [pdf] | [code]
  17. [Li et al.] | ICCV'23 | Open-vocabulary Object Segmentation with Diffusion Models | [pdf] | [code]
  18. [HIPIE] | NeurIPS'23 | Hierarchical Open-vocabulary Universal Image Segmentation | [pdf] | [code]
  19. [FC-CLIP] | NeurIPS'23 | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | [pdf] | [code]
  20. [MAFT] | NeurIPS'23 | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | [pdf] | [code]
  21. [ADA] | NeurIPS'23 | Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | [pdf]
  22. [Dao et al.] | TMM | Class Enhancement Losses with Pseudo Labels for Open-Vocabulary Semantic Segmentation | [pdf]
  23. [SELF-SEG] | Arxiv'23.12 | Self-Guided Open-Vocabulary Semantic Segmentation | [pdf]
  24. [OpenSD] | Arxiv'23.12 | OpenSD: Unified Open-Vocabulary Segmentation and Detection | [pdf] | [code]
  25. [RENOVATE] | Arxiv'24.03 | Renovating Names in Open-Vocabulary Segmentation Benchmarks | [pdf]
  26. [DreamLIP] | ECCV'24 | DreamLIP: Language-Image Pre-training with Long Captions | [pdf] | [code]
  27. [CAT-Seg] | CVPR'24 | CAT-Seg : Cost Aggregation for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  28. [SED] | CVPR'24 | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  29. [SCAN] | CVPR'24 | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | [pdf] | [code]
  30. [OpenTrans] | CVPR'24 | Transferable and Principled Efficiency for Open-Vocabulary Segmentation | [pdf] | [code]
  31. [H-CLIP] | Arxiv'24.05 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | [pdf]
  32. [OpenDAS] | Arxiv'24.05 | OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation | [pdf]
  33. [USE] | CVPR'24 | USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation | [pdf]
  34. [EBSeg] | CVPR'24 | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | [pdf] | [code]
  35. [MAFT+] | ECCV'24 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | [pdf] | [code]
  36. [R-Adapter] | ECCV'24 | Efficient and Versatile Robust Fine-Tuning of Zero-shot Models | [pdf] | [code]
  37. [MROVSeg] | Arxiv'24.08 | MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation | [pdf]
  38. [FrozenSeg] | Arxiv'24.09 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | [pdf] | [code]

Weakly-Supervised Open-Vocabulary Semantic Segmentation

[text-supervised/language-supervised] The model is trained on weakly supervised datasets with only image-level annotations/captions (e.g., CC12M dataset).

  1. [GroupViT] | CVPR'22 | GroupViT: Semantic Segmentation Emerges from Text Supervision | [pdf] | [code]
  2. [ViL-Seg] | ECCV'22 | Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding | [pdf]
  3. [MaskCLIP+] | ECCV'22(Oral) | Extract Free Dense Labels from CLIP | [pdf] | [code]
  4. [ViewCo] | ICLR'23 | ViewCo: Discovering Text-supervised Segmentation Masks via Multi-view Semantic Consistency | [pdf]
  5. [SegCLIP] | ICML'23 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  6. [CLIP-S4] | CVPR'23 | CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation | [pdf]
  7. [PACL] | CVPR'23 | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | [pdf]
  8. [OVSegmentor] | CVPR'23 | Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision | [pdf] | [code]
  9. [SimSeg] | CVPR'23 | A Simple Framework for Text-Supervised Semantic Segmentation | [pdf] | [code]
  10. [TCL] | CVPR'23 | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | [pdf] | [code]
  11. [SimCon] | Arxiv'23.02 | SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation | [pdf]
  12. [Zhang et al.] | Arxiv'23.04 | Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation | [pdf]
  13. [ZeroSeg] | ICCV'23 | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only | [pdf]
  14. [CLIPpy] | ICCV'23 | Perceptual Grouping in Contrastive Vision-Language Models | [pdf]
  15. [MixReorg] | ICCV'23 | MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation | [pdf]
  16. [CoCu] | NeurIPS'23 | Bridging Semantic Gaps for Language-Supervised Semantic Segmentation | [pdf] | [code]
  17. [PGSeg] | NeurIPS'23 | Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  18. [SAM-CLIP] | Arxiv'23.10 | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | [pdf]
  19. [CLIP-DINOiser] | Arxiv'23.12 | CLIP-DINOiser: Teaching CLIP a few DINO tricks | [pdf] | [code]
  20. [TagAlign] | Arxiv'23.12 | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | [pdf] | [code]
  21. [S-Seg] | Arxiv'24.01 | Exploring Simple Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  22. [CLIPSelf] | ICLR'24(Spotlight) | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | [pdf] | [code]
  23. [Uni-OVSeg] | Arxiv'24.02 | Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | [pdf] | [code]
  24. [MGCA] | Arxiv'24.03 | Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | [pdf]
  25. [TTD] | Arxiv'24.04 | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | [pdf] | [code]
  26. [CoDe] | CVPR'24 | Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation | [pdf]
  27. [LLM-Supervision] | Arxiv'24.03 | Training-Free Semantic Segmentation via LLM-Supervision | [pdf]
  28. [ProxyCLIP] | ECCV'24 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | [pdf] | [code]

Training-Free Open-Vocabulary Semantic Segmentation

These methods adapt off-the-shelf large models (e.g., CLIP, diffusion models) without any additional training phase. Note that the large models themselves have already been pre-trained on some datasets (e.g., image-caption datasets).
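
As a rough illustration of the training-free setting, the sketch below labels each image patch with the class whose text embedding is most similar to it. It assumes patch and text embeddings have already been extracted from a frozen vision-language model; the function name `segment_by_similarity` and the toy arrays are hypothetical stand-ins, and this is not the exact procedure of any paper listed here.

```python
import numpy as np

def segment_by_similarity(patch_emb, text_emb, grid_hw):
    """Assign every patch the class with the highest cosine similarity.

    patch_emb: (num_patches, dim) features from a frozen image encoder (assumed given).
    text_emb:  (num_classes, dim) features from a frozen text encoder (assumed given).
    grid_hw:   (rows, cols) of the patch grid, with rows * cols == num_patches.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = p @ t.T  # (num_patches, num_classes)
    return sim.argmax(axis=1).reshape(grid_hw)

# Toy stand-ins for encoder outputs: a 2x2 patch grid, 3 classes, 4-dim features.
text_emb = np.eye(3, 4)
patch_emb = np.array([
    [2.0, 0.1, 0.0, 0.0],
    [0.0, 3.0, 0.2, 0.0],
    [0.1, 0.0, 1.5, 0.0],
    [0.9, 0.0, 0.1, 0.0],
])
seg = segment_by_similarity(patch_emb, text_emb, (2, 2))
print(seg)
```

No parameters are updated anywhere in this sketch, which is what distinguishes the setting from the supervised categories above; the methods listed below differ mainly in how they obtain better dense features or masks from the frozen model.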

  1. [MaskCLIP] | ECCV'22(Oral) | Extract Free Dense Labels from CLIP | [pdf] | [code]
  2. [ReCo] | NeurIPS'22 | ReCo: Retrieve and Co-segment for Zero-shot Transfer | [pdf] | [code]
  3. [CLIP Surgery] | Arxiv'23.04 | CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks | [pdf] | [code]
  4. [OVDiff] | Arxiv'23.06 | Diffusion Models for Zero-Shot Open-Vocabulary Segmentation | [pdf]
  5. [DiffSegmenter] | Arxiv'23.09 | Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter | [pdf] | [code]
  6. [IPSeg] | IJCV'24 | Towards Training-free Open-world Segmentation via Image Prompting Foundation Models | [pdf]
  7. [SCLIP] | Arxiv'23.12 | SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference | [pdf]
  8. [GEM] | CVPR'24 | Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | [pdf] | [code]
  9. [CLIP-DIY] | WACV'24 | CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free | [pdf]
  10. [FOSSIL] | WACV'24 | FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval | [pdf]
  11. [TagCLIP] | AAAI'24 | TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training | [pdf] | [code]
  12. [EmerDiff] | ICLR'24 | EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models | [pdf] | [code]
  13. [FreeSeg-Diff] | Arxiv'24.03 | FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models | [pdf] | [code]
  14. [MaskDiffusion] | Arxiv'24.03 | MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation | [pdf] | [code]
  15. [TAG] | Arxiv'24.03 | TAG: Guidance-free Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  16. [Sun et al.] | Arxiv'24.04 | Training-Free Semantic Segmentation via LLM-Supervision | [pdf]
  17. [NACLIP] | Arxiv'24.04 | Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  18. [PnP-OVSS] | CVPR'24 | Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | [pdf] | [code]
  19. [CaR] | CVPR'24 | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | [pdf] | [code]
  20. [Wang et al.] | CVPR'24 | Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  21. [FreeDA] | CVPR'24 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | [pdf] | [code]
  22. [Yang et al.] | Arxiv'24.05 | Tuning-free Universally-Supervised Semantic Segmentation | [pdf]
  23. [CLIPTrase] | ECCV'24 | Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation | [pdf] | [code]
  24. [ClearCLIP] | ECCV'24 | ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference | [pdf] | [code]
  25. [ProxyCLIP] | ECCV'24 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | [pdf] | [code]
  26. [LaVG] | ECCV'24 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | [pdf] | [code]

Others

  1. [EntitySeg] | Arxiv'23.11 | Rethinking Evaluation Metrics of Open-Vocabulary Segmentation | [pdf] | [code]

Zero-Shot Semantic Segmentation

Unlike open-vocabulary segmentation (cross-dataset), zero-shot methods split the classes of each dataset into seen and unseen classes.
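
One way to picture the seen/unseen split is to hide unseen-class pixels from the training labels. The sketch below does this with NumPy; the helper `hide_unseen`, the ignore value 255, and the toy class ids are assumptions for illustration, and actual protocols vary between papers (some, for example, drop images that contain unseen classes instead).

```python
import numpy as np

IGNORE_INDEX = 255  # assumed "ignore" label value; not taken from any listed paper

def hide_unseen(label_map, unseen_ids, ignore_index=IGNORE_INDEX):
    """Return a copy of label_map with unseen-class pixels set to ignore_index."""
    out = label_map.copy()
    out[np.isin(label_map, list(unseen_ids))] = ignore_index
    return out

# Toy label map with class ids 0..3; classes 2 and 3 play the role of unseen classes.
labels = np.array([[0, 1, 2],
                   [2, 3, 0]])
unseen = {2, 3}
train_labels = hide_unseen(labels, unseen)
print(train_labels)
```

At test time the original label map would be used unchanged, so that predictions on both seen and unseen classes can be scored.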

  1. [ZegFormer] | CVPR'22 | ZegFormer: Decoupling Zero-Shot Semantic Segmentation | [pdf] | [code]
  2. [Xu et al.] | ECCV'22 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | [pdf] | [code]
  3. [ZegCLIP] | CVPR'23 | ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation | [pdf] | [code]
  4. [PADing] | CVPR'23 | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | [pdf] | [code]
  5. [DeOP] | ICCV'23 | Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network | [pdf] | [code]
  6. [SPT] | AAAI'24 | Spectral Prompt Tuning: Unveiling Unseen Classes for Zero-Shot Semantic Segmentation | [pdf] | [code]
  7. [Chen et al.] | Arxiv'24.02 | Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation | [pdf]
  8. [LDVC] | Arxiv'24.03 | Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation | [pdf]
  9. [OTSeg] | Arxiv'24.03 | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation | [pdf]
  10. [Cascade-CLIP] | ICML'24 | Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation | [pdf] | [code]
  11. [SimZSS] | Arxiv'24.07 | A Simple Framework for Open-Vocabulary Zero-Shot Segmentation | [pdf]
  12. [CaR] | CVPR'24 | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | [pdf] | [code]

Referring Image Segmentation

Fully-Supervised Referring Image Segmentation

  1. [CARIS] | ACM MM'23 | CARIS: Context-Aware Referring Image Segmentation | [pdf] | [code]
  2. [BKINet] | TMM'23 | Bilateral Knowledge Interaction Network for Referring Image Segmentation | [pdf] | [code]
  3. [Group-RES] | ICCV'23 | Advancing Referring Expression Segmentation Beyond Single Image | [pdf] | [code]
  4. [RIS-DMMI] | ICCV'23 | Beyond One-to-One: Rethinking the Referring Image Segmentation | [pdf] | [code]
  5. [ETRIS] | ICCV'23 | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | [pdf] | [code]
  6. [SEEM] | Arxiv'23.04 | Segment Everything Everywhere All at Once | [pdf] | [code]

Weakly-Supervised Referring Image Segmentation

  1. [Strudel et al.] | Arxiv'22.05 | Weakly-supervised segmentation of referring expressions | [pdf]
  2. [Kim et al.] | ICCV'23 | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision | [pdf] | [code]
  3. [TRIS] | ICCV'23 | Referring Image Segmentation Using Text Supervision | [pdf] | [code]
  4. [Jungbeom Lee et al.] | ICCV'23 | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | [pdf]
  5. [PPT] | CVPR'24 | Curriculum Point Prompting for Weakly-Supervised Referring Segmentation | [pdf]

Open-Vocabulary Object Detection

  1. [RO-ViT] | CVPR'23(Highlight) | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | [pdf] | [code]
  2. [CAT] | CVPR'23 | CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection | [pdf] | [code]
  3. [DetCLIPv2] | CVPR'23 | DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment | [pdf]
  4. [CondHead] | CVPR'23 | Learning to Detect and Segment for Open Vocabulary Object Detection | [pdf]
  5. [CORA] | CVPR'23 | CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching | [pdf] | [code]
  6. [ovdet] | CVPR'23 | Aligning Bag of Regions for Open-Vocabulary Object Detection | [pdf] | [code]
  7. [OADP] | CVPR'23 | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | [pdf] | [code]
  8. [F-VLM] | ICLR'23 | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | [pdf] | [code]
  9. [mm-ovod] | ICML'23 | Multi-Modal Classifiers for Open-Vocabulary Object Detection | [pdf] | [code]
  10. [SGDN] | Arxiv'23.07 | Open-Vocabulary Object Detection via Scene Graph Discovery | [pdf]
  11. [MMC-Det] | Arxiv'23.08 | Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection | [pdf]
  12. [SAS-Det] | CVPR'24 | Taming Self-Training for Open-Vocabulary Object Detection | [pdf] | [code]
  13. [DITO] | Arxiv'23.09 | Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection | [pdf] | [code]
  14. [EdaDet] | ICCV'23 | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment | [pdf] | [code]
  15. [LP-OVOD] | WACV'24 | LP-OVOD: Open-Vocabulary Object Detection by Linear Probing | [pdf] | [code]
  16. [DST-Det] | Arxiv'23.10 | DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection | [pdf] | [code]
  17. [CoDet] | NeurIPS'23 | CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | [pdf] | [code]
  18. [PLAC] | Arxiv'23.12 | Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | [pdf]
  19. [Sambor] | Arxiv'23.12 | Boosting Segment Anything Model Towards Open-Vocabulary Learning | [pdf] | [code]
  20. [DVDet] | ICLR'24 | LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors | [pdf]
  21. [DetCLIPv3] | CVPR'24 | DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | [pdf]
  22. [AggDet] | Arxiv'24.04 | Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | [pdf]
  23. [RALF] | CVPR'24 | Retrieval-Augmented Open-Vocabulary Object Detection | [pdf] | [code]
  24. [Chhipa et al.] | Arxiv'24.06 | Investigating Robustness of Open-Vocabulary Foundation Object Detectors under Distribution Shifts | [pdf]
  25. [SHiNe] | CVPR'24(Highlight) | SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | [pdf] | [code]
  26. [RTGen] | Arxiv'24.06 | RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | [pdf] | [code]
  27. [LBP] | CVPR'24 | Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection | [pdf]
  28. [YOLO-World] | CVPR'24 | Real-Time Open-Vocabulary Object Detection | [pdf] | [code]
  29. [OV-DINO] | Arxiv'24.07 | Unified Open-Vocabulary Detection with Language-Aware Selective Fusion | [pdf] | [code]
  30. [OVLW-DETR] | Arxiv'24.07 | OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | [pdf] | [code]
  31. [LaMI-DETR] | ECCV'24 | LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | [pdf] | [code]
  32. [MarvelOVD] | ECCV'24 | MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | [pdf] | [code]

Universal Semantic Segmentation

  1. [Semantic-SAM] | ECCV'24 | Semantic-SAM: Segment and Recognize Anything at Any Granularity | [pdf] | [code]
  2. [Open-Vocabulary SAM] | ECCV'24 | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | [pdf] | [code]
  3. [OMG-Seg] | CVPR'24 | OMG-Seg: Is One Model Good Enough For All Segmentation? | [pdf] | [code]

Other Related Work

  1. [DENOISER] | Arxiv'24.04 | DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | [pdf]
  2. [O2V-mapping] | Arxiv'24.04 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | [pdf]
  3. [CMD-SE] | CVPR'24 | Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection | [pdf]
  4. [FG-CLIP] | CBMI'24 | Is CLIP the main roadblock for fine-grained open-world perception? | [pdf] | [code]
  5. [NegPrompt] | CVPR'24 | Learning Transferable Negative Prompts for Out-of-Distribution Detection | [pdf] | [code]
  6. [OVFoodSeg] | CVPR'24 | OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation | [pdf]
  7. [Fed-MP] | NAACL'24 | Open-Vocabulary Federated Learning with Multimodal Prototyping | [pdf]
  8. [PSALM] | ECCV'24 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | [pdf] | [code]
  9. [OVAM] | Arxiv'24.03 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | [pdf]
  10. [CLIP-VIS] | Arxiv'24.06 | CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | [pdf]
  11. [RoboHop] | ICRA'24 | RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation | [pdf]
  12. [Rein] | CVPR'24 | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [pdf] | [code]
  13. [OVMR] | CVPR'24 | OVMR: Open-Vocabulary Recognition with Multi-Modal References | [pdf] | [code]
  14. [PartCLIPSeg] | Arxiv'24.06 | Understanding Multi-Granularity for Open-Vocabulary Part Segmentation | [pdf] | [code]
  15. [GBC] | Arxiv'24.07 | Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions | [pdf]
  16. [TCC] | Arxiv'24.07 | A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation | [pdf]
  17. [OPS] | ECCV'24 | Open Panoramic Segmentation | [pdf] | [code]
  18. [Yu et al.] | Arxiv'24.07 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | [pdf]
  19. [Oryon] | CVPR'24(Highlight) | Oryon: Open-Vocabulary Object 6D Pose Estimation | [pdf] | [code]
  20. [GLIS] | ECCV'24 | Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection | [pdf] | [code]
  21. [OVExp] | Arxiv'24.07 | OVExp: Open Vocabulary Exploration for Object-Oriented Navigation | [pdf] | [code]
  22. [OV-MLVC] | Arxiv'24.07 | Open Vocabulary Multi-Label Video Classification | [pdf]
  23. [DART] | Arxiv'24.07 | An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training | [pdf] | [code]
  24. [NOVIC] | Arxiv'24.07 | Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | [pdf]
  25. [CerberusDet] | Arxiv'24.07 | CerberusDet: Unified Multi-Task Object Detection | [pdf]
  26. [GGSD] | Arxiv'24.07 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | [pdf] | [code]
  27. [Diff2Scene] | ECCV'24 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | [pdf]
  28. [SegPoint] | ECCV'24 | SegPoint: Segment Any Point Cloud via Large Language Model | [pdf] | [code]
  29. [LangOcc] | Arxiv'24.07 | LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering | [pdf]
  30. [OVR] | Arxiv'24.07 | A Dataset for Open Vocabulary Temporal Repetition Counting in Videos | [pdf] | [code]
  31. [SAM-CP] | Arxiv'24.07 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | [pdf] | [code]
  32. [OV-AVSS] | ACM MM'24(Oral) | Open-Vocabulary Audio-Visual Semantic Segmentation | [pdf] | [code]
  33. [Open3DRF] | Arxiv'24.08 | Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space | [pdf] | [code]
  34. [OVA-DETR] | Arxiv'24.08 | OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion | [pdf] | [code]
  35. [OVAL] | Arxiv'24.08 | Open-vocabulary Temporal Action Localization using VLMs | [pdf] | [code]
  36. [EMPOWER] | IROS'24 | EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution | [pdf] | [code]

Related Survey

  1. Towards Open Vocabulary Learning: A Survey | [pdf]
  2. A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | [pdf]

Feedback

If you have any suggestions or find missing papers, please don't hesitate to contact me via tbh3223@mail.ustc.edu.cn or lydyc@mail.ustc.edu.cn.