Meng Wei
Xiaoyu Yue
Wenwei Zhang
Xihui Liu
Shu Kong
Jiangmiao Pang*
Shanghai AI Laboratory · The University of Hong Kong · The University of Sydney · University of Macau · Texas A&M University
OV-PARTS is a benchmark for Open-Vocabulary Part Segmentation that leverages the capabilities of large-scale Vision-Language Models (VLMs).
Benchmark Datasets: Refined versions of two publicly available datasets, Pascal-Part-116 and ADE20K-Part-234.
Benchmark Tasks: Three specific tasks that provide insights into the analogical reasoning, open granularity, and few-shot adapting abilities of models.
Benchmark Baselines: Baselines based on existing two-stage and one-stage object-level open-vocabulary segmentation methods, including ZSseg, CLIPSeg, and CATSeg.
We organize the Open-Vocabulary Part Segmentation (OV-PARTS) Challenge at the Visual Perception via Learning in an Open World (VPLOW) Workshop. Please check our website!
Clone this repository:
```bash
git clone https://github.com/OpenRobotLab/OV_PARTS.git
cd OV_PARTS
```
Create a conda environment with Python 3.8+ and install the Python requirements:
```bash
conda create -n ovparts python=3.8
conda activate ovparts
pip install -r requirements.txt
```
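To confirm the environment is usable, a quick sanity check such as the one below can help. It assumes the requirements include PyTorch and Detectron2 (which the Detectron2-style annotation folders and config overrides in this repo suggest):
```bash
# Sanity check; assumes PyTorch and Detectron2 are installed by requirements.txt.
python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__)"
# Verify that a GPU is visible if you plan to train.
python -c "import torch; print(torch.cuda.is_available())"
```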
After downloading the two benchmark datasets, extract the archives with the following commands and place the extracted folders under the "Datasets" directory:
```bash
tar -xzf PascalPart116.tar.gz
tar -xzf ADE20KPart234.tar.gz
```
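For example, assuming the archives extract to folders named Pascal-Part-116 and ADE20K-Part-234 (the names used in the structure below), the placement can be scripted as:
```bash
# Sketch: move the extracted folders under Datasets/ (folder names assumed
# to match the structure listed below).
mkdir -p Datasets
mv Pascal-Part-116 ADE20K-Part-234 Datasets/
```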
The Datasets folder should follow this structure:
```
Datasets/
├─Pascal-Part-116/
│ ├─train_16shot.json
│ ├─images/
│ │ ├─train/
│ │ └─val/
│ ├─annotations_detectron2_obj/
│ │ ├─train/
│ │ └─val/
│ └─annotations_detectron2_part/
│   ├─train/
│   └─val/
└─ADE20K-Part-234/
  ├─images/
  │ ├─training/
  │ └─validation/
  ├─train_16shot.json
  ├─ade20k_instance_train.json
  ├─ade20k_instance_val.json
  └─annotations_detectron2_part/
    ├─training/
    └─validation/
```
Create the {train/val}_{obj/part}_label_count.json files for Pascal-Part-116:
```bash
python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_{obj/part}/{train/val} Datasets/Pascal-Part-116/annotations_detectron2_part/{train/val}_{obj/part}_label_count.json
```
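Expanding the {train/val} and {obj/part} placeholders, this amounts to four invocations:
```bash
python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_obj/train Datasets/Pascal-Part-116/annotations_detectron2_part/train_obj_label_count.json
python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_obj/val Datasets/Pascal-Part-116/annotations_detectron2_part/val_obj_label_count.json
python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_part/train Datasets/Pascal-Part-116/annotations_detectron2_part/train_part_label_count.json
python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_part/val Datasets/Pascal-Part-116/annotations_detectron2_part/val_part_label_count.json
```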
Training the two-stage baseline ZSseg+. Please first download the CLIP model finetuned with CPTCoOp. Then run the training command:
```bash
python train_net.py --num-gpus 8 --config-file configs/${SETTING}/zsseg+_R50_coop_${DATASET}.yaml
```
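For example, with hypothetical placeholder values (take the actual ${SETTING} and ${DATASET} names from the files under configs/):
```bash
# Hypothetical values for illustration; check configs/ for the exact names.
SETTING=zero_shot
DATASET=voc
python train_net.py --num-gpus 8 --config-file configs/${SETTING}/zsseg+_R50_coop_${DATASET}.yaml
```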
Training the one-stage baselines CLIPSeg and CATSeg. Please first download the pre-trained object models of CLIPSeg and CATSeg and place them under the "pretrain_weights" directory.
| Models | Pre-trained checkpoint |
|---|---|
| CLIPSeg | download |
| CATSeg | download |
Then run the training command:
```bash
# For CATSeg.
python train_net.py --num-gpus 8 --config-file configs/${SETTING}/catseg_${DATASET}.yaml
# For CLIPSeg.
python train_net.py --num-gpus 8 --config-file configs/${SETTING}/clipseg_${DATASET}.yaml
```
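If fewer GPUs are available, the launcher flag can be lowered; in Detectron2-style training scripts the effective batch size is typically set via a config override such as SOLVER.IMS_PER_BATCH. That override name is an assumption here, and a smaller batch may also require adjusting the learning rate:
```bash
# Hypothetical single-GPU run; SOLVER.IMS_PER_BATCH is a Detectron2-style
# override and assumes this repo follows that convention.
python train_net.py --num-gpus 1 --config-file configs/${SETTING}/catseg_${DATASET}.yaml SOLVER.IMS_PER_BATCH 4
```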
We provide the trained weights for the three baseline models reported in the paper.
| Models | Setting | Pascal-Part-116 checkpoint | ADE20K-Part-234 checkpoint |
|---|---|---|---|
| ZSseg+ | Zero-shot | download | download |
| CLIPSeg | Zero-shot | download | download |
| CATSeg | Zero-shot | download | download |
| CLIPSeg | Few-shot | download | download |
| CLIPSeg | Cross-dataset | - | download |
To evaluate the trained models, add `--eval-only` to the training command. For example:
```bash
python train_net.py --num-gpus 8 --config-file configs/${SETTING}/catseg_${DATASET}.yaml --eval-only MODEL.WEIGHTS ${WEIGHT_PATH}
```
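A concrete evaluation run might look like the following, with hypothetical config and weight paths:
```bash
# Hypothetical setting/dataset/weight names for illustration only.
python train_net.py --num-gpus 8 \
  --config-file configs/zero_shot/catseg_voc.yaml \
  --eval-only MODEL.WEIGHTS weights/catseg_voc_zero_shot.pth
```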
Zero-shot performance of the two-stage and one-stage baselines on Pascal-Part-116.

| Model | Backbone | Finetuning | Oracle-Obj Seen | Oracle-Obj Unseen | Oracle-Obj Harmonic | Pred-Obj Seen | Pred-Obj Unseen | Pred-Obj Harmonic |
|---|---|---|---|---|---|---|---|---|
| *Fully-Supervised* | | | | | | | | |
| MaskFormer | ResNet-50 | - | 55.28 | 52.14 | - | 53.07 | 47.82 | - |
| *Two-Stage Baselines* | | | | | | | | |
| ZSseg | ResNet-50 | - | 49.35 | 12.57 | 20.04 | 40.80 | 12.07 | 18.63 |
| ZSseg+ | ResNet-50 | CPTCoOp | 55.33 | 19.17 | 28.48 | 54.23 | 17.10 | 26.00 |
| ZSseg+ | ResNet-50 | CPTCoCoOp | 54.43 | 19.04 | 28.21 | 53.31 | 16.08 | 24.71 |
| ZSseg+ | ResNet-101c | CPTCoOp | 57.88 | 21.93 | 31.81 | 56.87 | 20.29 | 29.91 |
| *One-Stage Baselines* | | | | | | | | |
| CATSeg | ResNet-101 & ViT-B/16 | - | 14.89 | 10.29 | 12.17 | 13.65 | 7.73 | 9.87 |
| CATSeg | ResNet-101 & ViT-B/16 | B+D | 43.97 | 26.11 | 32.76 | 41.65 | 26.08 | 32.07 |
| CLIPSeg | ViT-B/16 | - | 22.33 | 19.73 | 20.95 | 14.32 | 10.52 | 12.13 |
| CLIPSeg | ViT-B/16 | VA+L+F+D | 48.68 | 27.37 | 35.04 | 44.57 | 27.79 | 34.24 |
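The Harmonic columns report the harmonic mean of the Seen and Unseen mIoU, i.e. Harmonic = 2 × Seen × Unseen / (Seen + Unseen); for the ZSseg row above, 2 × 49.35 × 12.57 / (49.35 + 12.57) ≈ 20.04. The same convention applies to the ADE20K-Part-234 table below.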
Zero-shot performance of the two-stage and one-stage baselines on ADE20K-Part-234.

| Model | Backbone | Finetuning | Oracle-Obj Seen | Oracle-Obj Unseen | Oracle-Obj Harmonic | Pred-Obj Seen | Pred-Obj Unseen | Pred-Obj Harmonic |
|---|---|---|---|---|---|---|---|---|
| *Fully-Supervised* | | | | | | | | |
| MaskFormer | ResNet-50 | - | 46.25 | 47.86 | - | 35.52 | 16.56 | - |
| *Two-Stage Baselines* | | | | | | | | |
| ZSseg+ | ResNet-50 | CPTCoOp | 43.19 | 27.84 | 33.85 | 21.30 | 5.60 | 8.87 |
| ZSseg+ | ResNet-50 | CPTCoCoOp | 39.67 | 25.15 | 30.78 | 19.52 | 2.98 | 5.17 |
| ZSseg+ | ResNet-101c | CPTCoOp | 43.41 | 25.70 | 32.28 | 21.42 | 3.33 | 5.76 |
| *One-Stage Baselines* | | | | | | | | |
| CATSeg | ResNet-101 & ViT-B/16 | - | 11.49 | 8.56 | 9.81 | 6.30 | 3.79 | 4.73 |
| CATSeg | ResNet-101 & ViT-B/16 | B+D | 31.40 | 25.77 | 28.31 | 20.23 | 8.27 | 11.74 |
| CLIPSeg | ViT-B/16 | - | 15.27 | 18.01 | 16.53 | 5.00 | 3.36 | 4.02 |
| CLIPSeg | ViT-B/16 | VA+L+F+D | 38.96 | 29.65 | 33.67 | 24.80 | 6.24 | 9.98 |
Cross-dataset performance of models trained on the source dataset ADE20K-Part-234 and tested on the target dataset Pascal-Part-116.

| Model | Source Oracle-Obj | Source Pred-Obj | Target Oracle-Obj | Target Pred-Obj |
|---|---|---|---|---|
| CATSeg | 27.95 | 17.22 | 16.00 | 14.72 |
| CLIPSeg VA+L+F | 35.01 | 21.74 | 16.18 | 11.70 |
| CLIPSeg VA+L+F+D | 37.76 | 21.87 | 19.69 | 13.88 |
If you find our work helpful, please cite:
```bibtex
@inproceedings{wei2023ov,
  title={OV-PARTS: Towards Open-Vocabulary Part Segmentation},
  author={Wei, Meng and Yue, Xiaoyu and Zhang, Wenwei and Kong, Shu and Liu, Xihui and Pang, Jiangmiao},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}
```
We would like to express our gratitude to the open-source projects and their contributors, including ZSSeg, CATSeg, and CLIPSeg, whose valuable work has greatly contributed to the development of our codebase.