PyTorch implementation for "Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity" ([CVPR 2022, link TBD]()) by Weiyao Wang, [Matt Feiszli](), Heng Wang, Jitendra Malik, and Du Tran. We propose a framework for open-world instance segmentation, the Generic Grouping Network (GGN), which exploits a pseudo ground truth training strategy. On the same backbone, GGN improves average recall (AR) over closed-world training on cross-category generalization (+11% VOC to Non-VOC) and cross-dataset generalization (+5.2% COCO to UVO).
What is it? Open-world instance segmentation requires a model to group pixels into object instances without a pre-defined taxonomy, covering both "seen" categories (those present during training) and "unseen" categories (those absent from training). There is generally a large performance gap between the seen and unseen domains. For example, a baseline Mask R-CNN misses 15 annotated masks in the example below. Without additional training data or annotations, Mask R-CNN trained with the GGN framework correctly produces 9 more segments, bringing it much closer to the ground truth annotations.
How we do it? Our approach first learns a pairwise affinity (PA) predictor that predicts whether two pixels belong to the same instance. We demonstrate that this pairwise affinity representation generalizes well to unseen domains. We then use a grouping module (e.g. MCG) to extract and rank segments from the predicted PA. This can be run on any image dataset without using annotations; we take the highest-ranked segments as "pseudo ground truth" candidate masks. This is a large, category-agnostic set; we add it to our (much smaller) datasets of curated annotations to train a detector.
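To make the notion of pairwise affinity concrete, the following is a minimal sketch of how binary affinity targets could be derived from an instance-id map. It is an illustration only, not the repo's implementation: the function name `pairwise_affinity_targets`, the restriction to right and down neighbours, and the treatment of background (id 0) as just another region are all assumptions.

```python
import numpy as np


def pairwise_affinity_targets(instance_map: np.ndarray) -> dict:
    """Binary affinity between each pixel and its right / down neighbour.

    instance_map: (H, W) integer array with one id per instance.
    Assumption: background pixels share id 0 and are treated like any
    other region, so two adjacent background pixels get affinity 1.
    """
    # 1 where a pixel and its right neighbour carry the same instance id.
    right = instance_map[:, :-1] == instance_map[:, 1:]
    # 1 where a pixel and the pixel below it carry the same instance id.
    down = instance_map[:-1, :] == instance_map[1:, :]
    return {"right": right.astype(np.uint8), "down": down.astype(np.uint8)}


# Toy 3x3 label map: instance 1 (top-left), instance 2 (right column),
# background 0 (bottom-left).
toy = np.array([
    [1, 1, 2],
    [1, 1, 2],
    [0, 0, 2],
])
targets = pairwise_affinity_targets(toy)
print(targets["right"])  # shape (3, 2)
print(targets["down"])   # shape (2, 3)
```

Affinity drops to 0 exactly at instance boundaries, which is the signal a grouping module can use to cut segments without knowing any category labels.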
About the code. This repo is built on mmdetection, with the addition of the OLN backbone (concurrent work). The repo is tested under Python 3.7, PyTorch 1.7.0, CUDA 11.0, and mmcv==1.2.5. We thank the authors of OLN for releasing their work to facilitate research.
Below we release the PA predictor models, the pseudo-GT generated by the PA predictors, and GGN models trained with both annotated GT and pseudo-GT. We also release some of the processed annotations from LVIS to conduct cross-category generalization experiments.
Training | Eval | url | Baseline AR | GGN AR | Top-K Pseudo |
---|---|---|---|---|---|
Person, COCO | Non-Person, COCO | PA/Pseudo/GGN | 4.9 | 20.9 | 3 |
VOC, COCO | Non-VOC, COCO | PA/Pseudo/Pseudo-OLN/ GGN/GGN-OLN | 19.9 | 28.7 (33.7 with OLN) | 3 |
COCO, LVIS | Non-COCO, LVIS | PA/Pseudo/GGN | 16.5 | 20.4 | 1 |
Non-COCO, LVIS | COCO | PA/Pseudo/GGN | 21.7 | 23.6 | 1 |
COCO | UVO | PA/Pseudo/GGN | 40.1 | 43.4 | 3 |
COCO, random init | ImageNet | PA/Pseudo/GGN | 10 |
We remark that using the large-scale pre-trained model in the last row as initialization, then fine-tuning GGN on COCO with COCO pseudo-GT, gives a further improvement (45.3 on UVO); see model.
This repo is built on mmdetection.
You can use the following commands to create a conda environment with the required dependencies.
conda create -n ggn python=3.7 -y
conda activate ggn
conda install pytorch=1.7.0 torchvision cudatoolkit=11.0 -c pytorch -y
pip install mmcv-full
pip install -r requirements.txt
pip install -v -e .
Please also refer to get_started.md for more installation details.
Next, you will need to build the library for our grouping module:
cd pa_lib/cython_lib
python3 setup.py build_ext --inplace
Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:
path/to/coco/
annotations/ # annotation json files
train2017/ # train images
val2017/ # val images
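Before launching training, it can help to confirm that the dataset root matches the layout above. The snippet below is a small sanity-check sketch; the helper name `missing_coco_subdirs` is hypothetical and not part of this repo, and the demo uses a temporary directory rather than a real COCO download.

```python
import tempfile
from pathlib import Path

# Sub-directories expected under path/to/coco/, per the layout above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_coco_subdirs(coco_root) -> list:
    """Return the expected sub-directories that are absent under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Demo on a throwaway directory that only contains annotations/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "annotations").mkdir()
    demo_missing = missing_coco_subdirs(tmp)

print(demo_missing)
```

Pointing `missing_coco_subdirs` at your actual `path/to/coco/` should return an empty list when the layout is complete.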
Our work also uses LVIS, UVO and ADE20K. To use ADE20K, please convert its annotations into COCO style.
bash tools/dist_train.sh configs/pairwise_affinity/pa_train.py ${NUM_GPUS} --work-dir ${WORK_DIR}
We provide a tool, tools/test_pa.py, to directly evaluate PA performance (e.g. on PA prediction and on grouped masks).
python tools/test_pa.py configs/pairwise_affinity/pa_train.py ${WORK_DIR}/latest.pth --eval pa --eval-proposals --test-partition nonvoc
We begin by extracting masks. The example config pa_extract.py extracts pseudo-GT masks from a PA predictor trained on the VOC subset of COCO. The --use-gt-masks flag asks the pipeline to compute the maximum IoU each extracted mask has with the GT. We recommend splitting the dataset into multiple shards to run extraction. At the original image resolution on an NVIDIA V100 machine, the full pipeline (computing PA, grouping, ranking, then computing IoU with the annotated GT) takes about 4.8s per image without globalization and the trained ranker, or 10s with globalization and the trained ranker.
python tools/extract_pa_masks.py configs/pairwise_affinity/pa_extract.py ${PA_MODEL_PATH} --out ${OUT_DIR}/masks.json --use-gt-masks 1
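As an illustration of the quantity that the use-gt-masks option reports, here is a minimal NumPy sketch of the maximum IoU between one extracted mask and a set of GT masks. The function names and the dense binary-mask representation are assumptions for this example; the repo's pipeline may encode masks differently.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: define IoU as 0
    return float(np.logical_and(a, b).sum() / union)


def max_iou_with_gt(mask: np.ndarray, gt_masks) -> float:
    """Largest IoU between `mask` and any mask in `gt_masks` (0.0 if none)."""
    return max((mask_iou(mask, gt) for gt in gt_masks), default=0.0)


# Toy 4x4 example: the extracted mask covers a 2x2 block.
extracted = np.zeros((4, 4), dtype=np.uint8)
extracted[0:2, 0:2] = 1

gt_a = np.zeros((4, 4), dtype=np.uint8)
gt_a[0:2, 0:3] = 1  # overlaps the extracted mask: intersection 4, union 6
gt_b = np.zeros((4, 4), dtype=np.uint8)
gt_b[2:4, :] = 1    # disjoint from the extracted mask

best = max_iou_with_gt(extracted, [gt_a, gt_b])
print(round(best, 3))
```

A score like this is useful for analysing how well the extracted masks match annotated objects; it requires GT and so is only available on annotated datasets.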
The extracted masks will be stored in JSON with the following format:
[
[segm1, segm2,..., segm20] ## Result of an image
...
]
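Given this layout (one list of segments per image), selecting the top-k candidates per image is a simple slice. The sketch below assumes that each per-image list is already ordered from highest to lowest rank, which is not stated above, and it uses placeholder strings in place of real segment entries because their exact encoding is not specified here. The helper name `top_k_per_image` is hypothetical.

```python
import json


def top_k_per_image(all_results, k):
    """Keep the first k segments of every per-image list.

    Assumption: each inner list is sorted by rank, best first.
    """
    return [segms[:k] for segms in all_results]


# Stand-in for the contents of masks.json: two images, with placeholder
# strings where the real file would hold segment entries.
raw = json.dumps([
    ["a1", "a2", "a3", "a4"],
    ["b1", "b2"],
])
results = json.loads(raw)

top3 = top_k_per_image(results, 3)
print(top3)
```

The value of k would correspond to the "Top-K Pseudo" column in the results table; images with fewer than k segments simply keep all of them.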
See tools/merge_annotations.py for an example of formatting the extracted masks as a new COCO-style annotation file. Note that tools/interpolate_extracted_masks.py may be necessary if extraction was not run at the original image resolution.
Please set additional_ann_file in class_agn_mask_rcnn_pa.py to the pseudo-GT file extracted in the previous step.
bash tools/dist_train.sh configs/mask_rcnn/class_agn_mask_rcnn_pa.py ${NUM_GPUS}
class_agn_mask_rcnn_gn_online.py is used to train on masks extracted from ImageNet. There are too many annotations to store in a single JSON file without running out of memory, so they need to be broken into per-image annotation files named "{image_id}.json".
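The per-image layout can be sketched as follows. This is only an illustration of the "{image_id}.json" naming: the helper `split_annotations_per_image` is hypothetical, and the content of each file (a plain list of that image's COCO-style annotation dicts) is an assumption, so check the config and dataset code for the schema actually expected.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def split_annotations_per_image(coco_style: dict, out_dir) -> list:
    """Write one '{image_id}.json' file per image and return the file names.

    Assumption: each file holds the list of annotations for that image.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group annotations by the image they belong to.
    per_image = defaultdict(list)
    for ann in coco_style["annotations"]:
        per_image[ann["image_id"]].append(ann)

    written = []
    for image_id, anns in per_image.items():
        path = out / f"{image_id}.json"
        with path.open("w") as f:
            json.dump(anns, f)
        written.append(path.name)
    return sorted(written)


# Toy annotation set: two masks on image 7, one on image 9.
toy = {"annotations": [
    {"id": 1, "image_id": 7, "category_id": 1},
    {"id": 2, "image_id": 7, "category_id": 1},
    {"id": 3, "image_id": 9, "category_id": 1},
]}

with tempfile.TemporaryDirectory() as tmp:
    files = split_annotations_per_image(toy, tmp)
    with (Path(tmp) / "7.json").open() as f:
        anns_for_7 = json.load(f)

print(files, len(anns_for_7))
```

This sketch loads the full annotation dict into memory, which is fine for a toy example; at ImageNet scale it is likely more practical to write each per-image file as its masks are extracted.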
python tools/test.py configs/mask_rcnn/class_agn_mask_rcnn.py ${WORK_DIR}/latest.pth --eval segm
@article{wang2022ggn,
title={Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity},
author={Wang, Weiyao and Feiszli, Matt and Wang, Heng and Malik, Jitendra and Tran, Du},
journal={CVPR},
year={2022}
}
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.