HIPIE: Hierarchical Open-vocabulary Universal Image Segmentation

We present HIPIE, a novel HIerarchical, oPen-vocabulary and unIvErsal image segmentation and detection model. It performs segmentation at several levels of granularity (whole, part and subpart) and across several tasks, including semantic segmentation, instance segmentation, panoptic segmentation, referring segmentation, and part/subpart segmentation, all within a unified framework of language-guided segmentation.

Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang*, Shufan Li*, Konstantinos Kallidromitis*, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
Berkeley AI Research, UC Berkeley; Panasonic AI Research
NeurIPS 2023

[project page] [arxiv] [paper] [bibtex]

Oct 5: We released more weights, along with code for training and evaluation.

Oct 15: We released additional ViT-H weights finetuned for part segmentation.

Installation

Please refer to INSTALL.md for more details.

Demos

See Demo-Main for Panoptic, Part, Instance and Referring Segmentation
See Demo-SD for Combining our model with Stable Diffusion
See Demo-SAM for Combining our model with Segment Anything

HIPIE is also capable of labeling segmentation masks from SAM and can even identify additional masks that may have been overlooked by SAM.

Please check our project page for more demos!

Model Zoo

We currently release the following checkpoints:

ResNet-50 pretrained with O365, COCO, RefCOCO, Pascal Panoptic Parts
ViT-H pretrained with O365 and finetuned on COCO, RefCOCO
ViT-H pretrained with O365, COCO, RefCOCO, PACO
ViT-H finetuned on COCO, RefCOCO, Pascal-Parts

Training

The following command trains the model on one node with 8 A100 GPUs:

python3 launch.py --nn 1 --np 8 --uni 1 --config-file projects/HIPIE/configs/<config file> MODEL.WEIGHTS <pretrained checkpoint>

Evaluation

The following command evaluates the model on one node with 8 A100 GPUs:

python3 launch.py --nn 1 --np 8 --uni 1 --config-file projects/HIPIE/configs/<config file> --eval-only MODEL.WEIGHTS <checkpoint to load>

Alternatively, to evaluate from a folder that contains checkpoints, run:

python3 launch.py --nn 1 --np 8 --uni 1 --config-file projects/HIPIE/configs/<config file> --eval-only --resume OUTPUT_DIR <folder with checkpoints>

With the released weights, one should be able to reproduce the following results:

Data	COCO				ADE-150				RefCOCO	RefCOCO+	RefCOCOg	PAS-21	CTX-59	CTX-459	ADE-874
	AP_bbox	AP_segm	MIoU	PQ	AP_bbox	AP_segm	MIoU	PQ	oIoU	oIoU	oIoU	MIoU	MIoU	MIoU	MIoU
O365, COCO, RefCOCO/+/g,PACO	60.4	51.1	65.6	57.0	23.0	19.1	24.3	21.0	81.5	71.5	74.3	81.1	57.4	14.4	9.7
O365*, COCO, RefCOCO/+/g	61.3	51.9	66.8	58.1	18.4	14.9	28.4	20.1	82.8	73.9	75.7	83.2	58.1	11.1	10.8

* Used only in pretraining, not in the final training.

** Note on high variance: We observe that evaluation metrics can have high variance, likely because of noise introduced by the CLIP model. Specifically, changing MODEL.CLIP.ALPHA and MODEL.CLIP.BETA, which determine the importance of CLIP features relative to encoder features, can drastically change the results. Results on an individual benchmark may improve when these parameters are tuned.
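Because results are sensitive to these two parameters, a small sweep can help when tuning for a single benchmark. The sketch below only builds and prints one evaluation command per (alpha, beta) pair; it does not run anything. It assumes that MODEL.CLIP.ALPHA and MODEL.CLIP.BETA can be overridden by appending KEY VALUE pairs to the evaluation command, in the same way as MODEL.WEIGHTS. The helper names `build_eval_command` and `sweep_commands` and the grid values are hypothetical, and the config file and checkpoint are left as the placeholders used in the commands above.

```python
import itertools
import shlex

# Placeholders copied from the evaluation command; fill them in before running.
CONFIG_FILE = "projects/HIPIE/configs/<config file>"
CHECKPOINT = "<checkpoint to load>"


def build_eval_command(alpha, beta, config_file=CONFIG_FILE, checkpoint=CHECKPOINT):
    """Return the evaluation command as an argument list with CLIP overrides appended."""
    return [
        "python3", "launch.py",
        "--nn", "1", "--np", "8", "--uni", "1",
        "--config-file", config_file,
        "--eval-only",
        "MODEL.WEIGHTS", checkpoint,
        # Assumption: these keys accept KEY VALUE overrides like MODEL.WEIGHTS.
        "MODEL.CLIP.ALPHA", str(alpha),
        "MODEL.CLIP.BETA", str(beta),
    ]


def sweep_commands(alphas, betas):
    """Yield one evaluation command per (alpha, beta) pair."""
    for alpha, beta in itertools.product(alphas, betas):
        yield build_eval_command(alpha, beta)


if __name__ == "__main__":
    # Hypothetical grid; this README does not state a useful range of values.
    for cmd in sweep_commands([0.1, 0.2, 0.3], [0.5, 0.7]):
        print(shlex.join(cmd))  # dry run: print the command instead of executing it
```

Printing the commands first makes it easy to check the overrides before committing 8 GPUs to each run; each argument list can then be passed to `subprocess.run` once the placeholders are replaced.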

The finetuned part segmentation model should be able to produce the following results:

COCO	RefCOCO	Pascal-Parts
PQ	oIoU	MIoU-PartS
55.3	78.1	64.4

License

The majority of HIPIE is licensed under the MIT license. If you later add other third party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.

How to get support from us?

If you have any general questions, feel free to email Xudong Wang, Shufan Li or Konstantinos Kallidromitis. If you have code or implementation-related questions, please email us or open an issue in this codebase. We recommend opening an issue, because your questions may help others.

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@inproceedings{wang2023hierarchical,
  title={Hierarchical Open-vocabulary Universal Image Segmentation},
  author={Wang, Xudong and Li, Shufan and Kallidromitis, Konstantinos and Kato, Yusuke and Kozuka, Kazuki and Darrell, Trevor},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
