CVMI-Lab / SlotCon

(NeurIPS 2022) Self-Supervised Visual Representation Learning with Semantic Grouping
https://wen-xin.info/slotcon/
Apache License 2.0

Self-Supervised Visual Representation Learning with Semantic Grouping (NeurIPS 2022)
By Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, and Xiaojuan Qi.

Introduction

We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features, and conversely facilitates grouping semantically coherent pixels together.
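
The grouping step described above can be illustrated with a minimal NumPy sketch. It is not the repository's PyTorch implementation: the function names, the cosine-similarity scoring, and the temperature value of 0.1 are assumptions chosen for illustration, and the prototypes are random rather than learned.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def group_pixels(features, prototypes, temperature=0.1):
    """Soft-assign pixels to prototypes, then pool features into slots.

    features:   (N, D) array, one D-dim feature per pixel (N = H * W).
    prototypes: (K, D) array standing in for the learnable prototypes.
    Returns (assignments, slots) with shapes (N, K) and (K, D).
    The temperature is an illustrative assumption, not a documented value.
    """
    # Cosine similarity between every pixel and every prototype.
    logits = l2_normalize(features) @ l2_normalize(prototypes).T  # (N, K)
    # Each pixel gets a probability distribution over the K prototypes.
    assignments = softmax(logits / temperature, axis=1)  # (N, K)
    # Attentive pooling: normalize each prototype's column so its weights
    # over pixels sum to one, then take the weighted mean of pixel features.
    weights = assignments / (assignments.sum(axis=0, keepdims=True) + 1e-8)
    slots = weights.T @ features  # (K, D)
    return assignments, slots


rng = np.random.default_rng(0)
features = rng.normal(size=(7 * 7, 16))   # a 7x7 feature map, 16 channels
prototypes = rng.normal(size=(4, 16))     # 4 prototypes
assignments, slots = group_pixels(features, prototypes)
print(assignments.shape, slots.shape)     # (49, 4) (4, 16)
```

Each slot is a per-image summary of the pixels softly assigned to one prototype, which is what makes the slots data-dependent while the prototypes are shared across the dataset.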

[Figure: SlotCon framework]

Compared with previous efforts, by simultaneously optimizing the two coupled objectives of semantic grouping and contrastive learning, our approach bypasses the disadvantages of hand-crafted priors and is able to learn object/group-level representations from scene-centric images. Experiments show our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation.

Pretrained models

Method   Dataset      Epochs  Arch       APb (box)  APm (mask)  Download
SlotCon  COCO         800     ResNet-50  41.0       37.0        script | backbone only | full ckpt
SlotCon  COCO+        800     ResNet-50  41.8       37.8        script | backbone only | full ckpt
SlotCon  ImageNet-1K  100     ResNet-50  41.4       37.2        script | backbone only | full ckpt
SlotCon  ImageNet-1K  200     ResNet-50  41.8       37.8        script | backbone only | full ckpt

Folder containing all the checkpoints: [link].

Getting started

Requirements

This project was developed with python==3.9 and pytorch==1.10.0. Please be aware of possible code compatibility issues if you are using other versions.
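
As a quick sanity check before running anything, the versions above can be compared against the local environment. The following is a small sketch using only the standard library; the helper names are hypothetical and not part of this repository, and it assumes PyTorch is installed under its usual distribution name `torch`.

```python
import sys
from importlib import metadata

# Versions stated in this README.
EXPECTED = {"python": "3.9", "torch": "1.10.0"}


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_warnings(python_version, torch_version, expected=EXPECTED):
    """Return warnings for versions that differ from the expected ones."""
    warnings = []
    # Compare only major.minor for Python (e.g. "3.9.7" -> "3.9").
    major_minor = ".".join(python_version.split(".")[:2])
    if major_minor != expected["python"]:
        warnings.append(
            f"python {python_version} found, developed with {expected['python']}"
        )
    if torch_version is None:
        warnings.append("torch is not installed")
    # Drop a local build suffix such as "+cu113" before comparing.
    elif torch_version.split("+")[0] != expected["torch"]:
        warnings.append(
            f"torch {torch_version} found, developed with {expected['torch']}"
        )
    return warnings


python_version = ".".join(str(part) for part in sys.version_info[:3])
for message in version_warnings(python_version, installed_version("torch")):
    print("warning:", message)
```

A mismatch does not necessarily mean the code will fail; the check only flags environments that differ from the one the project was developed with.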

The following is an example of setting up the experimental environment:

Run pre-training

By default, we train with DistributedDataParallel (DDP) over 8 GPUs on a single machine. The following are some examples of reproducing SlotCon pre-training on COCO and ImageNet:

Evaluation: Object Detection & Instance Segmentation

Please install detectron2 and prepare the dataset first following the official instructions: [installation] [data preparation]

The following is an example usage of evaluating a pre-trained model on COCO:

Evaluation: Semantic Segmentation

Please install mmsegmentation and prepare the datasets first following the official instructions: [installation] [data preparation]

# run cityscapes
cd transfer/segmentation &&
bash mim_dist_train.sh configs/cityscapes/fcn_d6_r50-d16_769x769_90k_cityscapes_moco.py ../../${EXP_NAME}.pth 2
# run ade20k
cd transfer/segmentation &&
bash mim_dist_train.sh configs/ade20k/fcn_r50-d8_512x512_80k_ade20k.py ../../${EXP_NAME}.pth 4

Prototype Visualization

We also provide the code for visualizing the learned prototypes' nearest neighbors. To run the following command, please prepare a full checkpoint.

# --topk 5: retrieve 5 nearest neighbors for each prototype
# --sampling 20: randomly sample 20 prototypes to visualize
python viz_slots.py \
    --data_dir ${PATH_TO_COCO} \
    --model_path ${PATH_TO_MODEL} \
    --save_path ${PATH_TO_SAVE} \
    --topk 5 \
    --sampling 20
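
To give a rough idea of what nearest-neighbor retrieval for prototypes involves, here is a minimal NumPy sketch. It is not the code in viz_slots.py: the function name, the use of cosine similarity, and the random inputs are assumptions for illustration, with `topk` and `sampling` mirroring the command-line options above.

```python
import numpy as np


def nearest_neighbors(prototypes, features, topk=5, sampling=20, seed=0):
    """Find the top-k most similar features for a random subset of prototypes.

    prototypes: (K, D) array; features: (M, D) array of candidate features.
    Returns (chosen, neighbors): chosen has shape (S,) and holds the sampled
    prototype indices; neighbors has shape (S, topk) and holds feature
    indices sorted by decreasing cosine similarity. S = min(sampling, K).
    """
    rng = np.random.default_rng(seed)
    num_sampled = min(sampling, len(prototypes))
    # Sample prototypes without replacement.
    chosen = rng.choice(len(prototypes), size=num_sampled, replace=False)
    # Normalize rows so that the dot product equals cosine similarity.
    p = prototypes[chosen]
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = p @ f.T  # (S, M)
    # argsort is ascending, so reverse each row and keep the first topk.
    neighbors = np.argsort(sims, axis=1)[:, ::-1][:, :topk]
    return chosen, neighbors


rng = np.random.default_rng(0)
prototypes = rng.normal(size=(256, 32))  # stand-in for learned prototypes
features = rng.normal(size=(1000, 32))   # stand-in for image features
chosen, neighbors = nearest_neighbors(prototypes, features)
print(chosen.shape, neighbors.shape)     # (20,) (20, 5)
```

In a real visualization, each row of `neighbors` would be mapped back to the image regions the features came from.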

[Figure: examples of learned concepts]

Citing this work

If you find this repo useful for your research, please consider citing our paper:

@inproceedings{wen2022slotcon,
  title={Self-Supervised Visual Representation Learning with Semantic Grouping},
  author={Wen, Xin and Zhao, Bingchen and Zheng, Anlin and Zhang, Xiangyu and Qi, Xiaojuan},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgment

Our codebase builds upon several publicly available codebases. Specifically, we have modified and integrated the following repos into this project:

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.