Object-Aware Distillation Pyramid

     _/_/      _/_/    _/_/_/    _/_/_/
  _/    _/  _/    _/  _/    _/  _/    _/
 _/    _/  _/_/_/_/  _/    _/  _/_/_/
_/    _/  _/    _/  _/    _/  _/
 _/_/    _/    _/  _/_/_/    _/

This repository is the official implementation of "Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection".

Installation

Create a conda environment and activate it.

conda create -n oadp python=3.10
conda activate oadp

Install PyTorch following the official documentation. For example,

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

Install MMDetection following the official instructions. For example,

pip install openmim
mim install mmcv_full==1.7.0
pip install mmdet==2.25.2

Install other dependencies.

pip install todd_ai==0.3.0
pip install git+https://github.com/LutingWang/CLIP.git
pip install git+https://github.com/lvis-dataset/lvis-api.git@lvis_challenge_2021
pip install nni scikit-learn==1.1.3

Preparation

Datasets

Download the MS-COCO dataset to data/coco.

OADP/data/coco
├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017
│   └── ...
└── val2017
    └── ...

Download the LVIS v1.0 dataset to data/lvis_v1.

OADP/data/lvis_v1
├── annotations
│   ├── lvis_v1_train.json
│   └── lvis_v1_val.json
├── train2017 -> ../coco/train2017
│   └── ...
└── val2017 -> ../coco/val2017
    └── ...
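
Before building annotations, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_paths` and the `EXPECTED` table are hypothetical, and the paths are copied from the two directory trees above.

```python
from pathlib import Path

# Relative paths copied from the COCO and LVIS directory trees above.
EXPECTED = {
    "coco": [
        "annotations/instances_train2017.json",
        "annotations/instances_val2017.json",
        "train2017",
        "val2017",
    ],
    "lvis_v1": [
        "annotations/lvis_v1_train.json",
        "annotations/lvis_v1_val.json",
        "train2017",
        "val2017",
    ],
}


def missing_paths(data_root: str = "data") -> list[str]:
    """Return the expected dataset paths that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for dataset, entries in EXPECTED.items():
        for entry in entries:
            path = root / dataset / entry
            # Path.exists() follows symlinks, so a dangling link counts as missing.
            if not path.exists():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print(f"missing: {path}")
```

Run it from the OADP directory. No output means every expected file and directory was found.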

Annotations

python -m oadp.build_annotations

The following files will be generated

OADP/data
├── coco
│   └── annotations
│       ├── instances_train2017.48.json
│       ├── instances_train2017.65.json
│       ├── instances_val2017.48.json
│       ├── instances_val2017.65.json
│       └── instances_val2017.65.min.json
└── lvis_v1
    └── annotations
        ├── lvis_v1_train.1203.json
        ├── lvis_v1_train.866.json
        ├── lvis_v1_val.1203.json
        └── lvis_v1_val.866.json

Pretrained Models

Download the CLIP model.

python -c "import clip; clip.load_default()"

Download the ResNet50 model.

mkdir pretrained
python -c "import torchvision; _ = torchvision.models.ResNet50_Weights.IMAGENET1K_V1.get_state_dict(True)"
ln -s ~/.cache/torch/hub/checkpoints/ pretrained/torchvision

Download the SoCo pretrained model from Baidu Netdisk or Google Drive and rename it to soco_star_mask_rcnn_r50_fpn_400e.pth.

Download the DetPro prompt from Baidu Netdisk.

Organize the pretrained models as follows

OADP/pretrained
├── clip
│   └── ViT-B-32.pt
├── detpro
│   └── iou_neg5_ens.pth
├── torchvision
│   └── resnet50-0676ba61.pth
└── soco
    └── soco_star_mask_rcnn_r50_fpn_400e.pth

Prompts

Generate the ViLD prompts.

python -m oadp.prompts.vild

Download ml_coco.pth from Baidu Netdisk.

Generate the DetPro prompts.

python -m oadp.prompts.detpro

Organize the prompts as follows

OADP/data/prompts
├── vild.pth
└── ml_coco.pth

Proposals

Download the proposals from Baidu Netdisk.

Organize the proposals as follows

OADP/data
├── coco
│   └── proposals
│       ├── rpn_r101_fpn_coco_train.pkl
│       ├── rpn_r101_fpn_coco_val.pkl
│       ├── oln_r50_fpn_coco_train.pkl
│       └── oln_r50_fpn_coco_val.pkl
└── lvis_v1
    └── proposals
        ├── oln_r50_fpn_lvis_train.pkl
        └── oln_r50_fpn_lvis_val.pkl

OADP

Most commands listed in this section support the DRY_RUN mode. When the DRY_RUN environment variable is set to True, the command that follows skips its time-consuming parts. This mode is intended for a quick integrity check.

Most commands run on both CPU and GPU servers. For CPU, use the python command. For GPU, use the torchrun command. Do not use python on GPU servers, since the command will attempt to initialize distributed training.

For all commands listed in this section, [...] means optional parts and (...|...) means choices. For example,

[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS})

stands for any one of the following four commands

DRY_RUN=True torchrun --nproc_per_node=${GPUS}  # GPU under the DRY_RUN mode
DRY_RUN=True python                             # CPU under the DRY_RUN mode
torchrun --nproc_per_node=${GPUS}               # GPU
python                                          # CPU
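
To make the notation concrete, here is a small sketch that expands a template into every command it stands for. It is only an illustration of the convention described above. The helper name `expand` is hypothetical, and nested brackets or parentheses are assumed not to occur (they do not in this README).

```python
import itertools
import re

# Matches either an optional part "[...]" or a choice "(...|...)".
TOKEN = re.compile(r"\[([^\]]*)\]|\(([^)]*)\)")


def expand(template: str) -> list[str]:
    """Expand `[...]` (optional) and `(...|...)` (choice) into concrete commands."""
    parts: list[list[str]] = []
    pos = 0
    for match in TOKEN.finditer(template):
        parts.append([template[pos:match.start()]])  # literal text before the token
        if match.group(1) is not None:
            parts.append([match.group(1), ""])  # optional: present or absent
        else:
            parts.append(match.group(2).split("|"))  # choice: exactly one alternative
        pos = match.end()
    parts.append([template[pos:]])  # trailing literal text
    commands = ("".join(combo) for combo in itertools.product(*parts))
    # Collapse the extra whitespace left behind by omitted optional parts.
    return [" ".join(command.split()) for command in commands]


if __name__ == "__main__":
    for command in expand("[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS})"):
        print(command)
```

For the example template, this prints the same four commands as listed above, although in a different order.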

OAKE

The following scripts extract features with CLIP, which can be very time-consuming. Therefore, all the scripts resume automatically by skipping feature files that already exist. However, existing feature files are sometimes broken, for example when a previous run was interrupted. In such cases, set the auto_fix option to check the integrity of each feature file.
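
The skip-or-recompute decision described above can be sketched as follows. This is not the repository's implementation. The function name `needs_extraction` is hypothetical, and `pickle` is only a stand-in for whatever serialization format the scripts actually use.

```python
import pickle
from pathlib import Path
from typing import Callable


def needs_extraction(
    feature_file: Path,
    auto_fix: bool = False,
    # Assumption: pickle stands in for the real feature file format.
    load: Callable[[Path], object] = lambda p: pickle.loads(p.read_bytes()),
) -> bool:
    """Decide whether the feature for one sample must be (re)computed."""
    if not feature_file.exists():
        return True  # never extracted
    if not auto_fix:
        return False  # plain resume: trust the existing file and skip it
    try:
        load(feature_file)  # integrity check: the file must load completely
    except Exception:
        return True  # broken file, extract again
    return False
```

Loading every file makes the auto_fix check much slower than a plain existence test, which is presumably why it is opt-in.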

Extract globals and blocks features, which can be used for both coco and lvis

[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.globals oake/globals configs/oake/globals.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.blocks oake/blocks configs/oake/blocks.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]

Extract objects features for coco

[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.objects oake/objects configs/oake/objects_coco.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]

Extract objects features for lvis

[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.objects oake/objects configs/oake/objects_lvis.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]

Feature extraction can be very time-consuming. Therefore, we provide archives of the extracted features on Baidu Netdisk. The extracted features were archived with the following commands

cd data/coco/oake/

tar -zcf globals.tar.gz globals
tar -zcf blocks.tar.gz blocks
tar -zcf objects.tar.gz objects/val2017

cd objects/train2017
ls > objects
split -d -3000 - objects. < objects
for i in objects.[0-9][0-9]; do
    zip -q -9 "$i.zip" -@ < "$i"
    mv "$i.zip" ../..
done
rm objects*
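
For readers who prefer Python, the chunked archiving of objects/train2017 can be approximated as below. This is a sketch under assumptions, not the script used to build the released archives. The function name `archive_in_chunks` is hypothetical, and it assumes the directory contains plain files only.

```python
import zipfile
from pathlib import Path


def archive_in_chunks(src: Path, dst: Path, chunk_size: int = 3000) -> list[Path]:
    """Zip the files in src into dst/objects.NN.zip, chunk_size files per archive."""
    files = sorted(p for p in src.iterdir() if p.is_file())
    archives = []
    for index, start in enumerate(range(0, len(files), chunk_size)):
        archive = dst / f"objects.{index:02d}.zip"
        # compresslevel=9 mirrors the `zip -9` flag in the shell commands.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
            for path in files[start:start + chunk_size]:
                zf.write(path, arcname=path.name)  # store names relative to src
        archives.append(archive)
    return archives


if __name__ == "__main__":
    archive_in_chunks(Path("data/coco/oake/objects/train2017"), Path("data/coco/oake"))
```

Unlike the shell version, this sketch writes no temporary list files into the source directory, so there is nothing to clean up afterwards.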

The final directory for OAKE should look like

OADP/data
├── coco
│   └── oake
│       ├── blocks
│       │   ├── train2017
│       │   └── val2017
│       ├── globals
│       │   ├── train2017
│       │   └── val2017
│       └── objects
│           ├── train2017
│           └── val2017
└── lvis_v1
    └── oake
        ├── blocks -> ../../coco/oake/blocks
        ├── globals -> ../../coco/oake/globals
        └── objects
            ├── train2017
            └── val2017
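
Since the globals and blocks features are shared between coco and lvis, the lvis_v1 entries in the tree above can be symbolic links. The following sketch creates them with relative targets. The function name `link_shared_features` is hypothetical, and the sketch assumes a filesystem that supports symbolic links.

```python
import os
from pathlib import Path


def link_shared_features(data_root: Path) -> None:
    """Link lvis_v1/oake/{blocks,globals} to the features extracted for coco."""
    lvis_oake = data_root / "lvis_v1" / "oake"
    lvis_oake.mkdir(parents=True, exist_ok=True)
    for name in ("blocks", "globals"):
        link = lvis_oake / name
        target = data_root / "coco" / "oake" / name
        if link.is_symlink() or link.exists():
            continue  # leave existing links or directories untouched
        # A relative target keeps the link valid if the data directory is moved.
        relative = os.path.relpath(target, start=lvis_oake)
        link.symlink_to(relative, target_is_directory=True)


if __name__ == "__main__":
    link_shared_features(Path("data"))
```

The objects features are not linked, because they are extracted separately for each dataset.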

DP

To conduct training for coco

[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py [--override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json]
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py [--override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json]

To conduct training for lvis

[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train oadp_ov_lvis configs/dp/oadp_ov_lvis.py

To test a specific checkpoint

[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_coco.py work_dirs/oadp_ov_coco/iter_32000.pth
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_lvis.py work_dirs/oadp_ov_lvis/epoch_24.pth

To also report instance segmentation performance on LVIS, pass the --metrics argument

[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_lvis.py work_dirs/oadp_ov_lvis/epoch_24.pth --metrics bbox segm

NNI is supported but unnecessary.

DUMP=work_dirs/dump (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_coco.py work_dirs/oadp_ov_coco/iter_32000.pth
DUMP=work_dirs/dump python tools/nni_dp_test.py

Results

The checkpoints for OADP are available on Baidu Netdisk.

OV COCO

mAP$^{\text{N}}_{50}$	Config	Checkpoint
$31.3$	oadp_ov_coco.py	work_dirs/oadp_ov_coco/iter_32000.pth

OV LVIS

OD AP$_r$	IS AP$_r$	Config	Checkpoint
$20.6$	$19.9$	oadp_ov_lvis.py	work_dirs/oadp_ov_lvis/epoch_24.pth
		oadp_ov_lvis_lsj.py	Coming soon
