OADP
This repository is the official implementation of "Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection".
Create a conda environment and activate it.
conda create -n oadp python=3.10
conda activate oadp
Install PyTorch following the official documentation.
For example,
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
Install MMDetection following the official instructions.
For example,
pip install openmim
mim install mmcv_full==1.7.0
pip install mmdet==2.25.2
Install other dependencies.
pip install todd_ai==0.3.0
pip install git+https://github.com/LutingWang/CLIP.git
pip install git+https://github.com/lvis-dataset/lvis-api.git@lvis_challenge_2021
pip install nni scikit-learn==1.1.3
Download the MS-COCO dataset to data/coco.
OADP/data/coco
├── annotations
│ ├── instances_train2017.json
│ └── instances_val2017.json
├── train2017
│ └── ...
└── val2017
└── ...
Download the LVIS v1.0 dataset to data/lvis_v1.
OADP/data/lvis_v1
├── annotations
│ ├── lvis_v1_train.json
│ └── lvis_v1_val.json
├── train2017 -> ../coco/train2017
│ └── ...
└── val2017 -> ../coco/val2017
└── ...
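The two image directories under data/lvis_v1 are symbolic links into data/coco. A minimal sketch of creating them follows, assuming each LVIS split links to the COCO directory of the same name. The helper name link_lvis_images is hypothetical and not part of this repository.

```python
import os
import tempfile
from pathlib import Path


def link_lvis_images(data_root: Path) -> None:
    """Create lvis_v1/{train2017,val2017} as relative symlinks into coco/.

    Assumption: each LVIS split points at the COCO directory of the same name.
    """
    lvis_root = data_root / "lvis_v1"
    lvis_root.mkdir(parents=True, exist_ok=True)
    for split in ("train2017", "val2017"):
        link = lvis_root / split
        if link.is_symlink() or link.exists():
            continue  # leave anything that is already in place untouched
        # A relative target keeps the links valid if data/ is moved as a whole.
        os.symlink(os.path.join("..", "coco", split), link, target_is_directory=True)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than the real data/ folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "coco" / "train2017").mkdir(parents=True)
        (root / "coco" / "val2017").mkdir(parents=True)
        link_lvis_images(root)
        print(sorted(p.name for p in (root / "lvis_v1").iterdir()))
```

For the real layout, the call would be link_lvis_images(Path("data")) from the OADP directory.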
python -m oadp.build_annotations
The following files will be generated
OADP/data
├── coco
│ └── annotations
│ ├── instances_train2017.48.json
│ ├── instances_train2017.65.json
│ ├── instances_val2017.48.json
│ ├── instances_val2017.65.json
│ └── instances_val2017.65.min.json
└── lvis_v1
└── annotations
├── lvis_v1_train.1203.json
├── lvis_v1_train.866.json
├── lvis_v1_val.1203.json
└── lvis_v1_val.866.json
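As a quick sanity check on the generated files, the sketch below compares the numeric suffix in each filename with the number of entries under the "categories" key. It assumes that the suffix (48, 65, 866, 1203) is the category count and that the files follow the COCO-style JSON layout. Both helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def count_categories(ann_file: Path) -> int:
    """Return the number of entries under the COCO-style "categories" key."""
    with ann_file.open() as f:
        return len(json.load(f)["categories"])


def expected_categories(ann_file: Path) -> int:
    """Parse the numeric suffix, e.g. instances_val2017.48.json -> 48.

    Assumption: the suffix encodes the number of categories in the file.
    """
    return int(ann_file.name.split(".")[1])


if __name__ == "__main__":
    # Toy annotation file with two categories, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        toy = Path(tmp) / "instances_val2017.2.json"
        toy.write_text(json.dumps({"categories": [{"id": 1}, {"id": 2}]}))
        print(count_categories(toy) == expected_categories(toy))
```

The full annotation files are large, so loading each one may take a while.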
Download the CLIP model.
python -c "import clip; clip.load_default()"
Download the ResNet50 model.
mkdir pretrained
python -c "import torchvision; _ = torchvision.models.ResNet50_Weights.IMAGENET1K_V1.get_state_dict(True)"
ln -s ~/.cache/torch/hub/checkpoints/ pretrained/torchvision
Download the SoCo checkpoint from Baidu Netdisk or Google Drive and rename it to soco_star_mask_rcnn_r50_fpn_400e.pth.
Download the DetPro prompt from Baidu Netdisk.
Organize the pretrained models as follows
OADP/pretrained
├── clip
│ └── ViT-B-32.pt
├── detpro
│ └── iou_neg5_ens.pth
├── torchvision
│ └── resnet50-0676ba61.pth
└── soco
└── soco_star_mask_rcnn_r50_fpn_400e.pth
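Before moving on, it may help to confirm that every file in the layout above is present. The following sketch lists the missing ones. The helper missing_files is hypothetical, and the relative paths are copied from the tree above.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = (
    "clip/ViT-B-32.pt",
    "detpro/iou_neg5_ens.pth",
    "torchvision/resnet50-0676ba61.pth",
    "soco/soco_star_mask_rcnn_r50_fpn_400e.pth",
)


def missing_files(root: Path, expected=EXPECTED) -> list[str]:
    """Return the expected files that do not exist under root.

    Path.is_file() follows symlinks, so the linked torchvision directory works.
    """
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Toy run: only the CLIP weights exist, so three entries are reported.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "clip").mkdir()
        (root / "clip" / "ViT-B-32.pt").touch()
        print(missing_files(root))
```

Running missing_files(Path("pretrained")) from the OADP directory should return an empty list once all downloads are in place.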
Generate the ViLD prompts.
python -m oadp.prompts.vild
Download ml_coco.pth from Baidu Netdisk.
Generate the DetPro prompts.
python -m oadp.prompts.detpro
Organize the prompts as follows
OADP/data/prompts
├── vild.pth
└── ml_coco.pth
Download the proposals from Baidu Netdisk.
Organize the proposals as follows
OADP/data
├── coco
│ └── proposals
│ ├── rpn_r101_fpn_coco_train.pkl
│ ├── rpn_r101_fpn_coco_val.pkl
│ ├── oln_r50_fpn_coco_train.pkl
│ └── oln_r50_fpn_coco_val.pkl
└── lvis_v1
└── proposals
├── oln_r50_fpn_lvis_train.pkl
└── oln_r50_fpn_lvis_val.pkl
Most commands listed in this section support the DRY_RUN mode.
When the DRY_RUN environment variable is set to True, the command that follows skips its time-consuming parts.
This mode is intended for quick integrity checks.
Most commands run on both CPU and GPU servers.
For CPU, use the python command.
For GPU, use the torchrun command.
Do not use python on GPU servers, since the command will attempt to initialize distributed training.
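The launcher rule above can be captured in a small helper when the commands are driven from a script. This is only a sketch: build_command and render are hypothetical names, and the helper only assembles the command line without running anything.

```python
import shlex


def build_command(module, args, gpus=0, dry_run=False):
    """Return (env, argv) for launching `module` on CPU (gpus=0) or on GPUs."""
    # python on CPU servers, torchrun on GPU servers, as described above.
    launcher = ["torchrun", f"--nproc_per_node={gpus}"] if gpus > 0 else ["python"]
    env = {"DRY_RUN": "True"} if dry_run else {}
    return env, [*launcher, "-m", module, *args]


def render(env, argv):
    """Format the command the way it would be typed in a shell."""
    prefix = [f"{key}={value}" for key, value in env.items()]
    return " ".join(prefix + [shlex.quote(arg) for arg in argv])


if __name__ == "__main__":
    env, argv = build_command(
        "oadp.oake.globals",
        ["oake/globals", "configs/oake/globals.py"],
        gpus=0,
        dry_run=True,
    )
    print(render(env, argv))
    # DRY_RUN=True python -m oadp.oake.globals oake/globals configs/oake/globals.py
```

To actually launch the command, argv can be passed to subprocess.run with env merged into a copy of os.environ.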
For all commands listed in this section, [...] denotes optional parts and (...|...) denotes a choice between alternatives.
For example,
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS})
is equivalent to the following four possible commands
DRY_RUN=True torchrun --nproc_per_node=${GPUS} # GPU under the DRY_RUN mode
DRY_RUN=True python # CPU under the DRY_RUN mode
torchrun --nproc_per_node=${GPUS} # GPU
python # CPU
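The notation amounts to a Cartesian product over the optional and alternative parts. The sketch below enumerates the same four commands, using a hypothetical helper named expand in which an empty string stands for an omitted optional part.

```python
from itertools import product


def expand(*slots):
    """Enumerate every command obtained by picking one alternative per slot.

    Each slot is a list of alternatives; "" stands for an omitted optional part.
    """
    return [" ".join(part for part in combo if part) for combo in product(*slots)]


commands = expand(
    ["DRY_RUN=True", ""],  # [DRY_RUN=True]
    ["torchrun --nproc_per_node=${GPUS}", "python"],  # (python|torchrun ...)
)

for command in commands:
    print(command)
```

With n optional parts and a choice among m alternatives, a single template line therefore stands for 2^n * m concrete commands.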
The following scripts extract features with CLIP, which can be very time-consuming. Therefore, all the scripts support automatic resuming by skipping feature files that already exist. However, existing feature files are sometimes corrupted. In such cases, users can set the auto_fix option to check the integrity of each feature file.
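The resume logic can be pictured as follows. This is an illustrative sketch, not the repository's implementation: it assumes that a file failing the integrity check is extracted again, and it uses pickle loading as a stand-in for whatever check the real scripts perform. The names is_intact and needs_extraction are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def is_intact(path: Path) -> bool:
    """Stand-in integrity check: the file must unpickle without error."""
    try:
        with path.open("rb") as f:
            pickle.load(f)
        return True
    except Exception:
        return False


def needs_extraction(path: Path, auto_fix: bool = False) -> bool:
    """Decide whether the features for one image must be (re)computed."""
    if not path.exists():
        return True  # nothing to resume from
    # Without auto_fix, an existing file is trusted and skipped.
    return auto_fix and not is_intact(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        broken = Path(tmp) / "000000000001.pth"
        broken.write_bytes(b"")  # simulate a truncated feature file
        print(needs_extraction(broken), needs_extraction(broken, auto_fix=True))
```

The integrity check reads every existing file, which is why it would be opt-in rather than the default.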
Extract globals and blocks features, which can be used for both coco and lvis
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.globals oake/globals configs/oake/globals.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.blocks oake/blocks configs/oake/blocks.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
Extract objects features for coco
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.objects oake/objects configs/oake/objects_coco.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
Extract objects features for lvis
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.objects oake/objects configs/oake/objects_lvis.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
Feature extraction can be very time-consuming. Therefore, we provide archives of the extracted features on Baidu Netdisk. The extracted features were archived with the following commands
cd data/coco/oake/
tar -zcf globals.tar.gz globals
tar -zcf blocks.tar.gz blocks
tar -zcf objects.tar.gz objects/val2017
cd objects/train2017
ls > objects
split -d -l 3000 - objects. < objects
for i in objects.[0-9][0-9]; do
zip -q -9 "$i.zip" -@ < "$i"
mv "$i.zip" ../..
done
rm objects*
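Unpacking reverses these steps. The sketch below assumes the downloaded archives sit in data/coco/oake under the names produced above (globals.tar.gz, blocks.tar.gz, objects.tar.gz, and objects.NN.zip). The restore helper is hypothetical and should only be run on archives from a trusted source.

```python
import tarfile
import zipfile
from pathlib import Path


def restore(oake_dir: Path) -> None:
    """Unpack the feature archives in place under oake_dir."""
    # The tarballs were created from oake_dir, so they extract relative to it.
    for name in ("globals", "blocks", "objects"):
        archive = oake_dir / f"{name}.tar.gz"
        if archive.is_file():
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(oake_dir)
    # The zip parts were created inside objects/train2017, so they go back there.
    train_dir = oake_dir / "objects" / "train2017"
    train_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(oake_dir.glob("objects.[0-9][0-9].zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(train_dir)


if __name__ == "__main__":
    restore(Path("data/coco/oake"))
```

Archives that are absent are skipped, so the function can be rerun as further parts finish downloading.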
The final directory for OAKE should look like
OADP/data
├── coco
│ └── oake
│ ├── blocks
│ │ ├── train2017
│ │ └── val2017
│ ├── globals
│ │ ├── train2017
│ │ └── val2017
│ └── objects
│ ├── train2017
│ └── val2017
└── lvis_v1
└── oake
├── blocks -> ../coco/oake/blocks
├── globals -> ../coco/oake/globals
└── objects
├── train2017
└── val2017
To conduct training for coco
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py [--override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json]
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py [--override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json]
To conduct training for lvis
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train oadp_ov_lvis configs/dp/oadp_ov_lvis.py
To test a specific checkpoint
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_coco.py work_dirs/oadp_ov_coco/iter_32000.pth
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_lvis.py work_dirs/oadp_ov_lvis/epoch_24.pth
To evaluate instance segmentation performance on LVIS, pass the --metrics argument
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_lvis.py work_dirs/oadp_ov_lvis/epoch_24.pth --metrics bbox segm
NNI is supported but unnecessary.
DUMP=work_dirs/dump (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_coco.py work_dirs/oadp_ov_coco/iter_32000.pth
DUMP=work_dirs/dump python tools/nni_dp_test.py
The checkpoints for OADP are available on Baidu Netdisk.
| mAP$_{N50}$ | Config | Checkpoint |
|---|---|---|
| $31.3$ | oadp_ov_coco.py | work_dirs/oadp_ov_coco/iter_32000.pth |

| OD AP$_r$ | IS AP$_r$ | Config | Checkpoint |
|---|---|---|---|
| $20.6$ | $19.9$ | oadp_ov_lvis.py | work_dirs/oadp_ov_lvis/epoch_24.pth |
| | | oadp_ov_lvis_lsj.py | Coming soon |