AggDet

This repository contains the implementation of "Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation".

Abstract

Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects from novel classes unseen at training time. However, empirical studies reveal that advanced detectors generally assign lower scores to those novel instances, which are inadvertently suppressed during inference by commonly adopted greedy strategies like Non-Maximum Suppression (NMS), leading to sub-optimal detection performance for novel classes. This paper systematically investigates this problem with the commonly adopted two-stage OVOD paradigm. Specifically, in the region-proposal stage, proposals that contain novel instances exhibit lower objectness scores, since they are treated as background proposals during the training phase. Meanwhile, in the object-classification stage, novel objects receive lower region-text similarities (i.e., classification scores) because the visual-language alignment is biased by the seen training samples. To alleviate this problem, this paper introduces two advanced measures to adjust confidence scores and preserve erroneously dismissed objects: (1) a class-agnostic localization quality estimate via the overlap degree of region/object proposals, and (2) a text-guided visual similarity estimate with proxy prototypes for novel classes. Integrated with adjusting techniques specifically designed for the region-proposal and object-classification stages, this paper derives an aggregated confidence estimate for the open-vocabulary object detection paradigm (AggDet).
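
To make the first measure more concrete, the sketch below shows one possible reading of an "overlap degree" score: for each proposal, the mean IoU with its top-k most-overlapping neighbours. This is an illustration only, not the repository's implementation. The function names, the averaging rule, and the linear `blend` step are assumptions made for the example; the exact formulation is given in the paper.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_degree(boxes: List[Box], topk: int) -> List[float]:
    """Mean IoU of each box with its `topk` most-overlapping other boxes.

    Assumption: a proposal surrounded by many similar proposals is more
    likely to be well localized. With topk == 0 every degree is 0.0.
    """
    degrees = []
    for i, box in enumerate(boxes):
        ious = sorted(
            (iou(box, other) for j, other in enumerate(boxes) if j != i),
            reverse=True,
        )
        top = ious[:topk]
        degrees.append(sum(top) / len(top) if top else 0.0)
    return degrees


def blend(scores: List[float], degrees: List[float], weight: float) -> List[float]:
    """Hypothetical linear blend of original scores and overlap degrees."""
    return [(1.0 - weight) * s + weight * d for s, d in zip(scores, degrees)]


if __name__ == "__main__":
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (100, 100, 110, 110)]
    degrees = overlap_degree(proposals, topk=1)
    print(blend([0.2, 0.3, 0.9], degrees, weight=0.5))
```

In this toy example the two overlapping proposals support each other and their scores rise, while the isolated proposal's score is pulled down because nothing overlaps it.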

Framework

Preparations

Installation

Follow the installation instructions of CoDet to set up the environment.

Set up the environment

conda create --name aggdet python=3.8 -y && conda activate aggdet
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
git clone https://github.com/WarlockWendell/AggDet.git

Install Apex and xFormers (you can skip this step if you do not use the EVA-02 backbone)

pip install ninja
pip install -v -U git+https://github.com/facebookresearch/xformers.git@7e05e2caaaf8060c1c6baadc2b04db02d5458a94
git clone https://github.com/NVIDIA/apex && cd apex
pip install packaging
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && cd ..

Install detectron2 and other dependencies

cd AggDet/third_party/detectron2
pip install -e .
cd ../..
pip install -r requirements.txt
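
After the steps above, a quick sanity check can confirm that the main packages are importable before running anything heavier. The helper below is a minimal sketch using only the standard library; the function name `missing_packages` is made up for this example, and `apex` and `xformers` are only expected if you installed the optional EVA-02 dependencies.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["torch", "torchvision", "detectron2"]
    optional = ["apex", "xformers"]  # only needed for the EVA-02 backbone

    print("Python:", sys.version.split()[0])
    print("Missing required packages:", missing_packages(required) or "none")
    print("Missing optional packages:", missing_packages(optional) or "none")
```

The check only looks up the packages without importing them, so it stays fast and does not require a GPU.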

Datasets

Please refer to DATA.md for more details.

Pretrained weights

You can download the pre-trained weights from the official GitHub repos of Detic and CoDet, and put them under the <AGGDET_ROOT>/ckpt/models directory.

model	dataset	download
Detic_RN50	COCO	model
CoDet_RN50	COCO	model
Detic_SwinB	LVIS	model
CoDet_RN50	LVIS	model
CoDet_SwinB	LVIS	model
CoDet_EVA02	LVIS	model
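
Once the weights are downloaded, a small script can confirm they ended up in the expected directory. The sketch below lists the files under <AGGDET_ROOT>/ckpt/models; the helper name `list_checkpoints` and the `.pth` suffix are assumptions for illustration, so adjust the suffix to match the files you actually downloaded.

```python
from pathlib import Path


def list_checkpoints(aggdet_root, suffix=".pth"):
    """Return sorted names of checkpoint files under <aggdet_root>/ckpt/models.

    The ".pth" default is an assumption; pass another suffix if needed.
    """
    ckpt_dir = Path(aggdet_root) / "ckpt" / "models"
    if not ckpt_dir.is_dir():
        return []
    return sorted(
        p.name for p in ckpt_dir.iterdir() if p.is_file() and p.suffix == suffix
    )


if __name__ == "__main__":
    found = list_checkpoints(".")  # run from the AggDet root directory
    print("Checkpoints found:", found or "none")
```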

Inference

Take Detic with a ResNet50 backbone on the OV-COCO dataset as an example.

python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml

The hyperparameters described in the paper can be adjusted through the following entries in the YAML config file:

OVERLAP_TOPK: 3
ALPHA: 0.05
BETA: 0.75

For example, use the following command to test the baseline model:

python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml \
MODEL.OVERLAP_TOPK 0 MODEL.ALPHA 0.0 MODEL.BETA 0.0

To evaluate a different model or dataset, pass a different file to --config-file. Refer to REPRODUCE.md for more details.
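
If you want to compare several settings of these parameters, it can help to generate the evaluation commands programmatically. The sketch below builds one command per combination; the helper names are made up for this example, and it assumes command-line overrides are passed as space-separated KEY VALUE pairs after the config file, as with `MODEL.ALPHA 0.0` in the baseline command above.

```python
import itertools
import shlex


def build_eval_command(config_file, overlap_topk, alpha, beta):
    """Build the argument list for one evaluation run with overridden parameters."""
    return [
        "python", "train_net.py", "--eval-only",
        "--config-file", config_file,
        "MODEL.OVERLAP_TOPK", str(overlap_topk),
        "MODEL.ALPHA", str(alpha),
        "MODEL.BETA", str(beta),
    ]


def sweep_commands(config_file, topks, alphas, betas):
    """Return one command per (topk, alpha, beta) combination."""
    return [
        build_eval_command(config_file, k, a, b)
        for k, a, b in itertools.product(topks, alphas, betas)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for cmd in sweep_commands("configs/Detic_RN50_COCO.yaml", [3], [0.0, 0.05], [0.0, 0.75]):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be handed to `subprocess.run` to execute the runs one after another.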

Citation

@article{zheng2024aggdet,
  title={Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation},
  author={Yanhao Zheng and Kai Liu},
  journal={arXiv preprint arXiv:2404.08603},
  year={2024}
}

Acknowledgment

AggDet is built upon the awesome works CoDet, EVA, and Detic. Many thanks for their wonderful work.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
