WarlockWendell / AggDet

Official implementation of Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation

AggDet

This repository is the official implementation of Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation.

Abstract

Open-vocabulary object detection (OVOD) aims to localize and recognize visual objects from novel classes unseen at training time. However, empirical studies reveal that advanced detectors generally assign lower scores to these novel instances, which are then inadvertently suppressed during inference by commonly adopted greedy strategies such as Non-Maximum Suppression (NMS), leading to sub-optimal detection performance for novel classes. This paper systematically investigates this problem within the commonly adopted two-stage OVOD paradigm. Specifically, in the region-proposal stage, proposals that contain novel instances receive lower objectness scores, since they are treated as background proposals during training. Meanwhile, in the object-classification stage, novel objects receive lower region-text similarities (i.e., classification scores) because the visual-language alignment is biased toward seen training samples. To alleviate this problem, this paper introduces two advanced measures to adjust confidence scores and recover erroneously dismissed objects: (1) a class-agnostic localization quality estimate via the overlap degree of region/object proposals, and (2) a text-guided visual similarity estimate with proxy prototypes for novel classes. Integrated with adjusting techniques specifically designed for the region-proposal and object-classification stages, these measures yield an aggregated confidence estimate for the open-vocabulary object detection paradigm, AggDet.
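To make the first measure more concrete, here is a minimal sketch of one way an overlap-based, class-agnostic localization score could be computed. It is an illustration only and not the formula used in the paper or in this repository: the function names `iou` and `overlap_degree`, and the choice of averaging the top-k IoUs with the other proposals, are assumptions made for this example.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_degree(boxes: List[Box], topk: int = 3) -> List[float]:
    """Hypothetical score: mean of the top-k IoUs between a box and all others.

    Assumption for illustration: a proposal that many other proposals
    agree with (high overlap) is more likely to be well localized.
    """
    scores = []
    for i, box in enumerate(boxes):
        ious = sorted(
            (iou(box, other) for j, other in enumerate(boxes) if j != i),
            reverse=True,
        )
        top = ious[:topk]
        # With topk=0 (or a single box) there is nothing to average.
        scores.append(sum(top) / len(top) if top else 0.0)
    return scores
```

Under this sketch, a proposal surrounded by near-duplicate proposals scores close to 1, while an isolated proposal scores 0. Such a score uses no class labels, which is why it can be applied to novel classes.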

Framework

Preparations

Inference

Take Detic with a ResNet50 backbone on the OV-COCO dataset as an example.

python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml

You can adjust the parameters described in the paper by editing the following entries in the yaml file.

OVERLAP_TOPK: 3
ALPHA: 0.05
BETA: 0.75

For example, use the following command to test the baseline model:

python train_net.py --eval-only --config-file configs/Detic_RN50_COCO.yaml \
MODEL.OVERLAP_TOPK 0 MODEL.ALPHA 0.0 MODEL.BETA 0.0

To evaluate a different model or dataset, pass a different file to --config-file. Refer to REPRODUCE.md for more details.
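If you want to compare several parameter settings, a small helper can assemble the evaluation command for each one. The sketch below assumes that train_net.py accepts trailing KEY VALUE override pairs, as in the baseline command above. The helper name `build_eval_command` is hypothetical and not part of this repository.

```python
import shlex
from typing import Dict, List


def build_eval_command(config_file: str, overrides: Dict[str, object]) -> List[str]:
    """Build an evaluation command with KEY VALUE config overrides.

    Assumption: overrides are passed as separate KEY and VALUE tokens
    after the named arguments, as in the baseline example.
    """
    cmd = ["python", "train_net.py", "--eval-only", "--config-file", config_file]
    for key, value in overrides.items():
        cmd += [key, str(value)]  # each override contributes exactly two tokens
    return cmd


# Baseline setting from the example above: all three parameters disabled.
baseline = build_eval_command(
    "configs/Detic_RN50_COCO.yaml",
    {"MODEL.OVERLAP_TOPK": 0, "MODEL.ALPHA": 0.0, "MODEL.BETA": 0.0},
)
print(shlex.join(baseline))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. Keeping it as a list of tokens avoids shell quoting problems.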

Citation

@article{zheng2024trainingfree,
  title={Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation},
  author={Yanhao Zheng and Kai Liu},
  journal={arXiv preprint arXiv:2404.08603},
  year={2024}
}

Acknowledgment

AggDet is built upon the excellent CoDet, EVA, and Detic projects. Many thanks to their authors for their wonderful work.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.