GT-Wei / OVA-DETR

OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion

Guoting Wei1,4,*, Xia Yuan1,*, Yu Liu3,📧, Zhenhao Shang2, Kelu Yao3, Chao Li3, Qingsen Yan2,
Chunxia Zhao1, Haokui Zhang2,4,📧, Rong Xiao4

* Equal contribution 📧 Corresponding author

1 Nanjing University of Science and Technology, 2 Northwestern Polytechnical University
3 Zhejiang Lab, 4 Intellifusion

## Partial results

Figure 1: Comparison of OVA-DETR with recent open-vocabulary detectors in terms of speed and recall. All methods are evaluated on the DIOR dataset under zero-shot detection. Inference speeds were measured on an RTX 3090 GPU by default, except for DescReg, which was measured on an RTX 4090 GPU.


Figure 2: Overall architecture of OVA-DETR. Its improvements fall into two main components: Image-Text Alignment and Bidirectional Vision-Language Fusion.


Figure 5: Qualitative zero-shot detection results on the xView, DIOR, and DOTA datasets, focusing on novel classes. Green rectangles are predicted bounding boxes; red rectangles are ground-truth bounding boxes.


## Installation

  1. Clone the OVA-DETR repository.
    git clone https://github.com/GT-Wei/OVA-DETR.git
  2. Clone the flytocc/mmdetection repository, which integrates RT-DETR, and copy the OVA-DETR files into it.
    git clone https://github.com/flytocc/mmdetection.git
    cp -r OVA-DETR/* ./mmdetection/
  3. Set up the environment. OVA-DETR is developed with torch==1.11.0+cu113 and mmdetection==3.3.0.

    conda create -n OVA-DETR python==3.8 -y
    conda activate OVA-DETR
    conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
    pip install -U openmim
    mim install mmengine
    mim install "mmcv==2.0.0"
    pip install transformers open_clip_torch
    pip install git+https://github.com/openai/CLIP.git
    cd mmdetection
    pip install -v -e .

     Then download the pretrained RT-DETR weights and the released OVA-DETR checkpoints into pretrain_model/:

    mkdir pretrain_model
    wget -P pretrain_model https://github.com/flytocc/mmdetection/releases/download/model_zoo/rtdetr_r50vd_8xb2-72e_coco_ff87da1a.pth
    wget -P pretrain_model https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_30.pth
    wget -P pretrain_model https://github.com/GT-Wei/OVA-DETR/releases/download/v1.0.0/epoch_45.pth

  4. Training (example with 4 GPUs):
    CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py 4

  5. Evaluation (example with 4 GPUs):
    CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_test.sh configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py ./pretrain_model/epoch_30.pth 4
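
The config file name appears to follow mmdetection's config naming convention, in which `<N>xb<M>` means N GPUs with M images per GPU and `<E>e` means E training epochs; we assume, rather than know, that `OVA_DETR_4xb4-80e_dior_dota_xview.py` follows it. Under that assumption, the hypothetical Bash helper `parse_schedule` below (not part of OVA-DETR or mmdetection) decodes the name and computes the total batch size as GPUs × images per GPU, which helps keep track of the effective batch size if you train on a different number of GPUs than the name states.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with OVA-DETR. Assumes the config name
# contains an mmdetection-style "<gpus>xb<batch_per_gpu>-<epochs>e" segment.
parse_schedule() {
    local name re
    # Strip the directory and the .py suffix, e.g. OVA_DETR_4xb4-80e_dior_dota_xview
    name=$(basename "$1" .py)
    # Keep the regex in a variable so [[ =~ ]] treats it as a regex, not a string.
    re='_([0-9]+)xb([0-9]+)-([0-9]+)e_'
    if [[ "$name" =~ $re ]]; then
        local gpus="${BASH_REMATCH[1]}"
        local per_gpu="${BASH_REMATCH[2]}"
        local epochs="${BASH_REMATCH[3]}"
        # Total batch size = number of GPUs x images per GPU.
        echo "gpus=$gpus batch_per_gpu=$per_gpu total_batch=$((gpus * per_gpu)) epochs=$epochs"
    else
        echo "no <N>xb<M>-<E>e segment in: $name" >&2
        return 1
    fi
}
```

For the training config above, `parse_schedule configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py` prints `gpus=4 batch_per_gpu=4 total_batch=16 epochs=80`.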
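
Both `dist_train.sh` and `dist_test.sh` take the number of GPUs as their last argument, and in the examples above it matches the four devices listed in `CUDA_VISIBLE_DEVICES`. The sketch below is a hypothetical Bash helper (the names `count_gpus` and `eval_cmd` are illustrative, not part of OVA-DETR or mmdetection) that derives that number from `CUDA_VISIBLE_DEVICES` so the two cannot drift apart, and checks that the checkpoint file exists before building the evaluation command.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with OVA-DETR. Source this file, then call
# the functions; nothing here launches training or evaluation by itself.

# Count the comma-separated device IDs in a CUDA_VISIBLE_DEVICES-style string,
# e.g. "0,1,2,3" -> 4; an empty string -> 0.
count_gpus() {
    local devices="${1:-}"
    if [[ -z "$devices" ]]; then
        echo 0
        return 0
    fi
    local -a ids
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

# Print (rather than run) a dist_test.sh command whose last argument, the
# number of GPUs, is derived from CUDA_VISIBLE_DEVICES. Fails if the
# checkpoint file is missing or no devices are listed.
eval_cmd() {
    local config="$1" ckpt="$2" gpus
    if [[ ! -f "$ckpt" ]]; then
        echo "checkpoint not found: $ckpt" >&2
        return 1
    fi
    gpus=$(count_gpus "${CUDA_VISIBLE_DEVICES:-}")
    if [[ "$gpus" -eq 0 ]]; then
        echo "CUDA_VISIBLE_DEVICES is empty or unset" >&2
        return 1
    fi
    echo "./tools/dist_test.sh $config $ckpt $gpus"
}
```

With the checkpoint in `pretrain_model/` and `CUDA_VISIBLE_DEVICES=0,1,2,3` exported, `eval_cmd configs/OVA_DETR/OVA_DETR_4xb4-80e_dior_dota_xview.py ./pretrain_model/epoch_30.pth` prints the evaluation command with `4` as its last argument. Printing instead of executing keeps the helper free of side effects, so the command can be inspected before it is run.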


## Acknowledgement

We are grateful to the contributors for their crucial integration of RT-DETR into the mmdetection framework. We implemented OVA-DETR based on their shared resources available at [mmdetection](https://github.com/flytocc/mmdetection).

## Citation

    @article{wei2024ova,
      title={OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion},
      author={Wei, Guoting and Yuan, Xia and Liu, Yu and Shang, Zhenhao and Yao, Kelu and Li, Chao and Yan, Qingsen and Zhao, Chunxia and Zhang, Haokui and Xiao, Rong},
      journal={arXiv preprint arXiv:2408.12246},
      year={2024}
    }