
License: BSD 2-Clause "Simplified"

AirShot: Efficient Few-Shot Detection for Autonomous Exploration (IROS 2024)

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

Zihan Wang, Bowen Li, Chen Wang, and Sebastian Scherer*

Abstract

Few-shot object detection has drawn increasing attention in the field of robotic exploration, where robots are required to find unseen objects given only a few online-provided examples. Although recent efforts have been made toward online processing capabilities, the slow inference speed of low-powered robots fails to meet the demands of real-time detection, making these methods impractical for autonomous exploration. Existing methods still face performance and efficiency challenges, mainly due to unreliable features and exhaustive class loops. In this work, we propose a new paradigm, AirShot, and discover that, by fully exploiting the valuable correlation map, AirShot yields a more robust and faster few-shot object detection system that is more applicable to the robotics community. The core module, the Top Prediction Filter (TPF), operates on multi-scale correlation maps in both the training and inference stages. During training, TPF supervises the generation of a more representative correlation map, while during inference, it reduces looping iterations by selecting top-ranked classes, cutting computational costs while improving performance. Surprisingly, this dual functionality is generally effective and efficient across various off-the-shelf models. Exhaustive experiments on the COCO2017, VOC2012, and SubT datasets demonstrate that TPF can significantly boost the efficacy and efficiency of most off-the-shelf models, achieving up to 36.4% precision improvement along with 56.3% faster inference speed. We also open-source the DARPA Subterranean (SubT) Dataset for Few-shot Object Detection.
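To make the inference-time role of TPF concrete, here is a minimal sketch (the tensor layout and the small scoring network score_head are assumptions, not specified in this README): TPF scores each candidate class from its correlation map and keeps only the top-ranked classes for the expensive per-class detection loop.

import torch

def tpf_filter(correlation_maps, score_head, k=5):
    # correlation_maps: (num_classes, C, H, W) tensor, one fused
    # multi-scale correlation map per candidate class (assumed layout).
    # score_head: any small network mapping a correlation map to a scalar.
    scores = torch.stack([score_head(m.unsqueeze(0)).squeeze()
                          for m in correlation_maps])
    # Keep only the top-ranked classes; the per-class detection loop
    # then runs k times instead of num_classes times.
    return torch.topk(scores, k=min(k, scores.numel())).indices

# The detector would then loop only over the surviving classes, e.g.:
# for c in tpf_filter(corr_maps, score_head):
#     run_detection_for_class(c)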

TODO

DARPA Subterranean (SubT) Dataset for Few-shot Object Detection

Access the data and annotations through the following link: Dataset

Pre-trained Checkpoints

Access the pre-trained checkpoints and data through the following link: Pre-trained Checkpoints

Dataset Preparation (Credit: Bowen Li)

We provide the official implementation here to reproduce the results (without fine-tuning) of the ResNet-101 backbone on the datasets below:

1. Download official datasets

MS COCO 2017

PASCAL VOC

COCO format VOC annotations

Expected dataset Structure:

coco/
  annotations/
    instances_{train,val}2017.json
    person_keypoints_{train,val}2017.json
  {train,val}2017/
VOC2012/
  annotations/
    json files
  JPEGImages/
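Before generating supports, a quick sanity check of the layout can save time. A minimal sketch (the root and relative paths below are assumptions based on the tree above; adjust them to your setup):

import os

root = "datasets"  # assumed dataset root
expected = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
    "VOC2012/annotations",
    "VOC2012/JPEGImages",
]
for rel in expected:
    path = os.path.join(root, rel)
    print(("OK       " if os.path.exists(path) else "MISSING  ") + path)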

2. Generate supports

Download and unzip the support data (COCO JSON files) from MEGA/BaiduNet (pwd: 1134) into

datasets/
  coco/
    new_annotations/

Download and unzip the support data (VOC JSON files) from MEGA/BaiduNet (pwd: 1134) into

datasets/
  voc/
    new_annotations/

Run the script:

cd datasets
bash generate_support_data.sh

You may modify lines 190, 213, and 269 of 4_gen_support_pool_10_shot.py to generate a different number of shots (the default is 1 shot).
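For intuition, here is a hypothetical sketch of what a support-generation step of this kind does (the real logic lives in 4_gen_support_pool_10_shot.py; build_support_pool and its behavior are illustrative assumptions): sample a fixed number of annotated instances per category from a COCO-format annotation file.

import json
import random
from collections import defaultdict

def build_support_pool(ann_file, shots=1, seed=0):
    # Hypothetical: group annotations by category and sample `shots`
    # instances per category as the support set.
    random.seed(seed)
    with open(ann_file) as f:
        coco = json.load(f)
    by_cat = defaultdict(list)
    for ann in coco["annotations"]:
        by_cat[ann["category_id"]].append(ann)
    return {cat: random.sample(anns, min(shots, len(anns)))
            for cat, anns in by_cat.items()}

# e.g. build_support_pool("datasets/coco/annotations/instances_train2017.json",
#                         shots=1)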

Usage

Base training

Download the base R-101 model into /output.

Start training:

bash train.sh

It necessarily runs two stages, base training followed by further fine-tuning, loading a different config file for each stage (sketched below).
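A minimal sketch of that two-stage flow (the script and config file names here are placeholders for illustration; the actual wiring lives in train.sh):

import subprocess

# Hypothetical two-stage driver: base training first, then fine-tuning,
# each stage loading its own config file (paths are assumptions).
stages = [
    "configs/base_training.yaml",
    "configs/fine_tuning.yaml",
]
for cfg in stages:
    subprocess.run(
        ["python", "train_net.py", "--config-file", cfg],
        check=True,  # stop if a stage fails
    )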

Inference w/o fine-tuning

bash test.sh

Warning

This code is a pre-release; changes made for modularization purposes have not been fully verified yet (fixes will come as time permits). If you find any problems, emails or issues are welcome.

Citation

If AirShot motivates your work or is used as a baseline, please consider citing us:

@inproceedings{wang2024airshot,
  title = {{AirShot}: Efficient Few-Shot Detection for Autonomous Exploration},
  author = {Wang, Zihan and Li, Bowen and Wang, Chen and Scherer, Sebastian},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year = {2024},
  url = {https://arxiv.org/pdf/2404.05069.pdf}
}