AIR is a deep learning based object detection solution that automates the aerial drone footage inspection task frequently carried out during search and rescue (SAR) operations with drone units. It provides a fast, convenient and reliable way to augment aerial, high-resolution image inspection for clues about human presence by highlighting relevant image regions with bounding boxes, as done in the image below. With the assistance of AIR, SAR missions involving aerial drone searches can likely be carried out much faster than before, and with a considerably higher success rate.
This code repository is based on the master's thesis work of Pasi Pyrrö from Aalto University, School of Science; the work was funded by Accenture.
The table below lists the results of the AIR detector alongside other notable state-of-the-art methods on the HERIDAL test set. Note that the FPS metric is a rough estimate based on the inference speed and hardware setup reported by the original authors, and should be taken as mostly directional; we did not test any of the other methods ourselves.
Method | Precision (%) | Recall (%) | AP (%) | FPS |
---|---|---|---|---|
Mean shift segmentation method [1] | 18.7 | 74.7 | - | - |
Saliency guided VGG16 [2] | 34.8 | 88.9 | - | - |
Faster R-CNN [3] | 67.3 | 88.3 | 86.1 | 1 |
Two-stage multimodel CNN [4] | 68.9 | 94.7 | - | 0.1 |
SSD [4] | 4.3 | 94.4 | - | - |
AIR with NMS (ours) | 90.1 | 86.1 | 84.6 | 1 |
It turns out AIR achieves state-of-the-art results in precision and inference speed while having comparable recall to the strongest competitors!
You can check out the full details of AIR evaluation in this Wandb report.
AIR implements both NMS and MOB algorithms for bounding box prediction postprocessing. The image above shows the main difference: MOB (c) merges bounding boxes (a) instead of eliminating them like NMS (b). Thus, choosing MOB can produce visually more pleasing predictions. Moreover, MOB is less sensitive to the choice of confidence score threshold, making it more robust on unseen data. AIR also comes with a custom SAR-APD evaluation scheme that truthfully ranks MOB-equipped object detector performance (as standard object detection metrics, such as VOC2012 AP, do not like MOB very much).
Should you choose MOB postprocessing and SAR-APD evaluation, you should see similar evaluation results (e.g., after running `bash evaluate.sh`).
All metrics increased by over 4 percentage points from the NMS & VOC2012 results, with no loss of visual prediction quality. Neat!
To install AIR, first check the prerequisites:
- If using containers: a working Docker or Singularity installation
- If using native installation: make sure `pip` and `setuptools` are up to date

Then pick one of the three installation options below.
Docker installation:
- Build and start the container: `bash start-air-cpu-env.sh docker` (CPU) or `bash start-air-gpu-env.sh docker` (GPU)
- Inside the container, build the extension modules: `python setup.py build_ext --inplace`
Singularity installation:
- You might want to set the `SINGULARITY_CACHEDIR` env variable prior to installation; if so, edit the shell scripts used in the next steps
- Build and start the container: `bash start-air-cpu-env.sh singularity` (CPU) or `bash start-air-gpu-env.sh singularity` (GPU)
- Inside the container, run `cd AIR` and then `python setup.py build_ext --inplace`
Native installation:
- CPU version: `/usr/bin/python3 -m pip install air-detector[cpu]`
- GPU version: `/usr/bin/python3 -m pip install air-detector[gpu]`
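As a quick sanity check after a native install, you can confirm the package is visible to `pip` (a minimal sketch; the upgrade step mirrors the prerequisite above):

```bash
# keep pip and setuptools up to date (prerequisite for native installs)
/usr/bin/python3 -m pip install --upgrade pip setuptools

# install the CPU variant (quoted so the brackets survive all shells)
/usr/bin/python3 -m pip install 'air-detector[cpu]'

# confirm pip can see the installed package and its version
/usr/bin/python3 -m pip show air-detector
```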
Asset downloads:
- Download test images to the `data/images` folder
- Download test videos to the `data/videos` folder
- Download the trained models to the `models/` folder and convert them for inference by running `/bin/bash convert-model.sh`
- Download the datasets to the `data/datasets` folder
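If you prefer to set up the folder structure before downloading the assets, the following sketch creates every directory referenced in this README (run it from the repository root):

```bash
# create the input/output folders used by AIR (names taken from this README)
mkdir -p data/images data/videos data/datasets data/predictions models
```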
Once everything is set up (installation and asset downloads), you might want to try out these cool and simple demos to get the hang of using the AIR detector.
- Inference demo: run `/bin/bash infer.sh` and check the `data/predictions/dauntless-sweep-2_resnet152_pascal-enclose-inference/` folder for the output images
- SAR-APD evaluation demo: run `/bin/bash evaluate.sh` and check the `data/predictions/dauntless-sweep-2_resnet152_pascal-enclose-sar-apd-eval/` folder for the output images
- Video detection demo (CPU): run `/bin/bash convert-model.sh` and then `/usr/bin/python3 video_detect.py -c mob_cpu_images`; check the `data/videos/Ylojarvi-gridiajo-two-guys-moving_air_output/` folder for the output images
- Video tracking demo (GPU): run `/bin/bash convert-model.sh` and then `/usr/bin/python3 video_detect.py -c mob_gpu_tracking`; check the `data/videos/Ylojarvi-gridiajo-two-guys-moving_air_output_compressed.mov` output video
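For example, the CPU video detection demo boils down to the following short shell session (commands and paths exactly as listed above):

```bash
# convert the trained model into an inference model first
/bin/bash convert-model.sh

# run MOB-postprocessed detection on the demo video using the CPU config
/usr/bin/python3 video_detect.py -c mob_cpu_images

# inspect the resulting output frames
ls data/videos/Ylojarvi-gridiajo-two-guys-moving_air_output/
```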
To configure Wandb logging, set the following environment variables:
- `WANDB_MODE="disabled"` turns off Wandb logging altogether
- `WANDB_MODE="dryrun"` logs offline, without syncing to the Wandb cloud
- `WANDB_API_KEY=<your_api_key>` enables full cloud-synced logging. It's convenient to export this variable in your `~/.bashrc` or `~/.zshrc` so that it's automatically included into the Docker envs. On most Linux based systems, you can achieve this by running this shell command with your Wandb API key:
echo "export WANDB_API_KEY=<your_api_key>" >> "~/.${SHELL/\/bin\//}rc"; exec $SHELL
Other useful Wandb environment variables include, for example:
- `WANDB_DIR=~/wandb` (where run files are stored locally)
- `WANDB_PROJECT="air-testing"` (the Wandb project name)
- `WANDB_ENTITY="ML-Mike"` (the Wandb user or team name)
- `WANDB_RUN_GROUP="cool-experiments"` (groups related runs together)
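Putting these together, an offline-logging evaluation run might look like the sketch below (the values are the illustrative ones from above, not required names):

```bash
# configure Wandb for offline logging; sync to the cloud later if desired
export WANDB_MODE="dryrun"
export WANDB_DIR=~/wandb                  # local directory for run files
export WANDB_PROJECT="air-testing"
export WANDB_ENTITY="ML-Mike"
export WANDB_RUN_GROUP="cool-experiments"

# any AIR script started from this shell inherits the settings above
bash evaluate.sh
```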
Some notes on the project structure:
- AIR builds on the `keras-retinanet` framework, incorporating aerial person detection (APD) support into it
- For video inference, use the `video_detect.py` script (easier to use than the CLI in most cases)
- The `data` folder is organized as follows:
  - `data/datasets` (input for training and evaluation)
  - `data/images` (input for general inference)
  - `data/predictions` (output of the AIR detector)
  - `data/videos` (input for `video_detect.py`)
Troubleshooting tips:
- Set the `AIR_VERBOSE=1` environment variable to see the full TF logs
- `video_detect.py` might need to be recalibrated for each use case for best performance
- If you edit the bash scripts (e.g., `evaluate.sh`), make sure there is no whitespace after the linebreaks '\', bash can be picky about these things... Also avoid commenting out any command line parameters in those scripts; just delete the whole line outright
- The main python scripts reside in the `keras_retinanet/keras_retinanet/bin` folder and can be called directly (or you can try the installed console scripts, e.g., `air-evaluate`, if you ran `pip install .`) with appropriate parameters (examples can be found in those bash scripts)
- If in doubt, `-h` or `--help` usually helps (pun intended)
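For instance, combining the tips above into two concrete invocations (both use only commands and names that appear in this README):

```bash
# run the inference demo with full TF logs enabled
AIR_VERBOSE=1 /bin/bash infer.sh

# ask an installed console script for its options (requires `pip install .`)
air-evaluate --help
```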
If you use AIR in your research, please cite the thesis:

```bibtex
@MastersThesis{pyrro2021air,
title={{AIR:} {Aerial} Inspection RetinaNet for Land Search and Rescue Missions},
author={Pyrr{\"o}, Pasi and Naseri, Hassan and Jung, Alexander},
school={Aalto University, School of Science},
year={2021}
}
```
[1] Turic, H., Dujmic, H., and Papic, V. Two-stage segmentation of aerial images for search and rescue. Information Technology and Control 39, 2 (2010).
[2] Božic-Štulic, D., Marušic, Ž., and Gotovac, S. Deep learning approach in aerial imagery for supporting land search and rescue missions. International Journal of Computer Vision 127, 9 (2019), 1256–1278.
[3] Marušic, Ž., Božic-Štulic, D., Gotovac, S., and Marušic, T. Region proposal approach for human detection on aerial imagery. In 2018 3rd International Conference on Smart and Sustainable Technologies (SpliTech) (2018), IEEE, pp. 1–6.
[4] Vasic, M. K., and Papic, V. Multimodel deep learning for person detection in aerial images. Electronics 9, 9 (2020), 1459.