Figure 1: Overview of the pipeline for KABR dataset preparation.
KABR tools requires that torch be installed.
The KABR tools used in this process can be installed with:
pip install torch torchvision
pip install git+https://github.com/Imageomics/kabr-tools
Notes:
Compiler versions can be checked with gcc --version and g++ --version.
SlowFast is installed with pip install git+https://github.com/Imageomics/SlowFast@797a6f3ae81c49019d006296f1e0f84f431dc356, which is included when installing kabr_tools.
Each KABR tool can be run through the command line (as described below) or imported as a Python module. Each tool has help information, which can be accessed on the command line through <tool-name> -h.
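As a quick sanity check after installation, the sketch below looks up each command-line tool named in this README on your PATH. It assumes the tools are installed as console commands under exactly these names; `find_kabr_tools` and `help_command` are hypothetical helpers written for illustration and are not part of kabr_tools.

```python
import shutil

# Tool names as they appear in this README (assumed to be console commands).
KABR_TOOLS = [
    "detector2cvat",
    "tracks_extractor",
    "miniscene2behavior",
    "cvat2ultralytics",
    "player",
    "cvat2slowfast",
]


def find_kabr_tools(names=KABR_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def help_command(name):
    """Return the argument list that asks a tool for its help text."""
    return [name, "-h"]


if __name__ == "__main__":
    for name, path in find_kabr_tools().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

A tool reported as missing usually means the install step above ran in a different Python environment than the one currently active.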
Please refer to our KABR Project Page for additional details.
Figure 2: Clip of drone video containing Plains and Grevy's zebras, plus some impalas.
The drone videos for the KABR dataset were collected at the Mpala Research Centre in January 2023. The missions were flown manually, using a DJI Air 2S drone.
We collaborated with expert ecologists to ensure the disturbance to the animals was minimal. We launched the drone at a horizontal distance of approximately 200 meters from the animals and an altitude of 30 meters. We then gradually approached the herd from the side by reducing the altitude and horizontal distance, monitoring the animals for signs of vigilance.
Note that the vigilance exhibited by wildlife varies widely by species, habitat, sex, and the level to which animals may be habituated to anthropogenic noise. Therefore, we recommend tailoring your approach to your particular species and setting.
Please refer to our papers for details on the data collection process:
In order to automatically label the animal videos with behavior, we must first create mini-scenes of each individual animal captured in the frame, illustrated below.
Figure 3: A mini-scene is a sub-image cropped from the drone video footage centered on and surrounding a single animal. Mini-scenes simulate the camera as well-aligned with each individual animal in the frame, compensating for the movement of the drone and ignoring everything in the large field of view but the animal’s immediate surroundings. The KABR dataset consists of mini-scenes and their frame-by-frame behavior annotation.
See data/mini_scenes in HuggingFace for example mini-scenes.
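To make the idea of a mini-scene concrete, here is a minimal sketch of how a fixed-size crop window can be centered on an animal's bounding box and kept inside the frame. This is not the code used by tracks_extractor; the function name and the default window size of 400 pixels are assumptions chosen for illustration.

```python
def mini_scene_window(box, frame_width, frame_height, size=400):
    """Return (left, top, right, bottom) of a size x size crop centered on box.

    box is (x1, y1, x2, y2) in pixels. The window is shifted, not shrunk,
    when the animal is near the frame edge. Frames smaller than `size`
    are not handled in this sketch.
    """
    x1, y1, x2, y2 = box
    center_x = int((x1 + x2) / 2)
    center_y = int((y1 + y2) / 2)
    half = size // 2
    # Clamp the top-left corner so the whole window stays inside the frame.
    left = min(max(center_x - half, 0), frame_width - size)
    top = min(max(center_y - half, 0), frame_height - size)
    return left, top, left + size, top + size


# With an image array `frame` (height x width x channels), the crop would be:
#   left, top, right, bottom = mini_scene_window(box, width, height)
#   mini_scene = frame[top:bottom, left:right]
```

Repeating this for the same track on every frame yields a short video that follows one animal, which is what compensates for drone movement.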
Figure 4: Simplified CVAT annotation tool interface
Upload your raw videos to CVAT and perform the detections by drawing bounding boxes manually. This can be quite time-consuming, but has the advantage of generating highly accurate tracks.
Depending on the resolution of your raw video, you may encounter out-of-space issues with CVAT. You can use helper_scripts/downgrade.sh to reduce the size of your videos.
You may use YOLO to automatically perform detection on your videos. Use the script below to convert YOLO detections to CVAT format.
detector2cvat: Detect objects with Ultralytics YOLO, apply SORT tracking, and convert the tracks to CVAT format.
detector2cvat --video path_to_videos --save path_to_save [--imshow]
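For intuition about the tracking step: SORT links detections across frames largely by how much a new bounding box overlaps the predicted position of an existing track, measured as intersection over union (IoU). The sketch below shows only that overlap measure; it is an illustration, not the code inside detector2cvat, and the function name is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes in pixels."""
    # Corners of the overlapping rectangle, if any.
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])
    inter_area = max(0.0, inter_x2 - inter_x1) * max(0.0, inter_y2 - inter_y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union_area = area_a + area_b - inter_area
    return inter_area / union_area if union_area > 0 else 0.0
```

Animals that stand close together or cross paths produce boxes with similar IoU values, which is where automatically generated tracks most often need manual correction in CVAT.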
Once you have your tracks generated, use them to create mini-scenes from your raw footage.
tracks_extractor: Extract mini-scenes from CVAT tracks.
tracks_extractor --video path_to_videos --annotation path_to_annotations [--tracking] [--imshow]
You can use the KABR model to label the mini-scenes with behavior. See the ethogram folder for the list of behaviors used to label the zebra videos.
To use the KABR model, download checkpoint_epoch_00075.pyth.zip, unzip checkpoint_epoch_00075.pyth, and install SlowFast. Then run miniscene2behavior.py.
Label the mini-scenes:
miniscene2behavior [--config path_to_config] --checkpoint path_to_checkpoint [--gpu_num number_of_gpus] --miniscene path_to_miniscene [--output path_to_output_csv]
Notes:
config is the optional path to the configuration file.
checkpoint should be the path to checkpoint_epoch_00075.pyth.
If gpu_num is 0, the model will use CPU. Using at least 1 GPU greatly increases inference speed. If you're using OSC, you can request a node with one GPU by running sbatch -N 1 --gpus-per-node 1 -A [account] --time=[minutes] [bash script].
See these CSV files in HuggingFace for examples of annotated mini-scene outputs.
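If you want to launch the labeling step from a Python script, for example to loop over several mini-scene folders, the command can be assembled as an argument list. The sketch below uses only the flags shown in the usage line above; the helper name and the example paths are hypothetical.

```python
def build_miniscene2behavior_cmd(checkpoint, miniscene, config=None,
                                 gpu_num=None, output=None):
    """Assemble the miniscene2behavior command as a list of arguments.

    Optional flags are added only when a value is given, so the tool's
    own defaults apply otherwise.
    """
    cmd = [
        "miniscene2behavior",
        "--checkpoint", str(checkpoint),
        "--miniscene", str(miniscene),
    ]
    if config is not None:
        cmd += ["--config", str(config)]
    if gpu_num is not None:
        cmd += ["--gpu_num", str(gpu_num)]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


# Example usage (paths are placeholders):
#   import subprocess
#   cmd = build_miniscene2behavior_cmd(
#       "checkpoint_epoch_00075.pyth", "path_to_miniscene", gpu_num=0)
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.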
See the time budgets example for code to create these visualizations.
Figure 5: Example flight path and video clip from the KABR dataset: 2 male Grevy's zebras observed for 10 minutes on 01/18/23.
Figure 6: Overall time budget for duration of 10 minute observation
Figure 7: Gantt chart for each zebra (3 minute duration)
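The two kinds of summary shown in these figures can be derived from a per-frame sequence of behavior labels for one animal. The sketch below is a minimal illustration and not the code from the time budgets example: it assumes the labels have already been read into a Python list in frame order, and the behavior names in the usage note are placeholders, not the ethogram.

```python
from collections import Counter
from itertools import groupby


def time_budget(labels):
    """Fraction of frames spent in each behavior."""
    total = len(labels)
    if total == 0:
        return {}
    return {behavior: count / total for behavior, count in Counter(labels).items()}


def behavior_bouts(labels):
    """Collapse consecutive identical labels into bouts.

    Returns (behavior, start_frame, end_frame) tuples, with end_frame
    exclusive. These are the segments a Gantt chart would draw.
    """
    bouts = []
    start = 0
    for behavior, run in groupby(labels):
        length = sum(1 for _ in run)
        bouts.append((behavior, start, start + length))
        start += length
    return bouts
```

For example, the labels `["graze", "graze", "walk", "graze"]` give a budget of 75% graze and 25% walk, and three bouts. Dividing frame indices by the video frame rate converts bout boundaries to seconds.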
If you wish to use YOLO to automatically generate detections, you may want to fine-tune your YOLO model for your dataset using the train_yolo notebook.
cvat2ultralytics: Convert CVAT annotations to Ultralytics YOLO dataset.
cvat2ultralytics --video path_to_videos --annotation path_to_annotations --dataset dataset_name [--skip skip_frames]
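For reference, Ultralytics YOLO detection labels store one object per line as a class index followed by the box center and size, all normalized by the image dimensions. The sketch below shows that conversion for a single pixel-coordinate box; it illustrates the target format only and is not the cvat2ultralytics implementation, and the function name is an assumption.

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Format one (x1, y1, x2, y2) pixel box as a YOLO label line.

    Output: "class x_center y_center width height", with the four box
    values normalized to the range 0 to 1.
    """
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / image_width
    y_center = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

Because the values are normalized, the same label file remains valid if the images are resized, as long as the aspect ratio is preserved.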
player: Player for tracking and behavior observation.
player --folder path_to_folder [--save] [--imshow]
Figure 8: Example player.py output.
cvat2slowfast: Convert CVAT annotations to a dataset in Charades format.
cvat2slowfast --miniscene path_to_mini_scenes --dataset dataset_name --classes path_to_classes_json [--old2new path_to_old2new_json]