UTrack: Multi-Object Tracking with Uncertain Detections
Edgardo Solano-Carrillo, Felix Sattler, Antje Alex, Alexander Klein, Bruno Pereira Costa, Angel Bueno Rodriguez, Jannis Stoppe
The tracking-by-detection paradigm is the mainstream in multi-object tracking, associating tracks with the predictions of an object detector. Although these predictions exhibit uncertainty through a confidence score, they do not capture the entire variability of the inference process. For safety- and security-critical applications such as autonomous driving and surveillance, however, knowing this predictive uncertainty is essential. Therefore, we introduce, for the first time, a fast way to obtain the empirical predictive distribution during object detection and incorporate that knowledge into multi-object tracking. Our mechanism can easily be integrated into state-of-the-art trackers, enabling them to fully exploit the uncertainty in the detections. Additionally, novel association methods are introduced that leverage the proposed mechanism. We demonstrate the effectiveness of our contribution on a variety of benchmarks, such as MOT17, MOT20, DanceTrack, and KITTI.
First clone this repository:
```bash
git clone https://github.com/DLR-MI/UTrack
cd UTrack
```
and install suitable dependencies. This was tested in a conda environment with Python 3.8:
```bash
mamba create -n track python=3.8
mamba activate track
mamba install pytorch torchvision
pip install cython
pip install -r requirements.txt
```
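Optionally, run a quick sanity check that PyTorch was installed with GPU support before training:

```python
import torch
print(torch.__version__, torch.cuda.is_available())  # expect True on a GPU machine
```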
Download MOT17, MOT20, CrowdHuman, CityPersons, and ETHZ and put them in a /data folder. Prepare the datasets for training on MOT17 and MOT20, both for the ablation experiments and for testing:
```bash
bash tools/convert_datasets_to_coco.sh
bash tools/mix_data_for_training.sh
```
If you use another path for the data, make sure to change the `DATA_PATH` in the scripts invoked above. To clip the annotated bounding boxes to the image, run (e.g. for MOT17):
```bash
python tools/fix_yolo_annotations.py --folder /data/MOT17
```
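Conceptually, this clips each box corner to the image bounds; a minimal sketch of that operation (the actual script additionally handles the YOLO annotation format):

```python
import numpy as np

def clip_boxes(boxes, width, height):
    """Clip an (N, 4) array of pixel boxes [x1, y1, x2, y2] to the image bounds."""
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, width)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, height)
    return boxes
```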
For DanceTrack and KITTI, you can also find scripts in `./tools` to convert the annotations to the COCO format. No mixing is necessary here. Note: make sure that the folder structure of KITTI, after converting to COCO, mimics the one for MOT17/MOT20.
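For reference, a plausible layout after conversion (an assumption based on the ByteTrack-style convention this repo builds on; check the conversion scripts for the exact paths they expect):

```
/data
├── MOT17
│   ├── train
│   ├── test
│   └── annotations    # COCO .json files written by the conversion scripts
├── MOT20
│   └── ...
└── KITTI              # should mirror the MOT17/MOT20 layout above
```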
As an example, training YOLOv8-l for the MOT17 ablation experiments is done by running

```bash
python train.py --model yolov8l --exp ablation_17 --gpu_id 0
```

Take a look at `train.py` for the available `mot_choices` for the `--exp` argument.
The model weights used for the experiments in the paper can be downloaded below.
| Dataset | Weights |
|---|---|
| Ablation 17 | [download] |
| Ablation 20 | [download] |
| Mix 17 | [download] |
| Mix 20 | [download] |
| DanceTrack | [download] |
| KITTI | [download] |
After downloading, rename the file to `best.pt` and place it in the corresponding folder `./yolov8l-mix/EXP/weights`, where `EXP` is the experiment name used during training (e.g. `mix_17`, `ablation_17`, etc.). Note that KITTI was trained only for the pedestrian class.
For tracking, you just need to modify `./tracker/config/track_EXP.yaml` if you want different tracker parameters, where `EXP` is again the experiment name.
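To inspect the current parameters before editing, the config can be loaded as plain YAML (a small sketch; the `track_ablation_17.yaml` name below assumes the `ablation_17` experiment):

```python
import yaml  # pip install pyyaml

with open("tracker/config/track_ablation_17.yaml") as f:
    print(yaml.safe_load(f))  # dump the tracker parameters to the console
```

After adjusting the config, run tracking, for example: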
```bash
python track.py --project yolov8l-mix --exp ablation_17 --data_root /data/MOT17 --association uk_botsort
```
This produces an entry in the folder `./track_results` with the relevant metrics. The available trackers can be explored in the `./tracker/associations/collections.py` module.
You can run a hyperparameter search for multiple trackers in parallel, or run multiple seeds for a single tracker. The former is done by executing `./hp_search.py` on a `.yaml` template in `./tracker/eval/hp_search`; the latter, by using the `--association` argument.
Just by adding the right observation noise to the Kalman filter (on top of BoT-SORT), we get:
| Method | HOTA | DetA | AssA | MOTA | IDF1 |
|---|---|---|---|---|---|
| BoT-SORT | 53.8 | 77.8 | 37.3 | 89.7 | 56.1 |
| UTrack | 55.8 | 79.4 | 39.3 | 89.7 | 56.4 |
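The comparison above only changes how the Kalman filter weighs detections: the observation-noise covariance comes from the detector's estimated uncertainty instead of a fixed heuristic, so noisy detections pull the track less. A minimal sketch of such an update step (generic notation, not the repository's implementation):

```python
import numpy as np

def kalman_update(x, P, z, R):
    """One Kalman update with per-detection observation noise R.

    x: track state mean (e.g. [cx, cy, w, h, vx, vy, vw, vh]),
    P: state covariance, z: detected box, and R: observation-noise
    covariance estimated from the detector's predictive uncertainty
    (rather than the fixed heuristic R of BoT-SORT-style filters).
    """
    H = np.eye(len(z), len(x))         # observe the box components of the state
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain: larger R -> smaller gain
    x_new = x + K @ (z - H @ x)        # uncertain detections pull the track less
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```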
The evaluation runs at 27 FPS on an NVIDIA A100 GPU, compared with 32 FPS for BoT-SORT on the same machine.
A demo can be run on a sample video downloaded from YouTube by executing

```bash
python ./tools/track_demo.py --exp dancetrack --association uk_botsort --video_path /path/to/cropped_video --output_path /path/to/output/video
```
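One way to fetch such a clip (an assumption on our part; any local video file works) is via the yt-dlp Python API:

```python
from yt_dlp import YoutubeDL  # pip install yt-dlp

# download any short clip with people to track; the URL is left as a placeholder
with YoutubeDL({"outtmpl": "sample.mp4", "format": "mp4"}) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=..."])
```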
If you find this work useful, let us know by giving us a star 🌟. Also, consider citing it as:
```bibtex
@misc{UTrack,
  title={UTrack: Multi-Object Tracking with Uncertain Detections},
  author={Edgardo Solano-Carrillo and Felix Sattler and Antje Alex and Alexander Klein and Bruno Pereira Costa and Angel Bueno Rodriguez and Jannis Stoppe},
  year={2024},
  eprint={2408.17098},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.17098},
}
```
A portion of the code is borrowed from ByteTrack, BoT-SORT, SparseTrack, UCMCTrack, and TrackEval. Many thanks for their contributions.
Also thanks to Ultralytics for making object detection more user-friendly.