Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline. Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang (2023). arXiv preprint arXiv:2309.14611. [Paper] [Code] [DemoVideo]
Tracking with bio-inspired event cameras has drawn increasing attention in recent years. Existing works either use aligned RGB and event data for accurate tracking or directly learn an event-based tracker. The first category incurs a higher inference cost, while the second is easily affected by noisy events or sparse spatial resolution. In this paper, we propose a novel hierarchical knowledge distillation framework that fully utilizes multi-modal / multi-view information during training to facilitate knowledge transfer, enabling high-speed and low-latency visual tracking during testing using only event signals. Specifically, a teacher Transformer-based multi-modal tracking framework is first trained by feeding the RGB frame and event stream simultaneously. Then, we design a new hierarchical knowledge distillation strategy, which includes pairwise similarity, feature representation, and response map-based knowledge distillation, to guide the learning of the student Transformer network. Moreover, since existing event-based tracking datasets are all low-resolution ($346 \times 260$), we propose the first large-scale high-resolution ($1280 \times 720$) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc. Extensive experiments on both low-resolution datasets (FE240hz, VisEvent, COESOT) and our newly proposed high-resolution EventVOT dataset fully validate the effectiveness of our proposed method.
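To make the three distillation terms named in the abstract more concrete, here is a minimal pure-Python sketch. It is not the released HDETrack implementation: the dot-product similarity, the MSE and KL loss forms, the temperature, and the weights `w_sim`, `w_feat`, `w_resp` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pairwise_similarity(tokens):
    """Dot-product similarity between every pair of token vectors, flattened."""
    return [sum(p * q for p, q in zip(u, v)) for u in tokens for v in tokens]

def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(t_tokens, s_tokens, t_response, s_response,
                      w_sim=1.0, w_feat=1.0, w_resp=1.0, temperature=2.0):
    """Weighted sum of similarity, feature, and response terms (assumed forms)."""
    # Pairwise similarity term: match the token-to-token similarity matrices.
    sim = mse(pairwise_similarity(s_tokens), pairwise_similarity(t_tokens))
    # Feature term: match the token features directly.
    feat = mse([x for v in s_tokens for x in v], [x for v in t_tokens for x in v])
    # Response term: match the softened response maps (flattened score lists).
    resp = kl_divergence(softmax(t_response, temperature),
                         softmax(s_response, temperature))
    return w_sim * sim + w_feat * feat + w_resp * resp
```

In this sketch the loss is zero when the student reproduces the teacher exactly and grows as the token features or response maps diverge. In practice the same terms would be computed on tensors inside the training loop.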
:fire: [2024.03.12] A New Long-term RGB-Event based Visual Object Tracking Benchmark Dataset (termed FELT) is available at [Paper] [Code] [DemoVideo]
:fire: [2024.02.28] Our code, visualizations and other experimental results have been updated.
:fire: [2024.02.27] Our work is accepted by CVPR-2024!
:fire: [2023.12.04] EventVOT_eval_toolkit is released; download it from [EventVOT_eval_toolkit] (Passcode: wsad)
:fire: [2023.09.26] arXiv paper, dataset, pre-trained models, and benchmark results are all released [arXiv]
A demo video is available on YouTube.
A distillation framework for Event Stream-based Visual Object Tracking.
[HDETrack_S_ep0050.pth] Passcode:wsad
[Raw Results] Passcode:wsad
Install env
conda create -n hdetrack python=3.8
conda activate hdetrack
bash install.sh
Run the following command to set paths for this project
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
After running this command, you can also modify paths by editing these two files
lib/train/admin/local.py # paths about training
lib/test/evaluation/local.py # paths about testing
Then, put the EventVOT tracking dataset in ./data.
Download the pre-trained MAE ViT-Base weights and put them under $/pretrained_models
Download the pre-trained teacher model CEUTrack_ep0050.pth and put it under $/pretrained_models
Download the trained model weights from [HDETrack_S_ep0050.pth] and put them under $/output/checkpoints/train/hdetrack/hdetrack_eventvot to test directly.
These weight files can also be downloaded from Dropbox.
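Before training or testing, it can help to confirm that the downloaded files are where the commands expect them. The sketch below assumes that `$` denotes the project root and that the checkpoints keep their original file names. The helper `missing_weights` is hypothetical and not part of the repository. The MAE ViT-Base file is left out because its file name is not stated here.

```python
from pathlib import Path

# Relative locations taken from the instructions above; "$" is assumed to be
# the project root, and the file names are assumed to be unchanged.
EXPECTED_WEIGHTS = [
    "pretrained_models/CEUTrack_ep0050.pth",
    "output/checkpoints/train/hdetrack/hdetrack_eventvot/HDETrack_S_ep0050.pth",
]

def missing_weights(project_root, expected=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).is_file()]

if __name__ == "__main__":
    for rel in missing_weights("."):
        print("missing:", rel)
```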
# train
python tracking/train.py --script hdetrack --config hdetrack_eventvot --save_dir ./output --mode single --nproc_per_node 1 --use_wandb 0
# test
python tracking/test.py hdetrack hdetrack_eventvot --dataset eventvot --threads 1 --num_gpus 1
Note: The speeds reported in our paper were tested on a single RTX 3090 GPU.
:floppy_disk: Baidu Netdisk: link:https://pan.baidu.com/s/1NLSnczJ8gnHqF-69bE7Ldg?pwd=wsad code:wsad
:floppy_disk: Baidu Netdisk: link:https://pan.baidu.com/s/1ZTX7O5gWlAdpKmd4R9VhYA?pwd=wsad code:wsad
:floppy_disk: Dropbox: https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p&dl=0
wget -O EventVOT_dataset.zip "https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p&dl=1"
The directory should have the below format:
├── EventVOT dataset
    ├── Training Subset (841 videos, 180.7GB)
        ├── recording_2022-10-10_17-28-38
            ├── img
            ├── recording_2022-10-10_17-28-38.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ...
    ├── Testing Subset (282 videos, 64.88GB)
        ├── recording_2022-10-10_17-28-24
            ├── img
            ├── recording_2022-10-10_17-28-24.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ...
    ├── Validating Subset (18 videos, 4.34GB)
        ├── recording_2022-10-10_17-31-07
            ├── img
            ├── recording_2022-10-10_17-31-07.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ...
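Given the layout above, a sequence folder can be read with a few lines of Python. This is a sketch under stated assumptions, not the repository's own loader: it assumes `groundtruth.txt` holds one box per line as four numbers (for example `x,y,w,h`) and that `absent.txt` holds one integer flag per frame. The meaning of each flag value is not specified here, so the flags are returned as-is. `read_boxes`, `load_sequence`, and `list_sequences` are hypothetical helper names.

```python
from pathlib import Path

def read_boxes(path):
    """Parse one box per line; commas or whitespace are accepted as separators."""
    boxes = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            boxes.append([float(v) for v in line.replace(",", " ").split()])
    return boxes

def load_sequence(seq_dir):
    """Collect frame names, boxes, and raw absence flags for one recording."""
    seq_dir = Path(seq_dir)
    frames = sorted(p.name for p in (seq_dir / "img").iterdir() if p.is_file())
    boxes = read_boxes(seq_dir / "groundtruth.txt")
    absent_file = seq_dir / "absent.txt"
    flags = []
    if absent_file.is_file():
        tokens = absent_file.read_text().replace(",", " ").split()
        flags = [int(float(t)) for t in tokens]
    return {"name": seq_dir.name, "frames": frames,
            "boxes": boxes, "absent_flags": flags}

def list_sequences(subset_dir):
    """Return the recording folder names inside one subset directory."""
    return sorted(p.name for p in Path(subset_dir).iterdir() if p.is_dir())
```

Calling `len(list_sequences(...))` on each subset directory is a quick way to check a download against the video counts listed in the tree.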
Put your tracking results in $/eventvot_tracking_results/ and modify the name in $/utils/config_tracker.m
Run Evaluate_EventVOT_benchmark_SP_PR_only.m for the overall performance evaluation, including SR, PR, NPR.
Run plot_BOC.m for BOC score evaluation and figure plot.
Run plot_radar.m for the attributes radar figure plot.
Run Evaluate_EventVOT_benchmark_attributes.m for attributes analysis; figures are saved in $/res_fig/.
The overall performance evaluation, including SR, PR, NPR.
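For a quick sanity check outside MATLAB, the three metrics can be approximated in Python. The sketch below follows definitions commonly used in tracking benchmarks and is not a port of the toolkit: the IoU threshold grid, the 20-pixel precision threshold, and the 0.2 normalized threshold are assumptions, and the toolkit's numbers remain authoritative. Boxes are assumed to be `(x, y, w, h)` with positive width and height.

```python
import math

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def center(b):
    """Center point of an (x, y, w, h) box."""
    return (b[0] + b[2] / 2.0, b[1] + b[3] / 2.0)

def success_rate(preds, gts, thresholds=None):
    """Mean over IoU thresholds of the fraction of frames above each threshold."""
    thresholds = thresholds or [i / 20.0 for i in range(21)]  # 0, 0.05, ..., 1
    ious = [iou(p, g) for p, g in zip(preds, gts)]
    per_threshold = [sum(v > t for v in ious) / len(ious) for t in thresholds]
    return sum(per_threshold) / len(thresholds)

def precision_rate(preds, gts, pixel_threshold=20.0):
    """Fraction of frames whose center error is within pixel_threshold."""
    hits = 0
    for p, g in zip(preds, gts):
        (px, py), (gx, gy) = center(p), center(g)
        hits += math.hypot(px - gx, py - gy) <= pixel_threshold
    return hits / len(gts)

def normalized_precision_rate(preds, gts, threshold=0.2):
    """Like precision_rate, with the error scaled by the ground-truth box size."""
    hits = 0
    for p, g in zip(preds, gts):
        (px, py), (gx, gy) = center(p), center(g)
        hits += math.hypot((px - gx) / g[2], (py - gy) / g[3]) <= threshold
    return hits / len(gts)
```

Frames where the target is absent would normally be excluded before calling these functions. That filtering is left out because the flag convention in `absent.txt` is not described here.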
@inproceedings{wang2024event,
title={Event stream-based visual object tracking: A high-resolution benchmark dataset and a novel baseline},
author={Wang, Xiao and Wang, Shiao and Tang, Chuanming and Zhu, Lin and Jiang, Bo and Tian, Yonghong and Tang, Jin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={19248--19257},
year={2024}
}