PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

Bowen Li, Ziyuan Huang, Junjie Ye, Yiming Li, Sebastian Scherer, Hang Zhao, and Changhong Fu

Our paper has been accepted at ICCV 2023!

Abstract

Visual object tracking is essential to intelligent robots. Most existing approaches have ignored the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). Unlike existing solutions that naively append Kalman Filters after trackers, PVT++ can be jointly optimized, so that it takes not only motion information but can also leverage the rich visual knowledge in most pre-trained tracker models for robust prediction. Besides, to bridge the training-evaluation domain gap, we propose a relative motion factor, empowering PVT++ to generalize to the challenging and complex UAV tracking scenes. These careful designs have made the small-capacity lightweight PVT++ a widely effective solution. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing an any-speed tracker in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ can achieve significant performance gain on various trackers and exhibit higher accuracy than prior solutions, largely mitigating the degradation brought by latency.

Overview

Baseline results and trained models are available for download in the PVT++ Model Zoo.

TODO

- [x] Code for PVT++
  - [x] Train
  - [x] Test
- [x] Code for e-LAE
- [ ] All the official models
  - [x] SiamRPN++_Mob
  - [ ] SiamRPN++_Res
  - [ ] SiamMask
- [ ] All the raw results for PVT++
  - [x] SiamRPN++_Mob
  - [ ] SiamRPN++_Res
  - [ ] SiamMask
- [ ] All the vanilla tracker online results

Installation

Please create a Python environment that includes:

Python 3.7.12

numpy 1.19.2

CUDA compiler CUDA 11.0

PyTorch 1.7.0

Pillow 8.3.1

torchvision 0.8.1

fvcore 0.1.5

cv2 4.5.4

colorama 0.4.4

tensorboardx 2.5.1

This is essentially the same environment that PySOT uses.
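To confirm that an existing environment matches the versions listed above, a small check like the following can help. It is a minimal sketch, not part of the repository: the pip distribution names (for example "torch" for PyTorch and "opencv-python" for cv2) are assumptions, and on Python 3.7 it relies on the importlib_metadata backport being installed.

```python
"""Compare installed package versions against the versions pinned above."""
import sys

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions copied from the list above. The distribution names are assumptions.
PINNED = {
    "numpy": "1.19.2",
    "torch": "1.7.0",
    "Pillow": "8.3.1",
    "torchvision": "0.8.1",
    "fvcore": "0.1.5",
    "opencv-python": "4.5.4",
    "colorama": "0.4.4",
    "tensorboardX": "2.5.1",
}


def matches(installed, pinned):
    """True if `installed` is the pinned release, ignoring build tags such as '+cu110'."""
    release = installed.split("+")[0]
    return release == pinned or release.startswith(pinned + ".")


def report(pinned=PINNED, lookup=version):
    """Return {name: (pinned, installed_or_None, ok)} for every pinned package."""
    out = {}
    for name, want in pinned.items():
        try:
            have = lookup(name)
        except PackageNotFoundError:
            have = None
        out[name] = (want, have, have is not None and matches(have, want))
    return out


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(this README lists 3.7.12)")
    for name, (want, have, ok) in report().items():
        print(f"{'ok ' if ok else 'BAD'} {name}: pinned {want}, installed {have}")
```

The CUDA compiler version is not covered by this check and has to be verified separately.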

Dataset Preparation

1. Download test datasets

DTB70, UAVDT, UAV123, UAV20L

Put them into the testing_dataset directory as follows:

testing_dataset/
    DTB70/
        Animal1/
        ...
    UAVDT/
        anno/
        data_seq/
    UAV20L/
        anno/
        data_seq/
    ...
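Before running any tests, it can be worth checking that the layout above is in place. The helper below is a hypothetical sketch (not part of the repository) that only checks the directories spelled out in the tree; UAV123 is left out because its layout is elided there.

```python
from pathlib import Path

# Sub-directories taken from the tree above; UAV123 is omitted because
# its layout is not spelled out ("...").
EXPECTED = [
    "DTB70",
    "UAVDT/anno",
    "UAVDT/data_seq",
    "UAV20L/anno",
    "UAV20L/data_seq",
]


def missing_dirs(root, expected=EXPECTED):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("testing_dataset")
    print("layout looks complete" if not missing else f"missing: {missing}")
```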

2. Download training datasets

VID, LaSOT, GOT-10k

Put them into the training_dataset directory as follows:

training_dataset/
    got10k/
        data/
            GOT-10k_Train_000001/
            ...
        gen_json.py
        train.json
    lasot/
        data/
            airplane-1/
            ...
        gen_json.py
        gen_txt.py
        train.json
    vid/
        ILSVRC2015/
            Annotations/
            Data/
            ImageSets/
        gen_json.py
        parse_vid.py
        train.json

3. Generate train.json

cd training_dataset/got10k
python gen_json.py

cd training_dataset/lasot
python gen_txt.py
python gen_json.py

cd training_dataset/vid
python parse_vid.py
python gen_json.py
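Each `cd` above is relative to the repository root, so the three groups cannot be pasted into one shell session as-is. The following sketch (a hypothetical helper, not part of the repository) runs the same scripts in the same order, each inside its own dataset directory.

```python
import subprocess
import sys
from pathlib import Path

# (dataset sub-directory, scripts in the order given above)
STEPS = [
    ("got10k", ["gen_json.py"]),
    ("lasot", ["gen_txt.py", "gen_json.py"]),
    ("vid", ["parse_vid.py", "gen_json.py"]),
]


def build_plan(root="training_dataset", python=sys.executable):
    """Return a list of (working_directory, command) pairs, one per script."""
    root = Path(root)
    return [(root / sub, [python, script]) for sub, scripts in STEPS for script in scripts]


def run_plan(plan):
    """Run each command in its working directory and stop at the first failure."""
    for cwd, cmd in plan:
        print(f"[{cwd}] {' '.join(cmd)}")
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_plan(build_plan())` from the repository root executes all five scripts; `build_plan()` alone can be printed first as a dry run.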

Note

You may need to check the dataset paths in /PVT++/pysot/core/config.py (lines 163-183).

Test models

1. Add PVT++ to your PYTHONPATH

export PYTHONPATH=/path/to/PVT++:$PYTHONPATH
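A quick way to confirm that the export took effect is to ask Python whether it can locate the `pysot` package, which lives inside the PVT++ folder. This is a minimal sketch using only the standard library.

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the top-level module `module_name` on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pysot" is the package directory referenced elsewhere in this README.
    print("pysot importable:", is_importable("pysot"))
```

If this prints `False`, the PYTHONPATH entry most likely does not point at the PVT++ root.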

2. Download PVT++ models

Download the models from the PVT++ Model Zoo and put them in my_models/.

3. Test models on an NVIDIA Jetson AGX Xavier (you may find this tutorial useful for setting up the environment on the AGX)

bash test_mob_agx.sh

4. Test models on PC with the simulated latency

4.1 Generate simulated latency

Download our Raw_results and put them in the PVT++ folder.

python tools/gen_sim_info.py

You may need to specify the datasets in that file.

The simulation pkl files will be written to testing_dataset/sim_info.
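The structure of these pkl files is not documented here, so the sketch below makes no assumption about it: it only loads each file and reports the type of the top-level object, plus its keys or length. The helper name is hypothetical. As with any pickle, only load files you trust.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load one pickle file and describe its top-level object without assuming a schema."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = sorted(map(str, obj))[:10]  # at most ten keys
    elif hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary


if __name__ == "__main__":
    for pkl in sorted(Path("testing_dataset/sim_info").glob("*.pkl")):
        print(pkl.name, summarize_pickle(pkl))
```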

4.2 Test with recorded latency

bash test_sim_mob.sh

The raw results will be written to results_rt_raw/.
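To see why latency matters in the online setting, the toy sketch below pairs each incoming frame with the most recent result that was already finished when the frame arrived. It is a simplified illustration of latency-aware matching in general, not the code used in this repository; the function name and all timings are made up for the example.

```python
from bisect import bisect_right


def match_latest_result(frame_times, finish_times):
    """For each frame timestamp, return the index of the latest result that was
    already finished at that time, or None if no result was ready yet.

    `finish_times` must be sorted in ascending order.
    """
    matched = []
    for t in frame_times:
        k = bisect_right(finish_times, t)  # number of results finished by time t
        matched.append(k - 1 if k > 0 else None)
    return matched


# Toy numbers in milliseconds (not measured): frames arrive every 40 ms,
# but each tracker output takes 60 ms, so outputs lag behind the stream.
frame_times = [0, 40, 80, 120, 160, 200]
finish_times = [60, 120, 180]
print(match_latest_result(frame_times, finish_times))
# → [None, None, 0, 1, 1, 2]
```

In this example the frame at 160 ms is still scored against a result that finished at 120 ms, which is the kind of mismatch that predicting ahead is meant to compensate for.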

Evaluation

1. Convert .pkl files to .txt files

bash convert.sh # sigma = 0, predictive trackers, results in /results_rt_raw
# output results are in /results_rt
bash convert_new.sh # sigma = 0:0.02:1, original trackers, results in /Raw, we'll provide all the results soon
# output results are in /results_eLAE
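The comment above writes the sigma values as 0:0.02:1. Reading that as a MATLAB-style start:step:stop range with the end point included (an assumption about the notation), the grid can be expanded as follows; the helper name is hypothetical.

```python
def sigma_grid(start=0.0, step=0.02, stop=1.0):
    """Expand a start:step:stop range into a list, end point included."""
    n = int(round((stop - start) / step))  # number of steps between start and stop
    # Multiply instead of accumulating to avoid floating-point drift.
    return [round(start + i * step, 10) for i in range(n + 1)]


sigmas = sigma_grid()
print(len(sigmas), sigmas[:3], sigmas[-1])
# → 51 [0.0, 0.02, 0.04] 1.0
```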

2. Evaluate the results

Refer to the e-LAE code.

Training

Download the base tracking models from the PVT++ Model Zoo and put them in pretrained/.

bash train.sh

The trained models will be saved in /snapshot.

LB5 refers to the motion model, lbv5 denotes the visual predictor, and mv16 denotes the joint model.

Reference

If our work inspires your research, please cite us as:

@INPROCEEDINGS{Li2023iccv,       
    author={Li, Bowen and Huang, Ziyuan and Ye, Junjie and Li, Yiming and Scherer, Sebastian and Zhao, Hang and Fu, Changhong},   
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, 
    title={{PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework}},
    year={2023},
    volume={},
    number={},
    pages={10006--10016},
}

Acknowledgement

Our work is motivated by the ECCV 2020 paper "Towards Streaming Perception" and by "Predictive Visual Tracking"; we express our gratitude to the authors. This library is built upon PySOT, and we sincerely thank its contributors and developers.
