chen-si-jia / Trajectory-Long-tail-Distribution-for-MOT

⭕️ Official codes for "Delving into the Trajectory Long-tail Distribution for Muti-object Tracking" (CVPR2024)
MIT License

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking

【CVPR 2024】Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen, En Yu, Jinyang Li, Wenbing Tao
Paper (
YouTube (

If you have any problems with our work, please open an issue. We will reply promptly.



Multiple Object Tracking (MOT) is a critical area within computer vision, with a broad spectrum of practical implementations. Current research has primarily focused on the development of tracking algorithms and enhancement of post-processing techniques. Yet, there has been a lack of thorough examination concerning the nature of tracking data itself. In this study, we pioneer an exploration into the distribution patterns of tracking data and identify a pronounced long-tail distribution issue within existing MOT datasets. We note a significant imbalance in the distribution of trajectory lengths across different pedestrians, a phenomenon we refer to as “pedestrians trajectory long-tail distribution”. Addressing this challenge, we introduce a bespoke strategy designed to mitigate the effects of this skewed distribution. Specifically, we propose two data augmentation strategies designed for viewpoint states, Stationary Camera View Data Augmentation (SVA) and Dynamic Camera View Data Augmentation (DVA), and the Group Softmax (GS) module for Re-ID. SVA backtracks and predicts the pedestrian trajectory of tail classes, and DVA uses a diffusion model to change the background of the scene. GS divides the pedestrians into unrelated groups and performs the softmax operation on each group individually. Our proposed strategies can be integrated into numerous existing tracking systems, and extensive experimentation validates the efficacy of our method in reducing the influence of long-tail distribution on multi-object tracking performance.
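To make the Group Softmax idea more concrete, here is a minimal sketch of applying softmax separately within each group of identities. It is not the code from this repository: the function name `group_softmax`, the example logits, and the way identities are assigned to groups are all assumptions made for illustration, and any further details of the GS module described in the paper are omitted.

```python
import math


def group_softmax(logits, groups):
    """Apply softmax separately inside each group of identity indices.

    logits: list of raw Re-ID scores, one per identity (hypothetical values).
    groups: list of lists; each inner list holds the identity indices of one group.
    """
    probs = [0.0] * len(logits)
    for members in groups:
        # Subtract the group maximum for numerical stability.
        peak = max(logits[i] for i in members)
        exps = {i: math.exp(logits[i] - peak) for i in members}
        total = sum(exps.values())
        for i, value in exps.items():
            probs[i] = value / total
    return probs


# Hypothetical example: five identities split into two unrelated groups.
logits = [2.0, 1.0, 0.5, 0.1, -0.3]
groups = [[0, 1], [2, 3, 4]]
probs = group_softmax(logits, groups)
print(probs)
```

Because normalization happens per group, the probabilities inside each group sum to 1, so identities in one group do not compete with identities in another.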



Data preparation

Note: Each time you run, you need to delete the labels_with_ids folder.
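The cleanup in the note above could be scripted as follows. This is only a sketch: the helper name `remove_generated_labels` and the dataset root in the usage comment are hypothetical, and it assumes the `labels_with_ids` folder sits somewhere below the dataset root you pass in.

```python
import shutil
from pathlib import Path


def remove_generated_labels(dataset_root, folder_name="labels_with_ids"):
    """Delete every directory named `folder_name` below `dataset_root`.

    Returns the list of removed paths so the caller can check what was deleted.
    """
    removed = []
    # sorted() materializes the matches first, so deleting does not disturb the walk.
    for path in sorted(Path(dataset_root).rglob(folder_name)):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Usage (hypothetical dataset root, adjust to your own layout):
# remove_generated_labels("/path/to/dataset/MOT17")
```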

Pretrained models and baseline model

DLA-34 COCO pretrained model: DLA-34 official. HRNetV2 ImageNet pretrained model: HRNetV2-W18 official, HRNetV2-W32 official. After downloading, you should put the pretrained models in the following structure:


Our baseline FairMOT model (DLA-34 backbone) is pretrained on CrowdHuman for 60 epochs with the self-supervised learning approach and then trained on the MIX dataset for 30 epochs. The models can be downloaded here: crowdhuman_dla34.pth [Google] [Baidu, code:ggzx] [Onedrive]. fairmot_dla34.pth [Google] [Baidu, code:uouv] [Onedrive]. After downloading, you should put the baseline model in the following structure:


The important notes:

Our MOT17 dataset processed by SVA and DVA can be downloaded here [Baidu, code:hust].

Our models can be downloaded here [Baidu, code:hust].



bash experiments/


bash experiments/


bash experiments/


bash experiments/


bash experiments/


bash experiments/

The data annotation of MOT20 is slightly different from that of MOT17: the coordinates of the bounding boxes are all inside the image. We therefore need to uncomment lines 313 to 316 in the dataset file src/lib/datasets/dataset/

#np.clip(xy[:, 0], 0, width, out=xy[:, 0])
#np.clip(xy[:, 2], 0, width, out=xy[:, 2])
#np.clip(xy[:, 1], 0, height, out=xy[:, 1])
#np.clip(xy[:, 3], 0, height, out=xy[:, 3])
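To illustrate what these four lines do once uncommented, the sketch below wraps them in a standalone function and applies them to two made-up boxes. The function name `clip_boxes`, the sample values, and the image size are assumptions for illustration; the column layout `[x1, y1, x2, y2]` is inferred from which columns are clipped against `width` and which against `height`.

```python
import numpy as np


def clip_boxes(xy, width, height):
    """Clamp [x1, y1, x2, y2] boxes to the image bounds, modifying `xy` in place."""
    np.clip(xy[:, 0], 0, width, out=xy[:, 0])   # x1
    np.clip(xy[:, 2], 0, width, out=xy[:, 2])   # x2
    np.clip(xy[:, 1], 0, height, out=xy[:, 1])  # y1
    np.clip(xy[:, 3], 0, height, out=xy[:, 3])  # y2
    return xy


# Hypothetical boxes that partly leave a 1920x1080 image.
boxes = np.array([[-10.0, 20.0, 50.0, 700.0],
                  [600.0, -5.0, 2000.0, 300.0]])
clipped = clip_boxes(boxes, width=1920, height=1080)
print(clipped)
```

Passing `out=` writes the clipped values back into the same array, so no copy of the box array is created.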

Then, we can train on MOT20:


bash experiments/


bash experiments/


bash experiments/
bash experiments/



bash experiments/


bash experiments/


bash experiments/


bash experiments/

We evaluate on the other half of the MOT17 training set. You can run:

All classes (default):

bash experiments/

If you want to evaluate head classes and tail classes, you first need to run tackle_module/head_tail_classes_division/ Then place the generated gt_headclasses.txt and gt_tailclasses.txt files in the corresponding gt location of the MOT17 training dataset, as shown below:

                              |            |
                              |            |——————gt
                              |                   └——————gt.txt
                              |                   └——————gt_headclasses.txt
                              |                   └——————gt_tailclasses.txt
                              |            |
                              |            |——————gt
                              |                   └——————gt.txt
                              |                   └——————gt_headclasses.txt
                              |                   └——————gt_tailclasses.txt
                              |            |
                              |            |——————gt
                              |                   └——————gt.txt
                              |                   └——————gt_headclasses.txt
                              |                   └——————gt_tailclasses.txt
                              |            |
                              |            |——————gt
                              |                   └——————gt.txt
                              |                   └——————gt_headclasses.txt
                              |                   └——————gt_tailclasses.txt
                              |            |
                              |            |——————gt
                              |                   └——————gt.txt
                              |                   └——————gt_headclasses.txt
                              |                   └——————gt_tailclasses.txt
                              |            |
                              |            |——————gt
                              |                   └——————gt.txt
                              |                   └——————gt_headclasses.txt
                              |                   └——————gt_tailclasses.txt
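Copying the generated files into each sequence's gt folder could be automated along these lines. This is a sketch under stated assumptions: the helper name `place_split_gt` is hypothetical, and it assumes the generated files are stored per sequence as `<generated_dir>/<sequence>/gt_headclasses.txt` and `<generated_dir>/<sequence>/gt_tailclasses.txt`, which may not match what the division script actually writes.

```python
import shutil
from pathlib import Path


def place_split_gt(generated_dir, train_root,
                   names=("gt_headclasses.txt", "gt_tailclasses.txt")):
    """Copy <generated_dir>/<sequence>/<name> into <train_root>/<sequence>/gt/.

    Sequences without a gt folder, and files that were not generated, are skipped.
    Returns the list of destination paths that were written.
    """
    copied = []
    for seq_dir in sorted(Path(train_root).iterdir()):
        gt_dir = seq_dir / "gt"
        if not gt_dir.is_dir():
            continue
        for name in names:
            src = Path(generated_dir) / seq_dir.name / name
            if src.is_file():
                copied.append(shutil.copy2(src, gt_dir / name))
    return copied


# Usage (hypothetical paths, adjust to your own layout):
# place_split_gt("/path/to/generated", "/path/to/MOT17/images/train")
```

The existing gt.txt in each folder is left untouched, so the default all-classes evaluation keeps working.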

Then you can run:

Head classes or tail classes:

bash experiments/


You can input a raw video and get the demo video in mp4 format by running src/

cd src
python mot --load_model ../models/fairmot_dla34.pth --conf_thres 0.4

You can change --input-video and --output-root to get the demos of your own videos. --conf_thres can be set from 0.3 to 0.7 depending on your own videos.
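As a rough picture of what a confidence threshold such as --conf_thres does, the generic sketch below keeps only detections whose score is above the threshold. It is not the repository's tracking code: the function name `filter_by_confidence`, the `(box, score)` tuple layout, and the sample detections are assumptions for illustration.

```python
def filter_by_confidence(detections, conf_thres=0.4):
    """Keep (box, score) pairs whose score is strictly above conf_thres."""
    return [(box, score) for box, score in detections if score > conf_thres]


# Hypothetical detections: ((x, y, w, h), score).
detections = [
    ((10, 20, 50, 80), 0.92),
    ((200, 40, 30, 60), 0.35),
    ((5, 5, 20, 40), 0.55),
]
kept_default = filter_by_confidence(detections)
kept_strict = filter_by_confidence(detections, conf_thres=0.7)
print(len(kept_default), len(kept_strict))
```

A lower threshold keeps more low-score boxes, which may help on crowded or blurry videos at the cost of more false positives; a higher threshold does the opposite.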


Parts of the code are borrowed from the following works:

Thanks for their wonderful work.


    author    = {Chen, Sijia and Yu, En and Li, Jinyang and Tao, Wenbing},
    title     = {Delving into the Trajectory Long-tail Distribution for Muti-object Tracking},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {19341-19351}

To be continued

The SVA code will be updated soon.

Thank you! Please star it!