LG-MOT

Multi-Granularity Language-Guided Multi-Object Tracking


Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang and Fahad Shahbaz Khan

Northwestern Polytechnical University, Mohamed bin Zayed University of AI, Tianjin University, Linköping University

[Paper](https://arxiv.org/abs/2406.04844)

Abstract

Most existing multi-object tracking methods typically learn visual tracking features by maximizing the dissimilarity between different instances and the similarity within the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely from visual information is challenging, especially in the presence of environmental interference such as occlusion, blur and domain variance. In this work, we argue that multi-modal, language-driven features provide complementary information to classical visual features, thereby improving robustness to such interference. To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations. To develop LG-MOT, we annotate existing MOT datasets with scene- and instance-level language descriptions. We then encode both instance- and scene-level language information into high-dimensional embeddings, which are used to guide the visual features during training. At inference, LG-MOT relies on the standard visual features alone and does not require annotated language descriptions. Extensive experiments on three benchmarks, MOT17, DanceTrack and SportsMOT, reveal the merits of the proposed contributions, leading to state-of-the-art performance. On the DanceTrack test set, LG-MOT achieves an absolute gain of 2.2% in target object association (IDF1 score) over the baseline that uses only visual features. Further, LG-MOT exhibits strong cross-domain generalizability.
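
To make the guidance idea concrete, below is a minimal, self-contained sketch of a KL-style guidance loss in the spirit of the `--KL_loss` / `--inst_KL_loss` options: pre-computed language embeddings act as a frozen teacher whose pairwise-similarity structure the visual embeddings are pulled toward during training. This is an illustrative assumption, not the repository's exact formulation; the tensor shapes, names and temperature are made up for the example.

```python
import torch
import torch.nn.functional as F

def language_guidance_loss(visual_emb: torch.Tensor,
                           text_emb: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Illustrative KL-style guidance (an assumption, not the exact LG-MOT loss).

    visual_emb: (N, D) visual features of N instances
    text_emb:   (N, D) pre-computed language embeddings for the same instances
    """
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    # teacher: similarity structure implied by the language descriptions
    p_text = F.softmax(t @ t.t() / temperature, dim=-1)
    # student: similarity structure of the current visual features
    log_p_vis = F.log_softmax(v @ v.t() / temperature, dim=-1)
    return F.kl_div(log_p_vis, p_text, reduction="batchmean")

if __name__ == "__main__":
    vis = torch.randn(8, 256, requires_grad=True)  # visual embeddings (trainable)
    txt = torch.randn(8, 256)                      # frozen language embeddings
    loss = language_guidance_loss(vis, txt)
    loss.backward()                                # gradients flow into the visual branch only
    print(f"guidance loss: {loss.item():.4f}")
```

Consistent with the abstract, such a term is only applied during training; at inference no language input is needed.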

Intro

| Dataset | Videos (Scenes) | Annotated Scenes | Tracks (Instances) | Annotated Instances | Annotated Boxes | Frames |
|---|---|---|---|---|---|---|
| MOT17-L | 7 | 7 | 796 | 796 | 614,103 | 110,407 |
| DanceTrack-L | 65 | 65 | 682 | 682 | 576,078 | 67,304 |
| SportsMOT-L | 90 | 90 | 1,280 | 1,280 | 608,152 | 55,544 |
| Total | 162 | 162 | 2,758 | 2,758 | 1,798,333 | 233,255 |

Visualization

Performance on Benchmarks

MOT17 Challenge - Test Set

| Dataset | IDF1 | HOTA | MOTA | ID Sw. | Model |
|---|---|---|---|---|---|
| MOT17 | 81.7 | 65.6 | 81.0 | 1161 | checkpoint.pth |

DanceTrack - Test Set

| Dataset | IDF1 | HOTA | MOTA | DetA | AssA | Model |
|---|---|---|---|---|---|---|
| DanceTrack | 60.5 | 61.8 | 89.0 | 80.0 | 47.8 | checkpoint.pth |

SportsMOT - Test Set

| Dataset | IDF1 | HOTA | MOTA | ID Sw. | Model |
|---|---|---|---|---|---|
| SportsMOT | 77.1 | 75.0 | 91.0 | 2847 | checkpoint.pth |

Setup

  1. Clone and enter this repository

    git clone https://github.com/WesLee88524/LG-MOT.git
    cd LG-MOT
  2. Create an Anaconda environment for this project:

    conda env create -f environment.yml
    conda activate LG-MOT
  3. Clone fast-reid (the latest version should be compatible, but we use this version) and install its dependencies. The fast-reid repo should be placed inside the LG-MOT root:

    LG-MOT
    ├── src
    ├── fast-reid
    └── ...
  4. Download the re-identification model we use from fast-reid and move it to LG-MOT/fastreid-models/model_weights/

  5. Download the MOT17, SportsMOT and DanceTrack datasets. In addition, prepare seqmaps to run evaluation (for details, see TrackEval). We provide an example seqmap. Overall, the expected folder structure is shown below; a small optional layout check follows this list:

    DATA_PATH
    ├── DANCETRACK
    │   └── ...
    ├── SPORTSMOT
    │   └── ...
    └── MOT17
        ├── seqmaps
        │   ├── seqmap_file_name_matching_split_name.txt
        │   └── ...
        ├── train
        │   ├── MOT17-02
        │   │   ├── det
        │   │   │   └── det.txt
        │   │   ├── gt
        │   │   │   └── gt.txt
        │   │   ├── img1
        │   │   │   └── ...
        │   │   └── seqinfo.ini
        │   └── ...
        └── test
            └── ...
    
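Before launching training, it can help to verify that DATA_PATH actually matches the layout above. The snippet below is an optional sanity check; the relative paths simply mirror the tree shown, so adjust them if your local copy differs.

```python
from pathlib import Path

DATA_PATH = Path("YOUR_DATA_PATH")  # the same value you later pass via --data_path

# top-level folders from the tree above
for rel in ["DANCETRACK", "SPORTSMOT", "MOT17/seqmaps", "MOT17/train", "MOT17/test"]:
    path = DATA_PATH / rel
    print(f"{'ok     ' if path.is_dir() else 'MISSING'} {path}")

# each MOT17 train sequence should contain det/det.txt, gt/gt.txt, img1/ and seqinfo.ini
for seq in sorted((DATA_PATH / "MOT17" / "train").glob("MOT17-*")):
    for item in ["det/det.txt", "gt/gt.txt", "img1", "seqinfo.ini"]:
        if not (seq / item).exists():
            print(f"WARNING: {seq.name} is missing {item}")
```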

Training

You can launch training from the command line. An example command for MOT17 training:

RUN=example_mot17_training
REID_ARCH='fastreid_msmt_BOT_R50_ibn'

DATA_PATH=YOUR_DATA_PATH

python scripts/main.py --experiment_mode train --cuda --train_splits MOT17-train-all --val_splits MOT17-train-all --run_id ${RUN}_${REID_ARCH} --interpolate_motion --linear_center_only --det_file byte065 --data_path ${DATA_PATH} --reid_embeddings_dir reid_${REID_ARCH} --node_embeddings_dir node_${REID_ARCH}  --zero_nodes --reid_arch $REID_ARCH --edge_level_embed --save_cp --inst_KL_loss 1 --KL_loss 1 --num_epoch 200 --mpn_use_prompt_edge 1 1 1 1 --use_instance_prompt 
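
The language guidance assumes that the scene- and instance-level descriptions have already been encoded into embeddings. As a rough illustration of that step only, the sketch below embeds a couple of hypothetical descriptions with a CLIP text encoder via Hugging Face transformers; whether this matches the repository's actual encoder, prompt wording or storage layout is an assumption, and the example descriptions are made up.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# hypothetical descriptions, only for illustration (not taken from the annotated datasets)
descriptions = [
    "an indoor dance stage recorded by a static camera",   # scene-level style
    "a dancer wearing a red dress",                        # instance-level style
]

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModelWithProjection.from_pretrained("openai/clip-vit-base-patch32").eval()

with torch.no_grad():
    tokens = tokenizer(descriptions, padding=True, return_tensors="pt")
    embeddings = text_encoder(**tokens).text_embeds   # (2, 512) language embeddings

print(embeddings.shape)  # these frozen embeddings would then guide the visual features
```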

Testing

You can test a trained model from the command line. An example for testing on the MOT17 test set:

RUN=example_mot17_test
REID_ARCH='fastreid_msmt_BOT_R50_ibn'

DATA_PATH=your_data_path
PRETRAINED_MODEL_PATH=your_pretrained_model_path

python scripts/main.py --experiment_mode test --cuda --test_splits MOT17-test-all --run_id ${RUN}_${REID_ARCH} --interpolate_motion --linear_center_only --det_file byte065 --data_path ${DATA_PATH} --reid_embeddings_dir reid_${REID_ARCH} --node_embeddings_dir node_${REID_ARCH}  --zero_nodes --reid_arch $REID_ARCH --edge_level_embed --save_cp --hicl_model_path ${PRETRAINED_MODEL_PATH}  --inst_KL_loss 1 --KL_loss 1 --mpn_use_prompt_edge 1 1 1 1 

Citation

If you use our work, please consider citing us:

@misc{li2024multigranularity,
      title={Multi-Granularity Language-Guided Multi-Object Tracking}, 
      author={Yuhao Li and Muzammal Naseer and Jiale Cao and Yu Zhu and Jinqiu Sun and Yanning Zhang and Fahad Shahbaz Khan},
      year={2024},
      eprint={2406.04844},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the Apache 2.0 License. See LICENSE for additional details.

Acknowledgement

The code is mainly based on SUSHI. Thanks for their wonderful work.