ReST is a novel reconfigurable graph model that first associates all detected objects across cameras spatially, then reconfigures the spatial graph into a temporal graph for temporal association. This two-stage association approach enables us to extract robust spatial- and temporal-aware features and to address the problem of fragmented tracklets. Furthermore, our model is designed for online tracking, making it suitable for real-world applications. Experimental results show that the proposed graph model is able to extract more discriminative features for object tracking, and our model achieves state-of-the-art performance on several public datasets.
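To make the two-stage idea concrete, here is a minimal, self-contained sketch of the concept (not the authors' implementation): detections from multiple cameras at one frame are first associated spatially into per-object nodes, and those nodes are then linked temporally across frames. The detection format, distance metric, and thresholds are all hypothetical simplifications.

```python
# Conceptual sketch of spatial-then-temporal association.
# NOT the ReST model: real spatial/temporal graphs use learned
# node/edge features; here we use toy ground-plane distances.

def spatial_associate(detections, dist_thresh=1.0):
    """Group detections from different cameras whose (hypothetical)
    ground-plane positions are close: one group per physical object."""
    groups = []
    for det in detections:
        for g in groups:
            if any(det["cam"] != d["cam"] and
                   abs(det["x"] - d["x"]) + abs(det["y"] - d["y"]) <= dist_thresh
                   for d in g):
                g.append(det)
                break
        else:
            groups.append([det])
    return groups

def centroid(group):
    """Aggregate a spatial group into a single node position."""
    return {"x": sum(d["x"] for d in group) / len(group),
            "y": sum(d["y"] for d in group) / len(group)}

def temporal_associate(prev_nodes, curr_nodes, dist_thresh=1.0):
    """Greedily link aggregated nodes across consecutive frames."""
    links = []
    for i, p in enumerate(prev_nodes):
        best = None
        for j, c in enumerate(curr_nodes):
            d = abs(p["x"] - c["x"]) + abs(p["y"] - c["y"])
            if d <= dist_thresh and (best is None or d < best[1]):
                best = (j, d)
        if best:
            links.append((i, best[0]))
    return links

# The same person seen by two cameras at frame t and frame t+1.
frame_t = [{"cam": 0, "x": 0.0, "y": 0.0}, {"cam": 1, "x": 0.2, "y": 0.1}]
frame_t1 = [{"cam": 0, "x": 0.5, "y": 0.0}, {"cam": 1, "x": 0.6, "y": 0.1}]
prev_nodes = [centroid(g) for g in spatial_associate(frame_t)]
curr_nodes = [centroid(g) for g in spatial_associate(frame_t1)]
links = temporal_associate(prev_nodes, curr_nodes)
```

Because cross-camera detections are merged before temporal matching, each frame contributes one node per object rather than one per detection, which is what lets the temporal stage avoid fragmented tracklets.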
Clone the project and create a virtual environment:

```shell
git clone https://github.com/chengche6230/ReST.git
conda create --name ReST python=3.8
conda activate ReST
```
Install `torchreid` (follow the instructions in its repository). Reference commands:

```shell
# torchreid
git clone https://github.com/KaiyangZhou/deep-person-reid.git
cd deep-person-reid/
pip install -r requirements.txt
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
python setup.py develop

# other packages (in /ReST)
conda install -c dglteam/label/cu117 dgl
pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git
pip install motmetrics
```
Install other requirements:

```shell
pip install -r requirements.txt
```
Download the pre-trained ReID model.
Prepare all datasets under `./datasets/` as:

```
./datasets/
├── CAMPUS/
│   ├── Garden1/
│   │   └── view-{}.txt
│   ├── Garden2/
│   │   └── view-HC{}.txt
│   ├── Parkinglot/
│   │   └── view-GL{}.txt
│   └── metainfo.json
├── PETS09/
│   ├── S2L1/
│   │   └── View_00{}.txt
│   └── metainfo.json
├── Wildtrack/
│   ├── sequence1/
│   │   └── src/
│   │       ├── annotations_positions/
│   │       └── Image_subsets/
│   └── metainfo.json
└── {DATASET_NAME}/            # for customized dataset
    ├── {SEQUENCE_NAME}/
    │   └── {ANNOTATION_FILE}.txt
    └── metainfo.json
```
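For a customized dataset it is easy to misplace a file, so a small pre-flight check can help. The helper below is an assumption, not a ReST utility; it only mirrors the `{DATASET_NAME}` layout shown in the tree above (one annotation `.txt` per sequence plus a `metainfo.json`).

```python
# Hypothetical layout checker for a customized dataset (not part of
# the ReST repo). Verifies ./datasets/{DATASET_NAME}/ against the
# documented tree and returns whatever is missing.
from pathlib import Path

def check_dataset(root, dataset, sequences):
    """Return a list of missing paths for the given dataset."""
    base = Path(root) / dataset
    missing = []
    if not (base / "metainfo.json").is_file():
        missing.append(str(base / "metainfo.json"))
    for seq in sequences:
        seq_dir = base / seq
        if not seq_dir.is_dir():
            missing.append(str(seq_dir))
        elif not any(seq_dir.glob("*.txt")):
            missing.append(str(seq_dir / "*.txt"))
    return missing
```

An empty return value means the layout matches; otherwise the listed paths need to be created before pre-processing.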
Prepare `metainfo.json` for each dataset to describe its files (e.g. frames, file pattern, homography), then run the pre-processing script:

```shell
python ./src/datasets/preprocess.py --dataset {DATASET_NAME}
```
Check `./datasets/{DATASET_NAME}/{SEQUENCE_NAME}/output` to see if anything is missing:

```
/output/
├── gt_MOT/                # for motmetrics
│   └── c{CAM}.txt
├── gt_train.json
├── gt_eval.json
├── gt_test.json
└── {DETECTOR}_test.json   # if you want to use another detector, e.g. yolox_test.json
```

All frames are named as `{FRAME}_{CAM}.jpg` in `/output/frames`.
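Given the `{FRAME}_{CAM}.jpg` naming convention above, a filename can be split back into its frame index and camera id. The naming pattern comes from the docs; the helper itself is a hypothetical convenience, not a repo function.

```python
# Hypothetical helper: parse a pre-processed frame filename of the
# form {FRAME}_{CAM}.jpg into (frame_index, camera_id).
import os

def parse_frame_name(filename):
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext != ".jpg":
        raise ValueError(f"expected a .jpg file, got {filename!r}")
    frame, _, cam = stem.rpartition("_")  # split on the last underscore
    return int(frame), int(cam)
```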
Download the trained weights if you need them, and modify `TEST.CKPT_FILE_SG` & `TEST.CKPT_FILE_TG` in `./configs/{DATASET_NAME}.yml`.

| Dataset | Spatial Graph | Temporal Graph |
|---|---|---|
| Wildtrack | sequence1 | sequence1 |
| CAMPUS | Garden1, Garden2, Parkinglot | Garden1, Garden2, Parkinglot |
| PETS-09 | S2L1 | S2L1 |
To train our model, basically run the command:

```shell
python main.py --config_file ./configs/{DATASET_NAME}.yml
```

In `{DATASET_NAME}.yml`:
* Modify `MODEL.MODE` to 'train'.
* Modify `SOLVER.TYPE` to train specific graphs.
* Review the other settings, e.g. `DEVICE_ID`, `BATCH_SIZE`.

You can also override settings directly on the command line, e.g.:

```shell
python main.py --config_file ./configs/Wildtrack.yml MODEL.DEVICE_ID "('1')" SOLVER.TYPE "SG"
```
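The trailing `KEY VALUE` pairs on that command line follow the common yacs-style override convention: each dotted key names a nested config field, and the value that follows replaces it. As an illustration only (this is an assumption, not ReST's actual config loader), such pairs could be merged into a nested config like this:

```python
# Illustrative sketch of yacs-style command-line overrides.
# apply_overrides and the cfg dict below are hypothetical; the real
# project presumably merges overrides via its own config system.

def apply_overrides(cfg, opts):
    """Merge a flat [key, value, key, value, ...] list into nested dict cfg."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")  # e.g. "MODEL.DEVICE_ID"
        for p in parents:
            node = node.setdefault(p, {})
        node[leaf] = value
    return cfg

cfg = {"MODEL": {"MODE": "train"}, "SOLVER": {"TYPE": "TG"}}
apply_overrides(cfg, ["MODEL.DEVICE_ID", "('1')", "SOLVER.TYPE", "SG"])
```

After the merge, `SOLVER.TYPE` is `"SG"` and `MODEL.DEVICE_ID` is set, while untouched fields such as `MODEL.MODE` keep their YAML values.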
To test our model, run:

```shell
python main.py --config_file ./configs/{DATASET_NAME}.yml
```

In `{DATASET_NAME}.yml`:
* Modify `MODEL.MODE` to 'test'.
* Select the detection source with `MODEL.DETECTION`. If you want to use your own detector, prepare `{DETECTOR}_test.json` in `./datasets/{DATASET_NAME}/{SEQUENCE_NAME}/output/` first.
* Make sure all settings in `TEST` are configured.

If you find this code useful for your research, please cite our paper:
```bibtex
@InProceedings{Cheng_2023_ICCV,
    author    = {Cheng, Cheng-Che and Qiu, Min-Xuan and Chiang, Chen-Kuo and Lai, Shang-Hong},
    title     = {ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {10051-10060}
}
```