Zhongdao / UniTrack

[NeurIPS'21] Unified tracking framework with a single appearance model. It supports Single Object Tracking (SOT), Video Object Segmentation (VOS), Multi-Object Tracking (MOT), Multi-Object Tracking and Segmentation (MOTS), Pose Tracking, Video Instance Segmentation (VIS), and class-agnostic MOT (e.g. TAO dataset).
MIT License

SOT on LaSOT #26

Closed by Flowerfan 2 years ago

Flowerfan commented 2 years ago

Hi, Zhongdao,

Thank you for your great work!

I have tested your code on LaSOT using the crw_resnet18_s3 model, modifying only the dataset root in utils.py. However, the AUC on this dataset is only 23.02, and I'm not sure this result is correct. Have you tested UniTrack on LaSOT or GOT10k? Could you provide the results on LaSOT and GOT10k at your convenience?
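For reference, the AUC quoted above is the area under the success plot. Below is a minimal sketch of that metric, assuming the common OTB-style protocol (21 overlap thresholds evenly spaced in [0, 1], boxes given as `(x, y, w, h)`). The function names `iou` and `success_auc` are hypothetical and not taken from the UniTrack code base, and the official LaSOT toolkit may differ in details.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Area under the success plot, as a percentage.

    Assumption: thresholds are evenly spaced in [0, 1] and a frame counts
    as a success when its IoU is strictly greater than the threshold.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    if not overlaps:
        raise ValueError("need at least one frame")
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at each threshold, then the mean over thresholds.
    success = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return 100.0 * sum(success) / len(success)
```

A per-dataset score is then typically the mean of `success_auc` over all sequences.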

Zhongdao commented 2 years ago

Hi @Flowerfan, we do have results on more SOT datasets (we ran these experiments during the rebuttal). You will find them in the camera-ready version of our supplementary material, and below: [image]

Here UniTrack uses an ImageNet pre-trained ResNet-50 as the appearance model. Your result (23.02 on LaSOT) seems correct. If you want to obtain a better result, I suggest slightly tuning the hyperparameters here: https://github.com/Zhongdao/UniTrack/blob/44ae779994247fe69ebe297b7e384cc9e3e00195/test/test_sot_cfnet.py#L44

For instance, changing this line https://github.com/Zhongdao/UniTrack/blob/44ae779994247fe69ebe297b7e384cc9e3e00195/test/test_sot_cfnet.py#L56 to num_scale = 5 helps.
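To illustrate what a setting like num_scale typically controls, here is a rough sketch of a scale pyramid as used in SiamFC/CFNet-style trackers: the target is searched at several sizes around the previous estimate, and a larger num_scale covers a wider range of size changes. This is an assumption about the general technique, not the exact code in test_sot_cfnet.py. The helper names, the `scale_step` value, and the odd-count requirement are all hypothetical.

```python
def scale_factors(num_scale, scale_step=1.0375):
    """Multiplicative scale factors centred on 1.0.

    Assumption: num_scale is odd so that the unscaled candidate (1.0)
    sits in the middle; scale_step is an illustrative value only.
    """
    if num_scale < 1 or num_scale % 2 == 0:
        raise ValueError("this sketch expects a positive odd num_scale")
    half = num_scale // 2
    return [scale_step ** k for k in range(-half, half + 1)]


def candidate_sizes(width, height, num_scale, scale_step=1.0375):
    """Target sizes to evaluate in the next frame, one per scale factor."""
    return [(width * s, height * s) for s in scale_factors(num_scale, scale_step)]


if __name__ == "__main__":
    # With num_scale = 5 the tracker would test two smaller and two larger
    # sizes in addition to the previous one.
    for w, h in candidate_sizes(100.0, 50.0, num_scale=5):
        print(f"{w:.2f} x {h:.2f}")
```

More scales cost proportionally more forward passes per frame, so the gain in accuracy trades off against speed.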

Flowerfan commented 2 years ago

Thank you for sharing these results! Much appreciated!