Zhongdao / UniTrack

[NeurIPS'21] Unified tracking framework with a single appearance model. It supports Single Object Tracking (SOT), Video Object Segmentation (VOS), Multi-Object Tracking (MOT), Multi-Object Tracking and Segmentation (MOTS), Pose Tracking, Video Instance Segmentation (VIS), and class-agnostic MOT (e.g. TAO dataset).
MIT License

how to use the DCF head for SOT? #14

Closed · jimmy-dq closed this issue 3 years ago

jimmy-dq commented 3 years ago

Hi, thanks for your interesting work. Does this code contain the DCF-head tracking part? I can't find this tracker in the tracker folder.

Zhongdao commented 3 years ago

Hi, please see this script. A reminder: the current DCF-related code needs an older version of PyTorch, e.g. 1.3.0, because in newer versions like 1.9.0 the complex-number APIs have changed.
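(For context, a minimal sketch of the incompatibility, not taken from the repo: DCF trackers solve the filter and compute correlation in the Fourier domain, and the real-FFT entry points changed between these PyTorch versions.)

```python
import torch

# A feature map of the kind a DCF head correlates; the shape is illustrative.
x = torch.randn(1, 256, 31, 31)

# Old API (PyTorch <= 1.7, including the 1.3.0 suggested above), since removed:
#   X = torch.rfft(x, signal_ndim=2)   # complex values as a trailing dim of 2
#   y = torch.irfft(X, signal_ndim=2, signal_sizes=x.shape[-2:])

# New API (PyTorch >= 1.8): the torch.fft module returns native complex
# tensors, which is why code written against the old API breaks.
X = torch.fft.rfft2(x)                   # complex64, shape (1, 256, 31, 16)
y = torch.fft.irfft2(X, s=x.shape[-2:])  # real, back to (1, 256, 31, 31)
```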

jimmy-dq commented 3 years ago

Thanks for your reply. Yes, I found it. I get the following result:

imagenet_resnet18_s3: AUC: 0.624 Precision: 0.829

This is slightly better than the results reported in Table-2 (a). My test environment is: CUDA 10.0; RTX Titan; PyTorch 1.2.0.

For the SiamFC tracker, why don't you use the modified ResNet architectures provided by Zhipeng Zhang (SiamDW)? That might give better performance. If I understand correctly, you use the original ResNet models (only modifying the strides in layer3 and layer4 to 1) for the SiamFC tracker.
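
(For concreteness, a sketch of the stride change being described, using torchvision's built-in hook for it; this is an assumption about the setup, and the repo may instead set the strides directly without dilation.)

```python
import torchvision

# Keep the total stride at 8 for Siamese-style correlation: torchvision can
# replace the stride-2 downsampling of layer3 and layer4 with dilation.
# The three flags correspond to layer2, layer3, layer4.
backbone = torchvision.models.resnet18(
    pretrained=True,  # ImageNet weights, matching the paper's setup
    replace_stride_with_dilation=[False, True, True],
)
```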

Thanks for your time.

Zhongdao commented 3 years ago

  1. Good to see that your result is even slightly better than the one in Table-2 (a). There are probably some subtle differences between the current code/config and the code used for Table-2, but I think it's okay that the results do not match perfectly, as long as the general trends hold.
  2. Re SiamDW: good point. We aim to use an identical representation across tasks, so we do not want the appearance model to be trained for one specific task. Therefore we adopted ImageNet pre-trained weights for each architecture, but we did not find ImageNet pre-trained weights for the SiamDW architectures. Of course, we could adopt SiamDW weights trained for SOT, and the SOT performance would probably improve, but for the other tasks it's hard to say whether it would help (see the sketch below).
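
(A minimal sketch of the design point in item 2; the names are illustrative, not UniTrack's actual code.)

```python
import torch
import torchvision

# One ImageNet-pretrained backbone, frozen, so the representation is not
# specialized for any single tracking task.
backbone = torchvision.models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()   # drop the classifier, keep the features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))  # (1, 512) pooled features

# SOT (e.g. the DCF head), VOS, and MOT-association heads would all read
# features from this one model; in practice spatial feature maps would be
# used rather than the pooled vector.
```
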
jimmy-dq commented 3 years ago

Thanks for your kind explanation. No other problems; I'll close this issue.