VISION-SJTU / USOT

[ICCV2021] Learning to Track Objects from Unlabeled Videos

When the pretrained model "imagenet_pretrain.model" is not loaded, the training results are very poor #17

Closed: Scott233 closed this issue 1 year ago

Scott233 commented 1 year ago

Is it necessary to load the pretrained backbone? And is there any way to make the network independent of the pretrained model's parameters? Thank you very much!

zhengjilai commented 1 year ago

That is a very good question.

When I was studying unsupervised tracking two years ago, I actually built three variants of USOT. The first loaded an ImageNet supervised-pretrained backbone (USOT* in the paper), the second loaded the unsupervised MoCo v2 backbone (USOT in the paper), and the third randomly initialized the backbone (as UDT, CVPR2019, does). The first two gave similar results, but the results of the third variant were very poor.

Judging from USOT and ULAST (CVPR2022), a pretrained backbone still seems to be important for unsupervised tracking at present. Whether the pretrained backbone weights can be dropped is an interesting open problem.

Scott233 commented 1 year ago

Thank you very much for your answers and directions!