MIV-XJTU / ARTrack


Why do trackers need to be trained on COCO? #50

Closed MrNeoBlue closed 6 months ago

MrNeoBlue commented 6 months ago

One stupid question: is this aimed at the backbone's feature-extraction capability? Training on COCO seems to be an essential stage for training tracker models. Could you explain the reasoning behind this pipeline?

AlexDotHam commented 6 months ago

The truth is, COCO is not essential for tracking. Previous methods often used COCO to increase the diversity of the training data, but in my opinion, in the era of large models this is no longer useful. In this paper we trained on COCO for a fair comparison with prior work, but going forward I think it is better to drop this kind of dataset, or replace it with VastTrack or other solid, large, and diverse datasets.
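
For context, the usual way a still-image set like COCO enters a tracking pipeline is by faking video: the same image is sampled twice with independent jitter to form a template/search pair. A minimal sketch of that idea (not ARTrack's actual loader; the transform parameters are illustrative):

```python
# Sketch (not ARTrack's code): turning one static COCO image into a
# pseudo template/search pair via two independent augmentations.
from PIL import Image
from torchvision import transforms

# Illustrative jitter; real pipelines also scale/shift the target box.
jitter = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

def make_pseudo_pair(image: Image.Image):
    """Sample the same frame twice to mimic two frames of a video."""
    template = jitter(image)
    search = jitter(image)  # second, independent jitter of the same image
    return template, search
```

The appeal is purely object diversity: COCO adds many object categories, but no real temporal variation, which is why a large, genuinely video-based dataset can replace it.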

MrNeoBlue commented 6 months ago

Thanks for your reply! When training a transformer-based tracker, would the pretrained transformer backbone already be capable of extracting diverse visual features on its own? Have you run experiments comparing trackers trained with and without the COCO dataset? I guess a dataset like GOT-10k would play the same role, despite being a small set. I just saw the VastTrack paper, fresh on arXiv, and will take a look.

AlexDotHam commented 6 months ago

In our setting, performance with and without COCO is the same; in my opinion, a larger or better backbone is the key to improving performance. Moreover, MAE pretraining on other datasets such as Kinetics-700, or on larger corpora, strongly affects downstream tasks such as SOT. I think revisiting the SOT field from this angle is a more promising direction than architectural innovation: in our experiments, a plain transformer encoder-only (or decoder-only) architecture is good enough for tracking, so yet another fancy architecture adds little for the community.
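
A minimal sketch of the "better backbone, not fancier architecture" point: drop an MAE-pretrained ViT in as the feature extractor instead of redesigning the model. This assumes a recent timm release that exposes MAE weights under the `.mae` pretrained tag; the names here are illustrative, not ARTrack's code:

```python
# Sketch: using an MAE-pretrained ViT-B/16 as a tracker's feature
# extractor. Assumes timm >= 0.9 with the '.mae' weight tag available.
import timm
import torch

# num_classes=0 strips the classification head; we only want features.
backbone = timm.create_model('vit_base_patch16_224.mae',
                             pretrained=True, num_classes=0)

x = torch.randn(1, 3, 224, 224)      # a search-region crop
feat = backbone.forward_features(x)  # (1, 197, 768) patch + cls tokens
print(feat.shape)
```

Swapping the pretrained tag (e.g. for weights pretrained on a larger or video corpus) changes the downstream tracker without touching the architecture, which is exactly the comparison being suggested.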