OpenSpaceAI / UVLTrack

The official PyTorch implementation of our AAAI 2024 paper "Unifying Visual and Vision-Language Tracking via Contrastive Learning"
MIT License

Performance differs between the two tables in the README? #6

Closed: laisimiao closed this issue 7 months ago

laisimiao commented 7 months ago

Table 1: [image 911f258a-c9e6-4a9c-82f5-0486a3ffb9a2]
Table 2: [image 2773765d-70a2-44dd-bd48-39c0cd5eff97]

Also, in Table 2, NL+BBOX performance is lower than BBOX performance. What explains this?

568xiaoma commented 7 months ago

Table 1 shows the performance reported in the paper. Table 2 shows the performance of the released checkpoint.

We train the model on mixed-modal data (NL, BBOX, NL+BBOX). Empirically, the more NL+BBOX samples there are, the better the NL+BBOX tracking performance. Thus, for our tracker, an NL+BBOX reference does not necessarily perform better than a BBOX reference. In our paper, we adopt a relatively balanced sample proportion to achieve good performance across the different modal references.
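To illustrate what a sample proportion across modal references could look like, here is a minimal sketch. It is not taken from the UVLTrack code: the weights, the function names, and the `text`/`bbox` sample fields are all hypothetical, and the actual proportions used in the paper are not stated in this thread.

```python
import random

# Hypothetical proportions (assumption): equal weights stand in for a
# "relatively balanced" mix. The real values are not given in this thread.
MODAL_WEIGHTS = {"NL": 1.0, "BBOX": 1.0, "NL+BBOX": 1.0}


def sample_reference_modes(num_samples, weights, seed=0):
    """Draw one reference modality per training sample, proportional to weights."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    if sum(weights.values()) <= 0:
        raise ValueError("at least one weight must be positive")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    modes = list(weights)
    return rng.choices(modes, weights=[weights[m] for m in modes], k=num_samples)


def build_reference(sample, mode):
    """Keep only the reference fields that the chosen modality allows."""
    parts = mode.split("+")
    return {
        "text": sample["text"] if "NL" in parts else None,
        "bbox": sample["bbox"] if "BBOX" in parts else None,
    }


if __name__ == "__main__":
    # Hypothetical sample: a language description plus an (x, y, w, h) box.
    sample = {"text": "the white dog on the left", "bbox": (10, 20, 50, 80)}
    for mode in sample_reference_modes(5, MODAL_WEIGHTS):
        print(mode, build_reference(sample, mode))
```

Raising the weight of `"NL+BBOX"` in this sketch would make that modality appear more often during training, which is the kind of trade-off described above.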

laisimiao commented 7 months ago

So no checkpoint corresponding to the performance reported in the paper has been released? I think the community would prefer a checkpoint consistent with the paper for plotting figures in future research.

568xiaoma commented 7 months ago

Thank you for your attention and advice. The raw results are consistent with the paper and can serve as the reference for our tracker in future research. However, the original checkpoints were corrupted during migration, so we retrained the model and released the last checkpoint.

laisimiao commented 7 months ago

That explains it. Thank you for your nice work.