OpenSpaceAI / UVLTrack

The official PyTorch implementation of our AAAI 2024 paper "Unifying Visual and Vision-Language Tracking via Contrastive Learning".

About Training Dataset #10

Closed lixueqidaytoy closed 4 months ago

lixueqidaytoy commented 4 months ago

The performance of your work is very impressive! In your paper, you said UVLTrack was trained on GOT-10k, COCO2017, TrackingNet, and other datasets. However, other vision-language trackers like JointNLT were trained only on LaSOT, OTB, TNL2K, and RefCOCOg-google. I want to know whether you also trained UVLTrack on just those datasets (the same setting as JointNLT) and compared the performance with JointNLT under that setting. In your paper, you say JointNLT does not perform very well on LaSOT when no natural language is given. Did you retrain JointNLT on datasets like TrackingNet? TrackingNet is very large.

568xiaoma commented 4 months ago

We did not train UVLTrack on modality-specific datasets only. UVLTrack aims to unify visual and vision-language tracking in a single model, so it can use references of different modalities (visual templates, language descriptions, or both) during training, which is an advantage of our method. JointNLT, by contrast, is designed specifically for vision-language tracking and cannot be trained (without modifying some of its modules) on datasets that lack language descriptions, such as TrackingNet.
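To make the "optional language reference" point concrete, here is a minimal sketch (not the authors' code; every module, dataset, and tensor below is a hypothetical placeholder) of how a unified tracker can consume mixed batches, some with a text description and some without, so that vision-only datasets like TrackingNet and vision-language datasets like TNL2K can both contribute to training the same model:

```python
# Minimal sketch: a single tracker trained on mixed vision-only and
# vision-language samples. All names here are illustrative placeholders.
import torch
import torch.nn as nn

class UnifiedTracker(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Toy stand-ins for real encoders (e.g., a ViT backbone and a BERT-like text encoder).
        self.visual_encoder = nn.Conv2d(3, feat_dim, kernel_size=16, stride=16)
        self.text_encoder = nn.Embedding(1000, feat_dim)
        self.head = nn.Linear(feat_dim, 4)  # placeholder box-regression head

    def forward(self, template, search, text_ids=None):
        # The visual reference is always present; the language reference is optional.
        z = self.visual_encoder(template).flatten(2).mean(-1)
        x = self.visual_encoder(search).flatten(2).mean(-1)
        ref = z
        if text_ids is not None:
            # Fuse the optional language reference with the visual one.
            ref = ref + self.text_encoder(text_ids).mean(1)
        return self.head(x + ref)

model = UnifiedTracker()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One vision-only batch (GOT-10k/TrackingNet-style) and one vision-language
# batch (LaSOT/TNL2K-style); dummy tensors stand in for real data.
batches = [
    {"template": torch.randn(2, 3, 128, 128), "search": torch.randn(2, 3, 256, 256),
     "text_ids": None, "boxes": torch.rand(2, 4)},
    {"template": torch.randn(2, 3, 128, 128), "search": torch.randn(2, 3, 256, 256),
     "text_ids": torch.randint(0, 1000, (2, 8)), "boxes": torch.rand(2, 4)},
]

for batch in batches:
    pred = model(batch["template"], batch["search"], batch["text_ids"])
    loss = nn.functional.l1_loss(pred, batch["boxes"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the text branch is optional in the forward pass, both kinds of batches update the same parameters; a tracker whose architecture requires a language input at every step would need its modules changed before it could be trained this way.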