Closed · JJangD closed this 1 month ago
The command tokens are just 4 learnable tokens. The past trajectories are not used to generate the command tokens; they are simply concatenated with the 4 command tokens. Maybe our figure is not very clear about that, sorry for the misunderstanding. And the answer is yes, we concatenate the past trajectories and the command tokens and apply self-attention.
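To illustrate, here is a minimal sketch of that setup, assuming hypothetical names and shapes (this is not the actual ARTrackV2 code): the 4 command tokens are plain learnable parameters, the past boxes are embedded separately, and the two are simply concatenated; the joint self-attention happens later, in the shared encoder.

```python
import torch
import torch.nn as nn

class CommandTokens(nn.Module):
    """Hypothetical module: 4 learnable command tokens plus embedded past boxes."""

    def __init__(self, embed_dim=256, num_tokens=4):
        super().__init__()
        # The command tokens are plain learnable parameters; they are NOT
        # computed from the past trajectory.
        self.command_tokens = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))
        # Each past box (xmin, ymin, xmax, ymax) is embedded into the token space.
        self.traj_embed = nn.Linear(4, embed_dim)

    def forward(self, past_boxes):
        # past_boxes: (B, traj_len, 4) past trajectory coordinates
        B = past_boxes.size(0)
        traj_tokens = self.traj_embed(past_boxes)            # (B, traj_len, C)
        cmd_tokens = self.command_tokens.expand(B, -1, -1)   # (B, 4, C)
        # Concatenate only; self-attention over the combined sequence is
        # applied later, inside the shared transformer encoder.
        return torch.cat([traj_tokens, cmd_tokens], dim=1)
```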
Thanks for the quick reply!
"And answer is yes, we concat the past trajectories and command tokens with self-attention." when doing this
another transformer encoder is utilized and than you take only command token variable part from the output of the encoder and feed to the main transformer encoder in the Figure 2 ?
or
The concat(trajectories, command token) is directly fed to the transformer encoder(the one in the figure 2 of ARTRACKv2 paper) along with search, template, appearance token, confidence token ?
and also I couldn't figure out which part of the code I should take a look to check trajectory token part. Can you show me where to look?
We concatenate the trajectory, command, search, template, appearance, and confidence tokens. Then we feed them into the encoder.
You can check that in lib/models/ostrack/base_backbone.py, in the function forward_features.
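In other words, everything goes through one joint self-attention pass, with no separate pre-encoder for the trajectory/command tokens. Below is a self-contained sketch of that token layout; all shapes, token counts, and variable names are illustrative assumptions, not the actual forward_features code.

```python
import torch
import torch.nn as nn

B, C = 2, 256  # batch size and embedding dim (illustrative values)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=C, nhead=8, batch_first=True),
    num_layers=2,
)

template_tokens  = torch.randn(B, 64, C)   # template patch tokens
search_tokens    = torch.randn(B, 256, C)  # search-region patch tokens
traj_tokens      = torch.randn(B, 7, C)    # embedded past-trajectory tokens
command_tokens   = torch.randn(B, 4, C)    # the 4 learnable command tokens
appearance_token = torch.randn(B, 1, C)
confidence_token = torch.randn(B, 1, C)

# All token groups are concatenated along the sequence dimension and
# attended jointly in the same encoder.
tokens = torch.cat([traj_tokens, command_tokens, search_tokens,
                    template_tokens, appearance_token, confidence_token], dim=1)
out = encoder(tokens)
print(out.shape)  # torch.Size([2, 333, 256])
```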
Thank you!
Hi.
Thanks for your great work!
I was reading the paper, but I couldn't understand how the trajectory command token (the green one) is generated.
I understand that intra-frame autoregression is not used, i.e., xmin, ymin, xmax, ymax are not generated one by one.
Could you explain how the past trajectories are utilized to make the command tokens?
Maybe by concatenating the past trajectories and learnable command tokens and applying self-attention?