Closed by BelieveF 9 months ago
In last_output_embed = tracks[b].last_output, b is the batch index, not the time step. In our experiments the batch size is set to 1, so b is always 0. We update last_output later in the code.
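To illustrate the indexing, here is a minimal sketch, not the repository's actual code. Only tracks, b, last_output, and last_output_embed come from the thread; the Track class, run_clip, and the float stand-ins for model outputs are hypothetical. The adaptive aggregation itself is omitted. The sketch only shows why tracks[b].last_output already holds the previous frame's output when the current frame is processed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    """Hypothetical per-batch-element state; only `last_output` is named in the thread."""
    last_output: Optional[float] = None  # output carried over from the previous frame


def run_clip(frames: List[List[float]], batch_size: int = 1) -> List[List[Optional[float]]]:
    """Process frames in order and record which previous output each step saw.

    frames[t][b] is a stand-in for the model output of batch element b at time step t.
    """
    tracks = [Track() for _ in range(batch_size)]  # indexed by batch element, not by time
    seen_previous = []
    for frame in frames:                 # one iteration per time step
        seen_at_t = []
        for b in range(batch_size):      # b is the batch index
            # Read before the update: this is still the previous frame's output.
            last_output_embed = tracks[b].last_output
            seen_at_t.append(last_output_embed)
            current_output = frame[b]
            # Assumed update step: overwrite the stored value for the next frame.
            tracks[b].last_output = current_output
        seen_previous.append(seen_at_t)
    return seen_previous
```

With a batch size of 1, b stays 0 for every frame, and the value read at each step is whatever was stored at the end of the previous step (None for the first frame).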
Thanks for your reply! Now I have solved my question!
Hello, when I read your paper and reproduced the code, I had a question. You mention in your paper: "we fuse the outputs from two adjacent frames with an adaptive aggregation algorithm", as shown in the red box below:
The implementation of this part in the code is as follows:
My question is as follows: last_output_embed represents the output of the previous frame, so why is it tracks[b].last_output rather than tracks[b-1].last_output? I'm sorry to bother you again. If I have misunderstood anything, please let me know.