MCG-NJU / MeMOTR

[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
https://arxiv.org/abs/2307.15700
MIT License

question about short_memory #5

Closed BelieveF closed 9 months ago

BelieveF commented 9 months ago

Hello. While reading your paper and reproducing the code, I ran into a question. The paper states: "we fuse the outputs from two adjacent frames with an adaptive aggregation algorithm", as shown in the red box below: [image]

The implementation of this part in the code is as follows: [image]

My question is this: last_output_embed represents the output of the previous frame, so why is it tracks[b].last_output rather than tracks[b-1].last_output? Sorry to bother you again. If I have misunderstood something, please correct me.

HELLORPG commented 9 months ago

In last_output_embed = tracks[b].last_output, b indexes the batch element, not the time step. Since the batch size is set to 1 in our experiments, b is always 0. We update last_output later in the code.
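To make the distinction between the batch index and the time step concrete, here is a minimal sketch, not the repository's actual code. TrackState, fuse, and run_clip are hypothetical names, and the fixed weighted average is only a stand-in for the paper's adaptive aggregation. It also assumes the raw per-frame output is what gets stored as last_output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackState:
    """Hypothetical per-batch-element container (not the repository's class)."""
    last_output: Optional[List[float]] = None  # output embedding from the previous frame


def fuse(current: List[float], previous: Optional[List[float]],
         weight: float = 0.5) -> List[float]:
    """Stand-in for adaptive aggregation: a fixed weighted average (assumption)."""
    if previous is None:  # first frame: there is no previous output to fuse with
        return list(current)
    return [weight * c + (1.0 - weight) * p for c, p in zip(current, previous)]


def run_clip(frames: List[List[List[float]]], batch_size: int) -> List[List[List[float]]]:
    """frames[t][b] is the output embedding for batch element b at time step t."""
    tracks = [TrackState() for _ in range(batch_size)]  # one state per batch element
    fused_per_frame = []
    for frame_outputs in frames:          # outer loop: time steps
        fused_frame = []
        for b in range(batch_size):       # inner loop: batch elements, not time
            # Still holds the output from the previous frame for element b.
            last_output_embed = tracks[b].last_output
            fused_frame.append(fuse(frame_outputs[b], last_output_embed))
            # Updated afterwards, so it becomes "previous" at the next time step.
            tracks[b].last_output = frame_outputs[b]
        fused_per_frame.append(fused_frame)
    return fused_per_frame


# Three frames, batch size 1, embedding dimension 2: b is always 0.
frames = [[[0.0, 0.0]], [[2.0, 4.0]], [[6.0, 8.0]]]
result = run_clip(frames, batch_size=1)
```

With batch size 1, tracks[b-1] would just be tracks[-1], which is the same single element. With a larger batch it would read the state of a different video clip, not an earlier frame.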

BelieveF commented 9 months ago

Thanks for your reply! That has solved my question.