MCG-NJU / MOTIP

Multiple Object Tracking as ID Prediction
https://arxiv.org/abs/2403.16848
Apache License 2.0

If the target disappears for a long time and then reappears, can the algorithm recover it? #20

Open LuletterSoul opened 1 month ago

LuletterSoul commented 1 month ago

Hello, thank you to MCG-NJU for open-sourcing this amazing MOT work! As mentioned in the title, when a target in a sequence disappears from the camera for a long time and then reappears, I found that it is treated as a newborn and assigned a new ID, which does not align with real-world usage scenarios.

Could the historical IDs of disappeared targets be recorded, so that those targets can be recovered when they reappear?

HELLORPG commented 1 month ago

Thanks for the kind words. This is a pretty interesting question, and I will try to explain my thoughts as briefly as I can.

The phenomenon you observed occurs because the "long time" you mention exceeds what the model can handle (determined by MAX_TEMPORAL_LENGTH). This is due to the limitation of the relative position embedding we use: the number of video frames seen during training is limited, so we cannot support an unbounded sequence length (in other words, a long-term occlusion). From a practical perspective, because such long-term occlusions are relatively rare on MOT benchmarks, we do not handle this case by default.
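To make the failure mode concrete, here is a minimal sketch (not the actual MOTIP code; the per-frame data is simplified to plain ID sets) of how a bounded, time-ordered history forgets an ID once it has been absent for MAX_TEMPORAL_LENGTH frames:

```python
from collections import deque

# Simplified illustration: trajectory_history holds one set of active IDs per
# frame, bounded by MAX_TEMPORAL_LENGTH (name taken from the MOTIP config;
# the real structure stores full track states, not bare ID sets).
MAX_TEMPORAL_LENGTH = 4
trajectory_history = deque(maxlen=MAX_TEMPORAL_LENGTH)

# Target ID 7 is visible in frame 0, then occluded from frame 1 onward:
trajectory_history.append({7, 11})   # frame 0
trajectory_history.append({11})      # frame 1
trajectory_history.append({11})      # frame 2
trajectory_history.append({11})      # frame 3
trajectory_history.append({11})      # frame 4 -> frame 0 is evicted

# ID 7 no longer exists in any retained frame, so if the same target
# reappears now it can only be assigned a fresh ID:
seen_ids = set().union(*trajectory_history)
```

After the fifth append, `7 in seen_ids` is False while `11 in seen_ids` is True, which is exactly the "treated as a newborn" behavior described above.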

But I have two ideas that may be able to handle the situation you mentioned:

  1. A plug-and-play ReID module can be used to handle these long-disappeared targets. This is very common in practical applications, because long-term occlusions are difficult to construct in training data. Many heuristic algorithms, including ByteTrack/OC-SORT, can only handle occlusions of dozens of frames.
  2. Add a tricky implementation to our model. Our (mostly) inference code is here. In our implementation, trajectories are a time-ordered queue, initialized here and updated here, so over time we discard the objects in the oldest video frame. If an ID no longer exists in any retained frame (trajectory_history in our code), then even if the same target appears later, it will be regarded as a newborn object (in your words, the target disappears from the camera for a long time and then reappears). So an intuitive idea is: over time, always retain at least one record for each ID. Specifically, before the queue is updated, you can check whether any ID in the frame about to be discarded is absent from all remaining frames. If there is such an ID, you could manually carry it forward one frame (copy it from trajectory_history[0] to trajectory_history[1], then call trajectory_history.append(current_tracks)). In this way, the ID will not be lost when trajectory_history[0] is popped, while the possibility of recovering it in future frames is retained. However, I'm not sure about the effectiveness of this process, as it goes beyond what the model sees during training and may introduce a train/inference gap. In addition, if you use this trick, I think you still need to limit the maximum disappearance time of an ID; otherwise, it may cause some unexpected problems.
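Idea 1 above could be sketched as follows. This is not part of MOTIP; it is a hypothetical plug-and-play ReID gate in which every name (`remember`, `try_revive`, `GALLERY_TTL`, `SIM_THRESHOLD`) and the cosine-similarity matching are assumptions, shown only to illustrate the general pattern of a gallery of appearance embeddings for disappeared IDs:

```python
import numpy as np

# Hypothetical ReID gallery: keep an appearance embedding for each ID that
# leaves the tracker's temporal window, and match unassigned detections
# against it by cosine similarity. All names and thresholds are assumed.
GALLERY_TTL = 300        # assumed: max frames an ID stays recoverable
SIM_THRESHOLD = 0.6      # assumed: min cosine similarity to revive an ID

gallery = {}             # id -> (unit embedding, last_seen_frame)

def remember(track_id, embedding, frame_idx):
    """Store the last appearance embedding of a disappeared ID."""
    gallery[track_id] = (embedding / np.linalg.norm(embedding), frame_idx)

def try_revive(det_embedding, frame_idx):
    """Return a stored ID matching the new detection, or None."""
    det = det_embedding / np.linalg.norm(det_embedding)
    # Drop entries older than the TTL so IDs cannot linger forever.
    for tid in [t for t, (_, f) in gallery.items()
                if frame_idx - f > GALLERY_TTL]:
        del gallery[tid]
    best_id, best_sim = None, SIM_THRESHOLD
    for tid, (emb, _) in gallery.items():
        sim = float(det @ emb)
        if sim > best_sim:
            best_id, best_sim = tid, sim
    if best_id is not None:
        del gallery[best_id]  # the ID is active again
    return best_id
```

For example, after `remember(3, np.array([1.0, 0.0]), 0)`, a later detection with a similar embedding recovers ID 3, while a dissimilar one returns None and spawns a newborn as usual. The TTL plays the same role as the disappearance-time limit mentioned at the end of idea 2.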
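The carry-forward trick in idea 2 could look roughly like this. Again a simplified sketch, not MOTIP's actual code: each history entry is reduced to a set of IDs, and `update_history` is a hypothetical helper name; the real `trajectory_history` stores full track states:

```python
from collections import deque

# Sketch of "always retain at least one record per ID": before the oldest
# frame is evicted, any ID that exists only there is copied one slot
# forward (trajectory_history[0] -> trajectory_history[1]) so it survives.
MAX_TEMPORAL_LENGTH = 4
trajectory_history = deque(maxlen=MAX_TEMPORAL_LENGTH)

def update_history(current_tracks):
    if len(trajectory_history) == trajectory_history.maxlen:
        oldest = trajectory_history[0]
        # IDs present in any remaining frame or in the incoming frame:
        elsewhere = set().union(*list(trajectory_history)[1:], current_tracks)
        # Rescue IDs that would otherwise disappear with the oldest frame:
        trajectory_history[1] |= (oldest - elsewhere)
    trajectory_history.append(current_tracks)  # evicts the oldest frame
```

With the same scenario as before (ID 7 visible once, then occluded), feeding five frames through `update_history` keeps ID 7 alive in the queue instead of losing it on eviction, so a later reappearance could still be matched. As noted above, a maximum disappearance time would still be needed on top of this.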

My explanation may be a bit convoluted. If anything is unclear, feel free to reply and we can discuss further.