Recommended way to add new points to track later in the video?

Caspeerrr commented 3 months ago

In my use case, I don't know every object I want to track in the first frame; new objects may appear at any frame in the video. So I need to be able to flexibly add new points/bboxes to track. As far as I can see, there are currently two obvious approaches:

1. Reset the inference state whenever a new object appears, then add the new object together with the masks of the objects I'm already tracking. This doesn't seem ideal, as the objects I was already tracking lose all context from the previous frames.
2. Create a separate inference state for each object I want to track. The advantage is that all objects are kept separate, so each one keeps its full context. The disadvantage, I think, is that this creates a lot of overhead and decreases inference speed.

Does anyone have any additional insights?

Thanks!

ronghanghu commented 3 months ago

Hi @Caspeerrr, I would recommend using the 2nd approach of "Create a separate inference state for each object I want to track" as you mentioned above.

Currently the codebase doesn't support adding new objects once tracking has started, primarily because it performs inference by batching multiple objects together, while objects added later have no memory or other previous state and so cannot be directly batched with the existing ones. Tracking them with separate inference states could be a workaround for this issue.
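As a rough illustration of that workaround, here is a minimal sketch that keeps one inference state per object. `PerObjectTracker` is a hypothetical helper, not part of the codebase. It assumes the predictor exposes `init_state`, `add_new_points_or_box`, and `propagate_in_video` with the keyword arguments used below, and that `propagate_in_video` yields `(frame_idx, obj_ids, mask_logits)` tuples, so check these against the version you have installed.

```python
from typing import Any, Dict, Iterator, Tuple


class PerObjectTracker:
    """Hypothetical helper: one independent inference state per tracked object."""

    def __init__(self, predictor: Any, video_path: str) -> None:
        # `predictor` is assumed to behave like the SAM 2 video predictor.
        self.predictor = predictor
        self.video_path = video_path
        self.states: Dict[int, Any] = {}        # obj_id -> its own inference state
        self.start_frames: Dict[int, int] = {}  # obj_id -> frame where it was prompted

    def add_object(self, obj_id: int, frame_idx: int,
                   points=None, labels=None, box=None) -> None:
        """Start tracking a new object from `frame_idx` without touching the others."""
        if obj_id in self.states:
            raise ValueError(f"object {obj_id} is already tracked")
        # A fresh state per object, so nothing has to be batched with
        # objects that already carry memory from earlier frames.
        state = self.predictor.init_state(video_path=self.video_path)
        self.predictor.add_new_points_or_box(
            inference_state=state,
            frame_idx=frame_idx,
            obj_id=obj_id,
            points=points,
            labels=labels,
            box=box,
        )
        self.states[obj_id] = state
        self.start_frames[obj_id] = frame_idx

    def propagate(self, obj_id: int) -> Iterator[Tuple[int, Any]]:
        """Yield (frame_idx, mask_logits) for one object, starting at its prompt frame."""
        results = self.predictor.propagate_in_video(
            self.states[obj_id],
            start_frame_idx=self.start_frames[obj_id],
        )
        for frame_idx, _obj_ids, mask_logits in results:
            yield frame_idx, mask_logits
```

Each object is prompted and propagated on its own, so an object first seen at frame 50 can be added with `tracker.add_object(obj_id=2, frame_idx=50, box=...)` while object 1 keeps its existing state and memory. The cost is the overhead raised in the original question: every object gets its own state and its own propagation pass instead of sharing one batched pass.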

melodyhappy commented 2 months ago

Can we avoid initializing a new predictor and instead add new objects directly during the tracking process, for example by padding along the temporal dimension so that the information for newly appeared objects aligns with that of the existing ones?

facebookresearch / sam2 #224