SysCV / sam-pt

SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking.
https://arxiv.org/abs/2307.01197
Apache License 2.0
950 stars 60 forks

Can sam-pt automatically track? #29

Closed jimmylihui closed 3 months ago

jimmylihui commented 9 months ago

Hi, Can I ask if sam-pt can automatically track?

m43 commented 8 months ago

Hi, thank you for your question. Yes, SAM-PT can track an object that is defined in the initial frame. It outputs segmentation masks for all subsequent frames in the video. For the first frame, the target object is defined using "query" points. Have I understood your question correctly? I'm not sure what you meant by "automatically".

rruiz-s commented 7 months ago

Hi, thank you so much for sharing SAM-PT and your explanation.

I've experimented with SAM-PT and, as @m43 kindly explained, SAM-PT uses the points from the first frame to track the object automatically in all subsequent frames of the video.

Maybe related to @jimmylihui's initial question, I was wondering if there is a way to include new query points for subsequent frames in the video while keeping the initial query points. In my case, I had some problems because new objects that were not in the first frame appeared later in the video. Therefore, while the objects from the first frame were being tracked automatically, the new objects from subsequent frames were not.

Again, congratulations on the work and thank you for sharing.

Edited: I shortened the comment to remain within the limits of the initial question

m43 commented 7 months ago

Thanks for the question!

Yes, the model supports being passed query points with arbitrary and varying timesteps for the same mask (see here). The input query points are defined as a tensor of shape (num_masks, n_points_per_mask, 3), with each element denoting the (t, x, y) (timestep, x location, y location) of the query point. For example, if you track one mask with (1920, 360) at timestep 0 and (960, 720) at timestep 2, then you would have a query points tensor like torch.tensor([ [[0, 1920,360], [2, 960,720]] ]).
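To make the layout concrete, here is a minimal sketch that builds and checks the nested structure for the example above. The helper name make_query_points is hypothetical and not part of SAM-PT; it only enforces the (num_masks, n_points_per_mask, 3) shape described in this comment. The conversion to a tensor is attempted only if torch is installed.

```python
from typing import List, Sequence


def make_query_points(
    masks: Sequence[Sequence[Sequence[float]]],
) -> List[List[List[float]]]:
    """Normalise query points to a (num_masks, n_points_per_mask, 3) nested list.

    Each innermost element is (t, x, y): timestep, x location, y location.
    Hypothetical helper for illustration; it is not part of SAM-PT.
    """
    if not masks:
        raise ValueError("need at least one mask")
    n_points = len(masks[0])
    out = []
    for mask_points in masks:
        # A rectangular tensor needs the same number of points for every mask.
        if len(mask_points) != n_points:
            raise ValueError("every mask needs the same number of query points")
        row = []
        for point in mask_points:
            if len(point) != 3:
                raise ValueError("each query point must be (t, x, y)")
            t, x, y = point
            row.append([float(t), float(x), float(y)])
        out.append(row)
    return out


# One mask, two query points: (1920, 360) at timestep 0 and (960, 720) at timestep 2.
query_points = make_query_points([[(0, 1920, 360), (2, 960, 720)]])
shape = (len(query_points), len(query_points[0]), len(query_points[0][0]))
print(shape)  # (1, 2, 3)

try:
    import torch

    query_points_tensor = torch.tensor(query_points)
except ImportError:
    # torch is optional for this sketch; the nested list already has the right layout.
    query_points_tensor = None
```

The shape check matters because a rectangular tensor cannot hold a different number of points per mask.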

However, this functionality hasn't been utilized in the simple demo, where I fixed the query timestep of all points to 0 here for simplicity. Maybe you want to adapt (or contribute) the code as needed for your use case, or perhaps I could update the demo sometime.
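As a rough sketch of what such an adaptation might look like, the snippet below attaches a chosen timestep to (x, y) points instead of fixing it to 0, so that an object first visible in a later frame gets query points on that frame. The helper add_timestep, the coordinates, and the frame indices are all made up for illustration and are not taken from the SAM-PT demo; how the demo code would need to change to consume this is not covered here.

```python
from typing import List, Sequence, Tuple


def add_timestep(
    points_xy: Sequence[Tuple[float, float]], timestep: int
) -> List[List[float]]:
    """Turn (x, y) points selected on one frame into (t, x, y) query points.

    Hypothetical helper for illustration; it is not part of SAM-PT.
    """
    return [[float(timestep), float(x), float(y)] for x, y in points_xy]


# Object visible from the first frame: query points at timestep 0.
first_object = add_timestep([(100, 200), (120, 210)], timestep=0)

# Object that only appears later: query points at (made-up) timestep 15.
late_object = add_timestep([(400, 50), (410, 60)], timestep=15)

# Layout: (num_masks, n_points_per_mask, 3); both masks use two points here.
query_points = [first_object, late_object]
```

Both masks use the same number of points so that the result can be converted to a single tensor of shape (2, 2, 3).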

rruiz-s commented 7 months ago

Thank you very much @m43 for your clear and detailed answer!

I feel it gives relevant information for this thread, particularly regarding the timestep element of the query points and its possibilities.