IDEA-Research / Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
https://arxiv.org/abs/2401.14159
Apache License 2.0

prompt type for video issue #28

Open ZhangT-tech opened 3 weeks ago

ZhangT-tech commented 3 weeks ago

Hi, thanks for the great work!

I am trying to obtain point tracking results instead of segmentation masks. Even though I set PROMPT_TYPE_FOR_VIDEO to "point", the output video and tracking results are still segmentation masks. Is there any way to obtain the points instead of the masks?

Thank you in advance!

rentainhe commented 3 weeks ago

Hi @ZhangT-tech

The hyperparameter PROMPT_TYPE_FOR_VIDEO="point" means that we use point prompts, rather than mask or box prompts, to segment objects. It does not return point tracking results. If you want to track arbitrary points in a video, you can try our TAPTR model.

ZhangT-tech commented 3 weeks ago

Thank you very much. I also noticed that the points are uniformly sampled from the mask. Is there any way to obtain the sampled points on the mask?

rentainhe commented 3 weeks ago

The sampled points are in (x, y) format, so I think you can save them directly. Note, however, that we only sample points on the first frame, and the points are used only as prompts for promptable segmentation, so we cannot get tracking results for each point in the following frames.
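For illustration, here is a minimal sketch of sampling points uniformly from a binary first-frame mask and saving them in (x, y) format. It is not the repository's own sampling code: the function name `sample_points_in_mask`, the `num_points` and `seed` parameters, the toy mask, and the output file name are all assumptions made for this example, and only NumPy is used.

```python
import os
import tempfile

import numpy as np


def sample_points_in_mask(mask, num_points=10, seed=0):
    """Uniformly sample (x, y) pixel coordinates from inside a binary mask.

    Hypothetical helper for illustration; not part of Grounded-SAM-2.
    """
    # np.nonzero returns (row, col) indices, i.e. (y, x) for an image mask.
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return np.empty((0, 2), dtype=np.int64)
    rng = np.random.default_rng(seed)
    # Never ask for more points than there are foreground pixels.
    k = min(num_points, len(xs))
    idx = rng.choice(len(xs), size=k, replace=False)
    # Stack as (x, y) so each row matches the point-prompt convention.
    return np.stack([xs[idx], ys[idx]], axis=1)


# Toy first-frame mask: a 3x4 foreground rectangle in a 6x8 image.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True

points = sample_points_in_mask(mask, num_points=5)

# Save the sampled prompts; the file name is an arbitrary choice.
out_path = os.path.join(tempfile.gettempdir(), "first_frame_points.npy")
np.save(out_path, points)
loaded = np.load(out_path)
```

The saved array holds only the first-frame prompt locations. It says nothing about where those points end up in later frames, which is what a dedicated point tracker would provide.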