[Project] Grounded SAM 2 Release

rentainhe commented 1 month ago

We combine Grounding DINO, Grounding DINO 1.5, and SAM 2 to track any object in an input video, and we've open-sourced our code here: Grounded SAM 2

This repo supports:

Segment Anything based on the Grounding DINO and Grounding DINO 1.5 box predictors
Track Anything based on Grounding DINO and the SAM 2 video predictor

We've kept the code as simple as possible (fewer than 150 lines), and we hope it is useful to the community.

We noticed that the SAM 2 video predictor does not currently support box prompts, so we've implemented a simple method that uniformly samples positive point prompts, based on the SAM 2 image predictor, to support box prompts in the video tracking demo. Refer to our code for more details.
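The point-sampling idea above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: it assumes a binary mask for the box has already been obtained (for example from the SAM 2 image predictor), and the function name `sample_positive_points` and its parameters are hypothetical.

```python
import numpy as np


def sample_positive_points(mask, num_points=10, seed=0):
    """Uniformly sample up to `num_points` (x, y) pixel coordinates inside a binary mask.

    Hypothetical helper: returns the sampled points plus a label array of ones,
    where 1 marks a positive (foreground) point prompt.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of foreground pixels
    if len(xs) == 0:
        # Empty mask: nothing to sample from.
        return np.zeros((0, 2), dtype=np.float32), np.zeros((0,), dtype=np.int32)
    k = min(num_points, len(xs))
    # Draw k distinct foreground pixels, each with equal probability.
    idx = rng.choice(len(xs), size=k, replace=False)
    points = np.stack([xs[idx], ys[idx]], axis=1).astype(np.float32)
    labels = np.ones(k, dtype=np.int32)
    return points, labels
```

The resulting points and labels could then be passed to the video predictor as positive point prompts for the first frame.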

We will update our code in future releases to support more demos.

A simple tracking video demo is as follows:

https://github.com/user-attachments/assets/8ebfa5de-3eac-43c5-b8e2-49160c9df786

ronghanghu commented 1 month ago

Hi @rentainhe, thanks for the great work!

Regarding the box prompts

We noticed that the SAM 2 video predictor does not currently support box prompts, so we've implemented a simple method that uniformly samples positive point prompts, based on the SAM 2 image predictor, to support box prompts in the video tracking demo. Refer to our code for more details.

We just added a box prompt example to the video predictor notebook in https://github.com/facebookresearch/segment-anything-2/pull/174. Maybe the box prompt could be used directly in this case?
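For reference, a rough sketch of preparing a detector box for use as a box prompt. It assumes the detector returns boxes as normalized (cx, cy, w, h), as the open-source Grounding DINO does, and that the prompt expects absolute (x_min, y_min, x_max, y_max) pixel coordinates; the helper name `cxcywh_norm_to_xyxy_pixels` is hypothetical. See the notebook for the exact predictor call.

```python
import numpy as np


def cxcywh_norm_to_xyxy_pixels(boxes, width, height):
    """Convert normalized (cx, cy, w, h) boxes to pixel (x_min, y_min, x_max, y_max).

    Hypothetical helper; the input format is an assumption about the detector output.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    cx, cy, w, h = boxes.T
    # Center/size -> corner coordinates, still in normalized [0, 1] units.
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    # Scale x by image width and y by image height.
    xyxy = xyxy * np.array([width, height, width, height], dtype=np.float32)
    # Clip to the image bounds so the prompt never leaves the frame.
    xyxy[:, [0, 2]] = np.clip(xyxy[:, [0, 2]], 0, width)
    xyxy[:, [1, 3]] = np.clip(xyxy[:, [1, 3]], 0, height)
    return xyxy


# Example: one box centered in a 200x100 frame, covering half of each dimension.
box_prompt = cxcywh_norm_to_xyxy_pixels([[0.5, 0.5, 0.5, 0.5]], width=200, height=100)[0]
# box_prompt would then be passed as the box argument of the video predictor's
# prompt-adding call for the chosen frame and object id.
```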

rentainhe commented 1 month ago

Hi ronghang! We've already updated SAM 2 to the latest version and now support box/point/mask prompts in the video object tracking demo!

ronghanghu commented 1 month ago

@rentainhe Great, thanks for the quick update!

facebookresearch / segment-anything-2

[Project] Grounded SAM 2 Release #130