autodistill / autodistill-grounded-sam-2

Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.
https://docs.autodistill.com
Apache License 2.0

Video support for SAM2 #8

Open lab176344 opened 1 month ago

lab176344 commented 1 month ago

https://imgur.com/a/jIxYTMf

Added the ability to track objects over time with SAM2 and Grounding DINO. The idea comes from the IDEA-Research Grounded SAM implementation.

lab176344 commented 1 month ago

output_video.mp4

The approach runs the SAM2 image predictor on each frame and propagates the mask and box prompts. A global data structure tracks the frame-by-frame change of IDs: IDs are combined if the IoU between masks is greater than a threshold, and new IDs are added otherwise. It is an expensive approach, as you have to run the image propagation in SAM2 for every frame.
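For reviewers, here is a rough sketch of the ID bookkeeping described above. It is not the code in this PR: `IdTracker`, `mask_iou`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the masks are assumed to be same-shape boolean NumPy arrays such as those a per-frame SAM2 prediction could be converted to.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks of the same shape."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


class IdTracker:
    """Hypothetical global structure mapping each object ID to its latest mask."""

    def __init__(self, iou_threshold: float = 0.5):
        # Assumed default; the PR does not state the threshold it uses.
        self.iou_threshold = iou_threshold
        self.tracks = {}  # object ID -> most recent mask
        self.next_id = 0

    def update(self, masks):
        """Return one ID per mask in the current frame."""
        assigned = []
        updated = {}
        for mask in masks:
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev_mask in self.tracks.items():
                if track_id in updated:
                    continue  # reuse each existing ID at most once per frame
                iou = mask_iou(mask, prev_mask)
                if iou > best_iou:
                    best_id, best_iou = track_id, iou
            if best_id is None:
                # No existing mask overlaps enough: register a new ID.
                best_id = self.next_id
                self.next_id += 1
            updated[best_id] = mask
            assigned.append(best_id)
        # IDs not seen in this frame keep their last known mask.
        self.tracks.update(updated)
        return assigned
```

Calling `update` once per frame with that frame's masks returns the IDs in the same order as the masks. The matching here is greedy, which keeps the sketch short. A one-to-one assignment (for example Hungarian matching) would be a reasonable alternative if objects overlap heavily.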

lab176344 commented 1 month ago

@capjamesg let me know what you think