Can we use boxes as prompts for video tracking?

facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Apache License 2.0

10.7k stars 860 forks

Can we use boxes as prompts for video tracking? #27

Open OliverHxh opened 1 month ago

OliverHxh commented 1 month ago

Thanks for your great work. I have read the example code, but I couldn't find how to use bounding boxes as prompts to track objects in videos. Is there any way to do that?

Thank you so much!

weixi-feng commented 1 month ago

I think you just need to modify add_new_points so that the points and labels are derived from your bounding box coordinates. Check SAM2ImagePredictor to see how it preprocesses bounding box inputs.
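
A minimal sketch of that conversion, for illustration only. It assumes the box is given as (x_min, y_min, x_max, y_max) in pixel coordinates and that the two corners are passed as points with the labels 2 (top-left) and 3 (bottom-right), which is the convention to verify against the box preprocessing in SAM2ImagePredictor. The helper name box_to_point_prompts is hypothetical and not part of the repository.

```python
def box_to_point_prompts(box):
    """Turn an (x_min, y_min, x_max, y_max) box into corner points and labels.

    Assumption: the prompt encoder treats label 2 as the top-left box corner
    and label 3 as the bottom-right box corner. Check this against the box
    handling in SAM2ImagePredictor before relying on it.
    """
    x_min, y_min, x_max, y_max = box
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("box must satisfy x_min < x_max and y_min < y_max")

    # Two corner points in (x, y) order.
    points = [[float(x_min), float(y_min)], [float(x_max), float(y_max)]]
    # Corner labels (assumed convention, see docstring).
    labels = [2, 3]
    return points, labels


points, labels = box_to_point_prompts((50, 40, 200, 180))
```

The resulting points and labels would then be handed to add_new_points in place of clicked points, after converting them to whatever array type that function expects.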

OliverHxh commented 1 month ago

@weixi-feng Thanks for your hints. Very useful!

rentainhe commented 1 month ago

@OliverHxh We also noticed this issue when implementing Grounded SAM 2. We tried the following method to get stable segmentation results from box prompts:

First, we use the SAM2 Image Predictor to predict a mask from the object's box prompt.
Then we uniformly sample positive points from the predicted mask to get more stable segmentation results.
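
The second step could be sketched roughly as follows. This is not the Grounded SAM 2 implementation: the helper name sample_positive_points, the seed parameter, and the choice of uniform random sampling without replacement are assumptions made for illustration. The mask is treated as a 2D grid (a nested list or a 2D array) in which values greater than 0 mark the object, and the label 1 marks a positive point.

```python
import random


def sample_positive_points(mask, num_points, seed=0):
    """Uniformly sample foreground pixels from a 2D mask as positive points.

    Returns (points, labels), where points are (x, y) pixel coordinates and
    every label is 1 (positive). Hypothetical helper for illustration.
    """
    # Collect every foreground pixel as (x, y), i.e. (column, row).
    foreground = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value > 0
    ]
    if not foreground:
        raise ValueError("mask has no foreground pixels")

    # Sample without replacement; cap at the number of available pixels.
    rng = random.Random(seed)
    count = min(num_points, len(foreground))
    points = rng.sample(foreground, count)
    labels = [1] * count
    return points, labels


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
points, labels = sample_positive_points(mask, num_points=3)
```

The sampled points and labels would then serve as the point prompts for the video predictor on the frame where the box was given.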