Can we use boxes as prompts for video tracking?

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Apache License 2.0


Can we use boxes as prompts for video tracking? #27

Open OliverHxh opened 2 months ago

OliverHxh commented 2 months ago

Thanks for your great work. I have read the example code, but I didn’t find how to use bounding boxes as prompts to track objects in videos. Is there any way to do that?

Thank you so much!

weixi-feng commented 2 months ago

I think you just need to modify add_new_points so that the points and labels are derived from your bounding box coordinates. Check SAM2ImagePredictor to see how it preprocesses bounding box inputs.
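A minimal sketch of that conversion, for illustration only. The helper name box_to_point_prompts is hypothetical. The corner labels 2 and 3 are an assumption based on how SAM2ImagePredictor appears to encode a box (top-left and bottom-right corners passed as two specially labeled points), so verify them against the predictor source before relying on them. Whether add_new_points accepts these labels unchanged is also an assumption.

```python
def box_to_point_prompts(box):
    """Turn an (x0, y0, x1, y1) box into two corner points plus labels.

    Assumption: label 2 marks the top-left corner and label 3 the
    bottom-right corner, mirroring what SAM2ImagePredictor seems to do
    internally. Check the predictor source to confirm.
    """
    x0, y0, x1, y1 = box
    if x1 < x0 or y1 < y0:
        raise ValueError("expected box as (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1")
    points = [[float(x0), float(y0)], [float(x1), float(y1)]]
    labels = [2, 3]  # assumed box-corner labels, not ordinary click labels
    return points, labels


# Example: a box given in pixel coordinates of the first video frame.
points, labels = box_to_point_prompts((300, 0, 500, 400))
```

The resulting points and labels would then be converted to arrays (for example with numpy) and passed wherever add_new_points expects its point coordinates and labels.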

OliverHxh commented 2 months ago

@weixi-feng Thanks for your hints. Very useful!

rentainhe commented 2 months ago

@OliverHxh We also noticed this issue when implementing Grounded SAM 2. We tried the following method to get stable segmentation results from box prompts:

First, we use the SAM 2 image predictor to predict a mask from the object's box prompt.
Then we uniformly sample positive points from the predicted mask and use them as prompts, which gives more stable segmentation results.
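The sampling step could be sketched as follows. This is not the Grounded SAM 2 implementation: the helper name sample_points_from_mask is hypothetical, and "uniformly sample" is interpreted here as drawing foreground pixels uniformly at random without replacement. The mask is a plain nested list so the sketch has no dependencies; a real mask from the image predictor would usually be an array. Label 1 is the usual positive-click label in SAM-style point prompts.

```python
import random


def sample_points_from_mask(mask, num_points, seed=None):
    """Pick up to num_points foreground pixels uniformly at random.

    mask is a 2D nested list (rows of truthy/falsy values).
    Returns (points, labels): points are [x, y] pairs, labels are all 1.
    """
    rng = random.Random(seed)
    # Collect (x, y) for every foreground pixel; x is the column, y the row.
    foreground = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not foreground:
        return [], []
    k = min(num_points, len(foreground))
    chosen = rng.sample(foreground, k)  # sampling without replacement
    points = [[float(x), float(y)] for x, y in chosen]
    labels = [1] * k  # 1 = positive point
    return points, labels


# Toy 4x5 mask with a 2x3 foreground block.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sampled_points, sampled_labels = sample_points_from_mask(toy_mask, num_points=4, seed=0)
```

Fixing the seed keeps the sampled prompts reproducible between runs, which makes it easier to compare tracking results.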