facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Multiple bounding box as prompt #267

Open YangJae96 opened 2 months ago

YangJae96 commented 2 months ago

Hi. Thank you for the great work!

I was wondering if it is possible to give multiple bounding boxes as a prompt.

The demo code gives only one box as a prompt and adds it with predictor.add_new_points_or_box.

Is it possible to give two boxes at once like below code?

import numpy as np

# Two boxes in (x0, y0, x1, y1) pixel coordinates
boxes = np.array([[300, 0, 500, 400],
                  [180, 140, 290, 400]], dtype=np.float32)

_, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
    inference_state=inference_state,
    frame_idx=ann_frame_idx,
    obj_id=ann_obj_id,
    box=boxes,
)
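As far as I can tell from the SAM 2 video predictor, add_new_points_or_box reshapes box into a single pair of corners, so each call takes one box for one obj_id; the video notebook's multi-object example registers each object separately. A minimal sketch, assuming that behaviour: loop over the boxes and give each box its own object id on the same frame. add_boxes_as_objects and first_obj_id are hypothetical names, not part of SAM 2.

```python
import numpy as np


def add_boxes_as_objects(predictor, inference_state, frame_idx, boxes, first_obj_id=1):
    """Register each box as a separate object on one frame (hypothetical helper).

    Assumes predictor.add_new_points_or_box(inference_state=..., frame_idx=...,
    obj_id=..., box=...) accepts a single (4,) box in (x0, y0, x1, y1) form,
    as in the SAM 2 video notebook.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    out_obj_ids, out_mask_logits = None, None
    for i, box in enumerate(boxes):
        # One call per box; object ids are first_obj_id, first_obj_id + 1, ...
        _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
            inference_state=inference_state,
            frame_idx=frame_idx,
            obj_id=first_obj_id + i,
            box=box,
        )
    # Return what the last call reported (ids and mask logits from the final call)
    return out_obj_ids, out_mask_logits
```

With a real predictor this would be called as add_boxes_as_objects(predictor, inference_state, ann_frame_idx, boxes) after init_state. My reading of the video predictor is that the last call's outputs cover every object added so far, and that each object is then tracked separately by propagate_in_video; treat both as assumptions to check.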

Thanks in advance.

AlexMcClay commented 2 months ago

Yes, it is possible with the image predictor.

If you do something like this:

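import numpy as np

# predictor is a SAM2ImagePredictor on which set_image() has already been called
boxes = np.array([[300, 0, 500, 400],
                  [180, 140, 290, 400]])

masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False,
)
print("Masks: ", masks.shape)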

The output will be something like this:

Masks: (2, 1, 512, 512)

Basically (n, 1, image height, image width), where n is the number of boxes.
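If a single (H, W) image is more convenient than one mask per box, the batched (n, 1, H, W) output can be collapsed into a label map. A minimal NumPy sketch, assuming masks is the array returned by predictor.predict(..., multimask_output=False) for n boxes; masks_to_label_map is a hypothetical helper, not part of SAM 2.

```python
import numpy as np


def masks_to_label_map(masks):
    """Collapse (n, 1, H, W) masks into an (H, W) int label map (hypothetical helper).

    Pixel value k (1..n) marks the mask of box k-1; 0 is background.
    Where masks overlap, the later box overwrites the earlier one.
    """
    masks = np.asarray(masks)
    if masks.ndim != 4 or masks.shape[1] != 1:
        raise ValueError(f"expected shape (n, 1, H, W), got {masks.shape}")
    per_box = masks[:, 0] > 0  # (n, H, W) boolean, one mask per box
    label_map = np.zeros(per_box.shape[1:], dtype=np.int32)
    for k, m in enumerate(per_box, start=1):
        label_map[m] = k
    return label_map
```

Letting the later box win on overlaps is an arbitrary choice; if overlaps matter, the per-box scores returned by predict are one possible tie-breaker.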

That works for me. I'm following the image predictor example notebook:

https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb

They explain it in the "Batched prompt inputs" section.

ZhilunT commented 2 weeks ago

Hello, I am using the predictor._predict function for prediction. input_points contains 86 points and input_bbox contains 2 bounding boxes, with several points falling inside each box.

masks, scores, logits = predictor._predict(
    point_coords=input_points,
    point_labels=np.ones([input_points.shape[0], 1]),
    box=input_bbox,
)

The goal is to use points and bounding boxes together in one prediction. However, the number of points and the number of bounding boxes may differ.

The error occurs because the current implementation expects the number of points and bounding boxes to match. It works if input_bbox contains as many boxes as there are points, but in practice a single bounding box may contain several points.

How can this issue be resolved to handle cases where a bounding box contains multiple points?
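One possible workaround, sketched under assumptions: as far as I can tell, SAM 2 concatenates the box corners and the point prompts along the prompt dimension, so the point prompts need the same leading (batch) dimension as the boxes. Instead of one flat list of 86 points, the points can be grouped by the box they fall in, with shorter groups padded using label -1, which my reading of SAM 2's prompt encoder treats as a padding ("not a point") entry. group_points_by_box is a hypothetical helper; check the padding behaviour against your installed version.

```python
import numpy as np


def group_points_by_box(points, boxes, pad_label=-1):
    """Build batched prompts where row i holds the points inside box i (hypothetical helper).

    points: (N, 2) array of (x, y); boxes: (B, 4) array of (x0, y0, x1, y1).
    Returns coords (B, K, 2) float32 and labels (B, K) int32, K = largest per-box
    count (at least 1). Real points get label 1 (foreground); unused slots get
    pad_label. Each point goes to the first box containing it; points outside
    every box are dropped.
    """
    points = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    if len(boxes) == 0:
        raise ValueError("need at least one box")
    x, y = points[:, 0], points[:, 1]
    # inside[i, j] is True when point j lies within box i (edges inclusive)
    inside = (
        (x[None, :] >= boxes[:, 0:1]) & (x[None, :] <= boxes[:, 2:3])
        & (y[None, :] >= boxes[:, 1:2]) & (y[None, :] <= boxes[:, 3:4])
    )
    # Index of the first containing box per point, or -1 if none contains it
    owner = np.where(inside.any(axis=0), inside.argmax(axis=0), -1)
    groups = [points[owner == i] for i in range(len(boxes))]
    k = max(1, max(len(g) for g in groups))
    coords = np.zeros((len(boxes), k, 2), dtype=np.float32)
    labels = np.full((len(boxes), k), pad_label, dtype=np.int32)
    for i, g in enumerate(groups):
        coords[i, : len(g)] = g
        labels[i, : len(g)] = 1
    return coords, labels
```

The grouped arrays would then go through the public predict rather than _predict, since predict applies the coordinate transforms that _predict expects to have been done already, for example predictor.predict(point_coords=coords, point_labels=labels, box=input_bbox, multimask_output=False), which should give one mask per box. Treat that call as a sketch rather than a verified recipe.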