facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0
12.68k stars 1.19k forks

Question about multi points and boxes as prompt #435

Open ZhilunT opened 3 weeks ago

ZhilunT commented 3 weeks ago

Hello, I am using the predictor._predict function for prediction. input_points contains 86 points and input_bbox contains 2 bounding boxes, because each of the 2 boxes contains multiple points.

    masks, scores, logits = predictor._predict(
        point_coords=input_points,
        point_labels=np.ones([input_points.shape[0], 1]),
        box=input_bbox,
    )

The goal is to use both points and bounding boxes for prediction simultaneously. However, the points and bounding boxes may not be equal in number.

The error shown below occurs because the current implementation expects the number of points and the number of bounding boxes to match. It works fine if the number of boxes in input_bbox is set to match the number of points, but in practice a single bounding box may contain multiple points.

How can this issue be resolved to handle cases where a bounding box contains multiple points?

        masks, scores, logits = predictor.predict(
                                ^^^^^^^^^^^^^^^^^^
      File "sam2/sam2_image_predictor.py", line 271, in predict
        masks, iou_predictions, low_res_masks = self._predict(
                                                ^^^^^^^^^^^^^^
      File "/root/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "sam2/sam2_image_predictor.py", line 384, in _predict
        concat_coords = torch.cat([box_coords, concat_points[0]], dim=1)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 86 for tensor number 1 in the list.

heyoeyo commented 2 weeks ago

The short answer is that having more than 1 box per prompt requires code changes, and even then the model doesn't seem to handle it well. A single box with many points should work, however, as long as the N points sit in the second dimension of the array (index 1, counting from 0). That is, the points should have shape BxNx2, where B is the batch size, N is the number of points, and 2 is for the (x, y) coordinates. This is discussed in more detail in issue #235.
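For the case in the question (several boxes, each containing several points), one possible workaround is to split the points by box and run one prompt per box, each shaped as 1xNx2. The sketch below only does the NumPy bookkeeping. `group_points_by_box` is a hypothetical helper, not part of sam2, and it assumes the boxes are in XYXY pixel format and that every point is a foreground point (label 1). The `predictor.predict` call is left as a comment and has not been run here.

```python
import numpy as np


def group_points_by_box(points, boxes):
    """Hypothetical helper (not part of sam2): build one prompt per XYXY box.

    Returns a list of (point_coords, point_labels, box) tuples where
    point_coords has shape (1, K, 2) and point_labels has shape (1, K).
    Points outside every box are dropped; a point that lies inside
    several overlapping boxes is reused for each of them.
    """
    # Accept (N, 2) as well as (N, 1, 2) inputs by flattening to (N, 2).
    points = np.asarray(points, dtype=np.float32).reshape(-1, 2)
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)

    prompts = []
    for box in boxes:
        x0, y0, x1, y1 = box
        inside = (
            (points[:, 0] >= x0) & (points[:, 0] <= x1)
            & (points[:, 1] >= y0) & (points[:, 1] <= y1)
        )
        coords = points[inside][None, ...]                   # (1, K, 2)
        labels = np.ones(coords.shape[:2], dtype=np.int32)   # (1, K), assumed foreground
        prompts.append((coords, labels, box))
    return prompts


# Toy data standing in for the 86 points and 2 boxes from the question.
input_points = np.array([[10, 10], [20, 15], [110, 120], [130, 140], [125, 110]])
input_bbox = np.array([[0, 0, 50, 50], [100, 100, 150, 150]])

prompts = group_points_by_box(input_points, input_bbox)
for coords, labels, box in prompts:
    print(coords.shape, labels.shape, box.shape)
    # Assuming `predictor` is a SAM2ImagePredictor on which set_image() has
    # already been called, each prompt could be run separately:
    # masks, scores, logits = predictor.predict(
    #     point_coords=coords, point_labels=labels, box=box
    # )
```

A box that contains no points produces an empty 1x0x2 array. It is probably safer to pass only the box for that prompt than to pass empty point arrays.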