facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0
46.55k stars, 5.52k forks

Multiple Boxes as prompt #381

Open kabbas570 opened 1 year ago

kabbas570 commented 1 year ago

Hello, thanks for the nice work; well done. Can SAM take multiple bounding boxes as prompts for segmentation? For example, if I draw boxes around two objects, say a building and a dog, SAM only segments one of them when calling:

masks, scores, logits = mask_predictor.predict(
    box=box,
    multimask_output=True
)

Here, it expects the box to have shape [1, 4]. If it has shape [2, 4], it raises this error:

157 if boxes is not None:
158     box_embeddings = self._embed_boxes(boxes)
--> 159     sparse_embeddings = torch.cat([sparse_embeddings, box_embeddings], dim=1)
160
161 if masks is not None:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.

I'm not sure how to change the size of sparse_embeddings to match as well!

Thank you Cheers Abbas

0vl0 commented 1 year ago

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)
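For reference, masks then comes back with one entry per box, roughly of shape (num_boxes, num_masks, H, W), on the predictor's device, so each box's output can be post-processed separately. A quick visualization sketch (assuming matplotlib is available and image is the RGB array passed to set_image):

import matplotlib.pyplot as plt

masks_np = masks.detach().cpu().numpy()          # (num_boxes, num_masks, H, W)
for i in range(masks_np.shape[0]):
    best = iou_predictions[i].argmax().item()    # highest-scoring mask for this box
    plt.figure()
    plt.imshow(image)
    plt.imshow(masks_np[i, best], alpha=0.5)     # overlay this box's mask
    plt.title(f"box {i}")
plt.show()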
Dipankar1997161 commented 1 year ago

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

Hello, I used the .predict function and got the masks for my image. Is there any option to generate the segmented image as the output, for example the dog cropped out into a separate image?

I tried the bitwise_and operation in OpenCV, but the output was not that clean. Is there any setting in SAM to do the same?

0vl0 commented 1 year ago

Is there any option to generate the segmentation image as the output, for example the dog cropped out in a separate image?

This option is not supported by SAM. You have to post-process the output mask.

If your method didn't work, you can extract the dog using the bounding box of the mask:

image = cv2.imread(path_image)
masks, _, _ = predictor.predict(point_coords=points, point_labels=labels, multimask_output=False)  
Y, X = masks[0].nonzero()  
left, right, top, bottom = min(X), max(X), min(Y), max(Y)  
dog_image = image[top:bottom, left:right]

[images: dog | dog_cropped]
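If you want the object cut out cleanly rather than just cropped, another option (just a sketch, not a SAM feature) is to use the mask as an alpha channel and save a transparent PNG. This reuses image, masks and the bounding-box values from the snippet above; the output filename is made up:

import cv2
import numpy as np

mask = masks[0].astype(np.uint8)              # HxW array of 0/1
cutout = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
cutout[:, :, 3] = mask * 255                  # fully transparent outside the mask
cutout = cutout[top:bottom, left:right]       # optional: crop to the mask's bbox
cv2.imwrite("dog_cutout.png", cutout)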

kulkarnikeerti commented 1 year ago

Hi @kabbas570 @0vl0
I am struggling to understand the format of the bounding box. Is it YOLO format (x-center, y-center, w, h) or COCO format (xmin, ymin, w, h)? Since you are already able to extract masks, it would be really helpful if you could clarify this for me. Thanks in advance!

kabbas570 commented 1 year ago

@kulkarnikeerti here is the format.

default_box will be used if you do not draw any box on the image above:

default_box = {'x': 68, 'y': 247, 'width': 555, 'height': 678, 'label': ''}

kabbas570 commented 1 year ago

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

I was able to solve the issue by iterating through the boxes; here is some sample code:

import numpy as np
import cv2
import matplotlib.pyplot as plt
import supervision as sv

default_box = {'x': 68, 'y': 247, 'width': 555, 'height': 678, 'label': ''}

image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
combine = np.zeros(image_rgb.shape)
print(len(widget.bboxes))

for i in range(len(widget.bboxes)):
    box = widget.bboxes[i] if widget.bboxes else default_box
    box = np.array([
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ])

    mask_predictor.set_image(image_rgb)

    masks, scores, logits = mask_predictor.predict(
        box=box,
        multimask_output=False
    )

    masks = np.transpose(masks, [2, 1, 0])
    masks = np.transpose(masks, [1, 0, 2])
    combine[np.where(masks == 1)] = 1

plt.figure()
plt.imshow(combine)

kulkarnikeerti commented 1 year ago

@kulkarnikeerti here is the format.

default_box will be used if you do not draw any box on the image above:

default_box = {'x': 68, 'y': 247, 'width': 555, 'height': 678, 'label': ''}

@kabbas570 Thanks for the reply. But I am still confused: my question was what x and y are, xmin/ymin or x-center/y-center? I have my bounding box values defined, which I tried to use in the demo notebook to get a segmentation based on the bounding box, but the box doesn't exactly fit the object. My boxes are in YOLO format (x-center, y-center), which is why I wanted to understand what format the code uses. I tried converting either way, but nothing works so far.

0vl0 commented 1 year ago

Hi @kulkarnikeerti, the input bounding box format to predict is xyxy (left, top, right, bottom).
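@kulkarnikeerti if your boxes are in YOLO format you would need to convert them first. A small sketch, assuming the usual normalized YOLO convention and that the predictor is set up as above (the example numbers are made up):

import numpy as np

def yolo_to_xyxy(box, img_w, img_h):
    # box: (x_center, y_center, w, h), normalized to [0, 1] as in YOLO label files
    xc, yc, w, h = box
    xc, yc, w, h = xc * img_w, yc * img_h, w * img_w, h * img_h
    return np.array([xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2])

box = yolo_to_xyxy((0.52, 0.48, 0.30, 0.40), img_w=640, img_h=480)
masks, scores, logits = mask_predictor.predict(box=box, multimask_output=False)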

kulkarnikeerti commented 1 year ago

@0vl0 Thanks. Got it!

GewelsJI commented 1 year ago

Hi everyone,

What's the difference between apply_boxes and apply_boxes_torch in segment_anything/utils/transforms.py?

emi-dm commented 11 months ago

Is there any way to use a different number of points for each bounding box, without every bounding box needing to have exactly the same number of points? For example, create two bounding boxes, where the first has a single foreground point and the second has one foreground point and one background point.

The fixed dimensions of the torch tensors do not allow it. Could you give me a small example of how to do it?

Thanks in advance!

shrutichakraborty commented 9 months ago

Hi all!

I'd like to use multiple boxes and multiple points as input to predict the masks. However, I'm getting a shape error when I try that. The code I have been trying is:

input_box = torch.tensor(input_box)
input_box = predictor.transform.apply_boxes_torch(input_box, image.shape[:2])
if input_point is not None:
    input_point = torch.as_tensor(input_point, dtype=torch.float)
    input_label = torch.as_tensor(input_label, dtype=torch.int)

    input_point = predictor.transform.apply_coords_torch(input_point, image.shape[:2])
    # input_label = predictor.transform.apply_coords_torch(input_label, image.shape[:2])
    print("labels_torch:", input_label.shape)
    input_point, input_label = input_point[None, :, :], input_label[None, :]
    print("coords_torch:", input_point.shape)
    print("labels_torch:", input_label.shape)

masks, _, _ = predictor.predict_torch(
    point_coords=input_point,
    point_labels=input_label,
    boxes=input_box,
    multimask_output=False,
)

The error message I get is:

[screenshot of the error]

I am using the predict_torch method, so I had a look at the predictor.py file, which requires that point_labels is a BxN torch tensor and point_coords is BxNx2. Here, I am not sure what B is, but N I assume is the number of points clicked. As I am using the predict_torch method directly, without first using the predictor.predict method, I also made sure to convert input_labels and input_points to tensors; the shapes I get are torch.Size([1, 5]) and torch.Size([1, 5, 2]) respectively.

Can someone help me out? Thanks!

ritchi1408 commented 9 months ago

I got the same issue, but only if I use 2 or more boxes:

works:

points = np.array([[1375,1625],[760,1230]])
input_points = torch.tensor(points, device=predictor.device)
labels = np.array([1,1])
boxes = np.array([[1300, 1550, 1450, 1750]])
input_boxes = torch.tensor(boxes, device=predictor.device)
input_label = torch.tensor(labels, device=predictor.device)
print("Dimensions of points:", input_points.shape) #Dimensions of points: torch.Size([2, 2])
print("Dimensions of boxes:", input_boxes.shape) # Dimensions of boxes: torch.Size([1, 4])

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_coords = predictor.transform.apply_coords_torch(input_points, image.shape[:2])
transformed_coords = transformed_coords[None, :, :]
input_label = input_label[None, :]

print(transformed_coords.shape) # torch.Size([1, 2, 2])
print(transformed_boxes.shape) # torch.Size([1, 4])

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

works:

transformed_coords = None
input_label = None
boxes = np.array([[1300, 1550, 1450, 1750],[755, 1150, 910, 1310]])
input_boxes = torch.tensor(boxes, device=predictor.device)
transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

not working:


points = np.array([[1375,1625],[760,1230]])
input_points = torch.tensor(points, device=predictor.device)
boxes = np.array([[1300, 1550, 1450, 1750],[755, 1150, 910, 1310]])
input_boxes = torch.tensor(boxes, device=predictor.device)
input_label = torch.tensor(labels, device=predictor.device)
print("Dimensions of points:", input_points.shape) #Dimensions of points: torch.Size([2, 2])
print("Dimensions of boxes:", input_boxes.shape) #Dimensions of boxes: torch.Size([2, 4])

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_coords = predictor.transform.apply_coords_torch(input_points, image.shape[:2])
transformed_coords = transformed_coords[None, :, :]
input_label = input_label[None, :]

print(transformed_coords.shape) # torch.Size([1, 2, 2])
print(transformed_boxes.shape) # torch.Size([2, 4])

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.

shrutichakraborty commented 9 months ago

I got the same issue, but only if I use 2 or more boxes:

works:

points = np.array([[1375,1625],[760,1230]])
input_points = torch.tensor(points, device=predictor.device)
labels = np.array([1,1])
boxes = np.array([[1300, 1550, 1450, 1750]])
input_boxes = torch.tensor(boxes, device=predictor.device)
input_label = torch.tensor(labels, device=predictor.device)
print("Dimensions of points:", input_points.shape) #Dimensions of points: torch.Size([2, 2])
print("Dimensions of boxes:", input_boxes.shape) # Dimensions of boxes: torch.Size([1, 4])

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_coords = predictor.transform.apply_coords_torch(input_points, image.shape[:2])
transformed_coords = transformed_coords[None, :, :]
input_label = input_label[None, :]

print(transformed_coords.shape) # torch.Size([1, 2, 2])
print(transformed_boxes.shape) # torch.Size([1, 4])

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

works:

transformed_coords = None
input_label = None
boxes = np.array([[1300, 1550, 1450, 1750],[755, 1150, 910, 1310]])
input_boxes = torch.tensor(boxes, device=predictor.device)
transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

not working:


points = np.array([[1375,1625],[760,1230]])
input_points = torch.tensor(points, device=predictor.device)
boxes = np.array([[1300, 1550, 1450, 1750],[755, 1150, 910, 1310]])
input_boxes = torch.tensor(boxes, device=predictor.device)
input_label = torch.tensor(labels, device=predictor.device)
print("Dimensions of points:", input_points.shape) #Dimensions of points: torch.Size([2, 2])
print("Dimensions of boxes:", input_boxes.shape) #Dimensions of boxes: torch.Size([2, 4])

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_coords = predictor.transform.apply_coords_torch(input_points, image.shape[:2])
transformed_coords = transformed_coords[None, :, :]
input_label = input_label[None, :]

print(transformed_coords.shape) # torch.Size([1, 2, 2])
print(transformed_boxes.shape) # torch.Size([2, 4])

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.

Hi! Have a look at this issue: https://github.com/facebookresearch/segment-anything/issues/620 :)
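In short, when you batch B boxes, predict_torch wants point_coords of shape (B, N, 2) and point_labels of shape (B, N), i.e. one group of points per box, so the batch dimensions match. A sketch based on the failing example above (pairing one point with each of the two boxes, reusing predictor and image from that code):

import numpy as np
import torch

points = np.array([[[1375, 1625]],                 # point(s) for box 1
                   [[760, 1230]]])                 # point(s) for box 2 -> shape (2, 1, 2)
labels = np.array([[1], [1]])                      # shape (2, 1)
boxes = np.array([[1300, 1550, 1450, 1750],
                  [755, 1150, 910, 1310]])         # shape (2, 4)

transformed_coords = predictor.transform.apply_coords_torch(
    torch.as_tensor(points, dtype=torch.float, device=predictor.device), image.shape[:2])
transformed_boxes = predictor.transform.apply_boxes_torch(
    torch.as_tensor(boxes, device=predictor.device), image.shape[:2])
input_label = torch.as_tensor(labels, device=predictor.device)

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    boxes=transformed_boxes,
    multimask_output=False,
)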

nhw649 commented 8 months ago

You can use predict_torch to give multiple bounding boxes as input prompts:

input_boxes = torch.tensor([box_1, box_2], device=mask_predictor.device)  
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])  
masks, iou_predictions, low_res_masks = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)

Hello, I used this code, but it failed:

Sizes of tensors must match except in dimension 0. Got 1 and 2 (The offending index is 0)

import numpy as np
import torch
from PIL import Image

from segment_anything import sam_model_registry
from predictor import SamPredictor
# from utils.utils import show_mask, show_box, show_points

sam = sam_model_registry['vit_b'](checkpoint='/home/nhw/omni/checkpoints/sam_vit_b_01ec64.pth').cuda()
mask_predictor = SamPredictor(sam)

# image upload
img = np.array(Image.open("figure/dog.jpg"))
mask_predictor.set_image(img)
input_boxes = torch.tensor([[200, 200, 600, 600],[200, 200, 600, 600]], device=mask_predictor.device)  # x1,y1,x2,y2
transformed_boxes = mask_predictor.transform.apply_boxes_torch(input_boxes, img.shape[:2])

masks, scores, logits = mask_predictor.predict_torch(
    point_coords=None,
    point_labels=None,
    boxes=transformed_boxes,
    multimask_output=True
)
ritchi1408 commented 8 months ago

So this is my Solution:

input_points = []

input_boxes = []
input_label = []

for groupedPoints in groupedPointsByBoxes.items():
    pointsByBoxForSegmentation = []
    labelsByBoxForSegmentation = []
    for point in groupedPoints[1]:
        pointsByBoxForSegmentation.append([point.Point.x, point.Point.y])
        labelsByBoxForSegmentation.append(int(point.MaskPoint))

    input_boxes.append(groupedPoints[0])
    input_points.append(pointsByBoxForSegmentation)
    input_label.append(labelsByBoxForSegmentation)

self.predictor.set_image(image)

input_points = np.array(input_points)
transformed_coords = torch.tensor(input_points, device=self.predictor.device)
transformed_coords = self.predictor.transform.apply_coords_torch(transformed_coords, image.shape[:2])

input_boxes = torch.tensor(input_boxes, device=self.predictor.device)
transformed_boxes = self.predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])

input_label = torch.tensor(np.array(input_label), device=self.predictor.device)

masks, scores, logits = self.predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    # point_coords=None,
    # point_labels=None,
    boxes=transformed_boxes,
    # mask_input=mask_input[None, :, :],
    multimask_output=False
)

groupedPointsByBoxes is a dictionary with (minX, minY, maxX, maxY) as the key and, as the value, the list of points with their labels (0 or 1).

You can also leave out the points and labels; it should also work with boxes only.

I hope this helps.
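For reference, a minimal hypothetical sketch of what groupedPointsByBoxes could look like; the XY/PromptPoint classes here are made up to match the attributes used in the loop above, and each box should get the same number of points so the np.array / torch.tensor conversion stays rectangular:

from dataclasses import dataclass

@dataclass
class XY:
    x: int
    y: int

@dataclass
class PromptPoint:
    Point: XY        # pixel coordinates of the click
    MaskPoint: int   # 1 = foreground, 0 = background

groupedPointsByBoxes = {
    (1300, 1550, 1450, 1750): [PromptPoint(XY(1375, 1625), 1)],
    (755, 1150, 910, 1310):   [PromptPoint(XY(760, 1230), 1)],
}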

nhw649 commented 8 months ago

So this is my Solution:

input_points = []

input_boxes = []
input_label = []

for groupedPoints in groupedPointsByBoxes.items():
    pointsByBoxForSegmentation = []
    labelsByBoxForSegmentation = []
    for point in groupedPoints[1]:
        pointsByBoxForSegmentation.append([point.Point.x, point.Point.y])
        labelsByBoxForSegmentation.append(int(point.MaskPoint))

    input_boxes.append(groupedPoints[0])
    input_points.append(pointsByBoxForSegmentation)
    input_label.append(labelsByBoxForSegmentation)

self.predictor.set_image(image)

input_points = np.array(input_points)
transformed_coords = torch.tensor(input_points, device=self.predictor.device)
transformed_coords = self.predictor.transform.apply_coords_torch(transformed_coords, image.shape[:2])

input_boxes = torch.tensor(input_boxes, device=self.predictor.device)
transformed_boxes = self.predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])

input_label = torch.tensor(np.array(input_label), device=self.predictor.device)

masks, scores, logits = self.predictor.predict_torch(
    point_coords=transformed_coords,
    point_labels=input_label,
    # point_coords=None,
    # point_labels=None,
    boxes=transformed_boxes,
    # mask_input=mask_input[None, :, :],
    multimask_output=False
)

groupedPointsByBoxes is a dictionary with (minX, minY, maxX, maxY) as the key and, as the value, the list of points with their labels (0 or 1).

You can also leave out the points and labels; it should also work with boxes only.

I hope this helps.

ok, I will try. Thanks.

315386775 commented 8 months ago

See the tutorial at notebooks/predictor_example.ipynb; it covers batched prompt inputs.

Preburk commented 3 months ago

Hello, I tried iterating through each of my boxes to get a mask for each, which works very well but is pretty slow, so I now batch the input by sending in all of those boxes at once. The problem is that I now get some masks that are merged with each other, which I did not get before. Is there any way to get the same behavior as sending the prompts one by one?