facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

rotated bounding boxes as input? #200

Open davidsvaughn opened 1 year ago

davidsvaughn commented 1 year ago

I see SAM is already able to take horizontal (normal) bounding boxes as input. Would it be hard to adapt SAM to be able to take rotated bounding boxes as input? Any thoughts as to how I might get started adding this ability would be appreciated. I have several large datasets with rotated boxes that I wish to segment. Of course I could just draw normal boxes around the rotated ones, and use that as input, but a lot of information would be lost, and the resulting masks would surely not be as good. Using the rotated boxes as input would be ideal.

Jordan-Pierce commented 1 year ago

If you cannot use rotated bounding boxes, one thing I was looking at was combining prompts (boxes and points), as sometimes the boxes alone do not create complete masks (depending on the object). My idea was to randomly sample points within the bounding box coordinates, and provide both the box, and N points to create a mask for that object. In your situation, if this technique works, you could use the normal bounding box, but randomly sample N points within the rotated box region. Just a thought.

torinchen commented 11 months ago

The methodology works like shit; the negative points don't work this way :)

halqadasi commented 4 months ago

@Jordan-Pierce I was just thinking: since SAM accepts a mask (polygon) as a prompt, what if we used a rotated bounding box (R-BB) as a polygon or mask prompt?

These are the steps:

1. Convert R-BB to Polygon: First, convert the coordinates of the rotated bounding box into a format that SAM can understand as a mask. This typically means converting the four corners of the R-BB into a polygonal representation. Each corner of the R-BB is defined by its x and y coordinates, and together they form a closed polygon that outlines the area of interest.

2. Create Mask from Polygon: Once you have the polygon, create a binary mask where pixels inside the polygon are set to 1 (indicating the object of interest) and pixels outside are set to 0. This mask then serves as the input prompt for SAM.

3. Input Mask to SAM: With the binary mask ready, feed it into SAM as an input mask. The model will use this mask to focus its segmentation on the area defined by the R-BB, effectively using the rotated box as a guide for segmentation.
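A rough sketch of these steps, under several assumptions. `SamPredictor.predict` takes its mask prompt through `mask_input`, which is documented as a low-resolution 1x256x256 mask that typically comes from a previous prediction, so it holds logits and is not a full-size binary image. The code below therefore rasterizes the polygon directly on a 256x256 grid, scaled so that the image's longest side maps to 256 (mirroring SAM's resize-longest-side-then-pad preprocessing, as far as I understand it). The helper names, the corner values, and the `logit_magnitude` value are hypothetical, and whether SAM responds well to a synthetic mask like this is not verified here.

```python
import numpy as np

def polygon_to_mask(corners, height, width):
    """Rasterize a convex polygon (corners in order) into a binary HxW mask."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # test pixel centres
    crosses = []
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        # Sign tells which side of the edge each pixel centre lies on.
        crosses.append((x1 - x0) * (py - y0) - (y1 - y0) * (px - x0))
    crosses = np.stack(crosses)
    # Inside a convex polygon: same side of every edge (either winding order).
    inside = np.all(crosses >= 0, axis=0) | np.all(crosses <= 0, axis=0)
    return inside.astype(np.uint8)

def rbox_to_mask_input(corners, image_h, image_w, low_res=256, logit_magnitude=10.0):
    """Build a (1, low_res, low_res) float32 mask prompt from R-BB corners.

    logit_magnitude is an assumed value: positive inside, negative outside.
    """
    scale = low_res / max(image_h, image_w)  # longest side maps to low_res
    binary = polygon_to_mask(np.asarray(corners, dtype=float) * scale, low_res, low_res)
    logits = np.where(binary == 1, logit_magnitude, -logit_magnitude)
    return logits.astype(np.float32)[None, :, :]

# Hypothetical R-BB corners (x, y) in a 480x640 image.
corners = np.array([[158, 103], [262, 163], [242, 197], [138, 137]], dtype=float)
mask_input = rbox_to_mask_input(corners, image_h=480, image_w=640)

# With a segment_anything SamPredictor on which set_image(image) was called:
# masks, scores, _ = predictor.predict(mask_input=mask_input, multimask_output=False)
```

If the mask prompt alone turns out to be too weak, it can be combined in the same `predict` call with the enclosing axis-aligned `box` or with point prompts.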