facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

How to get convex masks from sam2 predictor using bounding box prompts? #455

Open ovalerio opened 4 days ago

ovalerio commented 4 days ago

Hello SAM2 Team,

Thank you for making SAM2 available. It is an amazing piece of software. I am currently using the model to track a worm head, and SAM2 is helping me seed the masks for my custom segmentation model. Unfortunately, my images are slightly blurry, so the masks I get are not convex, and I need convex masks to train my custom network. I think sharing an image would explain it better.

image

Do you have any suggestions on getting convex binary masks from SAM2 that I can use for my pipeline?

Thanks again!

heyoeyo commented 4 days ago

I don't know of any way to get SAM to output convex polygons directly. However, it would be fairly straightforward to do this using more conventional (i.e. non-AI) image processing. OpenCV has built-in functions that make this easy. The steps would be something like:

  1. Convert the SAM prediction to a binary mask (in numpy)
  2. Use cv2.findContours to get polygons from the mask
  3. Use cv2.convexHull to generate a convex hull from each polygon
  4. Use cv2.fillConvexPoly to draw convex hulls onto a blank image to produce the final mask

From the code snippet you posted, this would maybe look like:

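import cv2  # Requires opencv to be installed (e.g. pip install opencv-python)
import numpy as np

# Threshold the logits into a 0/255 uint8 mask; squeeze() drops a singleton channel dimension if present
mask_uint8 = ((out_mask_logits[0] > 0.0).byte() * 255).squeeze().cpu().numpy()
# Outer contours only, one polygon per connected blob
contours, _ = cv2.findContours(mask_uint8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
final_mask_uint8 = np.zeros_like(mask_uint8)
for c in contours:
    # Replace each blob with its filled convex hull
    hull = cv2.convexHull(c)
    cv2.fillConvexPoly(final_mask_uint8, hull, 255)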

This assumes out_mask_logits[0] holds just a single mask (i.e. has shape HxW). If it has multiple channels (e.g. the multi-mask predictions), you'll need to process each mask separately, since cv2.findContours expects a single-channel 8-bit image.