facebookresearch / segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

How to get the cutouts of objects as shown in UI #530

Open chirayu-2001 opened 1 year ago

chirayu-2001 commented 1 year ago

In the demo, clicking Everything creates masks for all of the objects the model identifies, and clicking Cut-outs then gives the cutout images of all those objects. I am working on a problem where I already have masks, obtained by first running grounding DINO and then segment-anything, and now I want the cutouts for some of the masked objects. How do I get them?

heyoeyo commented 1 year ago

The cutouts would just be a copy of the original image, where an alpha channel has been added and set to the (binary) segmentation mask. That way areas outside of the mask are completely transparent and areas inside the mask are 'opaque' (and therefore match the original color of the image). The demo cutouts also crop the resulting image down to just the piece that includes the segmentation mask, which is a nice addition.

As for how you do that, it really depends on how/where you're handling the data. The demo that's included in this project repo is a simplified version of the main demo and doesn't seem to include the same cutouts, but has something similar (where the cutout is all blue, instead of matching the original image color). The code for that seems to be in the maskUtils.tsx file, so that could be used if you're trying to make a javascript version like the demo (though instead of using the hardcoded [0, 114, 189, 255] rgba value, you'd want to copy the rgb component from the original image).

If you wanted to do this in python and get an image file instead, then it would depend on how the image data is being handled (PIL image vs opencv image vs tensor I guess?). But in all cases, you'd need to make a copy of the original image data, add an alpha channel (if it didn't already have one), then fill that alpha channel with the segmentation mask, scaled to match the size of the image. You'd probably also need to save that as a .png image (or another format that supports transparency, just not .jpg) in order to properly view it.

I don't know much about PIL and using tensors for all this would be a bit messy. In opencv, the code for doing all this would be something like:

import cv2
import numpy as np   # <- this is just to make a fake SAM mask

# Assuming image + mask already exist from earlier part of script
img_bgr = cv2.imread("path/to/original/image.jpg")
sam_mask_uint8 = np.random.randint(0, 255, (256, 256), dtype = np.uint8)
# ^^^ This part would come from your segmentation result

# Add alpha channel to original image
img_bgra = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2BGRA)

# Scale mask to match image sizing
img_height, img_width = img_bgra.shape[0:2]
resized_mask = cv2.resize(sam_mask_uint8, dsize = (img_width, img_height))

# Use the mask as the alpha channel of the image
img_bgra[:,:,3] = resized_mask

# Save png copy of masked image
cv2.imwrite("path/to/saved/result.png", img_bgra)
chirayu-2001 commented 1 year ago

Thank you for your solution. It would have helped me, but I want the cutouts to be the same size as the masked object.

heyoeyo commented 1 year ago

Getting the cutouts to match the size of the object can be done by cropping the image. Again, it depends on how you're handling the image data, but in opencv for example, you can use 'slicing':

cropped_img_bgra = img_bgra[y_min:y_max, x_min:x_max, :]

Where y_min, y_max, x_min and x_max come from the top-left and bottom-right corners of the bounding box surrounding the object in the image. You could probably use the bounding box from grounding DINO as a good approximation of where the object is.
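For example, a minimal sketch combining this with the earlier alpha-channel example (here box is assumed to be a grounding DINO box in (x_min, y_min, x_max, y_max) pixel coordinates):

# Sketch only: 'box' is assumed to be a grounding DINO box in xyxy pixel coordinates
x_min, y_min, x_max, y_max = [int(round(v)) for v in box]

# Crop the alpha-masked image from the earlier example down to the box and save it
cropped_img_bgra = img_bgra[y_min:y_max, x_min:x_max, :]
cv2.imwrite("path/to/saved/cutout.png", cropped_img_bgra)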

Alternatively, if you want to crop as close as possible to the segmentation, you would need to determine its bounding box after segmentation. Yet again, this depends on how you handle the image data; in opencv you can use the findContours and boundingRect functions, something like:

contours_list, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
xywh_list = [cv2.boundingRect(each_contour) for each_contour in contours_list]
y1_y2_x1_x2_list = [(y, y+h, x, x+w) for x,y,w,h in xywh_list]

SAM can (rarely) output more than one distinct object, so this code can output a list of bounding boxes. You'd probably want to pick the largest or maybe form a box that surrounds all other boxes (depends on how you want the cropping to work).
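For instance, here's a quick sketch of both options, working from the xywh_list above:

# Option 1: keep only the largest box, by area
largest_xywh = max(xywh_list, key=lambda xywh: xywh[2] * xywh[3])

# Option 2: form one box that surrounds all of the boxes
x1 = min(x for x, y, w, h in xywh_list)
y1 = min(y for x, y, w, h in xywh_list)
x2 = max(x + w for x, y, w, h in xywh_list)
y2 = max(y + h for x, y, w, h in xywh_list)
surrounding_y1y2x1x2 = (y1, y2, x1, x2)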

chirayu-2001 commented 1 year ago

But this will still give the object's image in a rectangular form; I want the exact polygonal cutout of the object.

heyoeyo commented 1 year ago

Do you have a specific format in mind? I'm not familiar with any image formats that support non-rectangular boundaries.

If you only need the shape and not the pixel data itself, then you can use the contours_list data (from the code above), which is the same as the polygon surrounding the object.
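For example, a small sketch of pulling the polygon points out of that data:

# Sketch: each opencv contour is an Nx1x2 array of (x, y) points,
# so squeeze out the middle dimension to get a plain list of points
polygon_points_list = [contour.squeeze(1).tolist() for contour in contours_list]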

shrutichakraborty commented 10 months ago

> The cutouts would just be a copy of the original image, where an alpha channel has been added and set to the (binary) segmentation mask. [...] In opencv, the code for doing all this would be something like: [...]

Hi! In this case, what is the shape of your saved mask file? From the jupyter notebook examples, it seems we end up with a mask image that has a shape (x,y,4), and when I try your example, I get an error: ValueError: could not broadcast input array from shape (368,553,4) into shape (368,553).

Additionally, I noticed that for an image of shape (w,h) the mask shape becomes (h,w). When I try to resize it, I get an error:

cv2.error: OpenCV(4.8.1) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
>  - src data type = 0 is not supported
>  - Expected Ptr for argument 'src'

I'd like to apply the obtained mask to the original image (black in all areas except the object of interest) so that I can further process it. Any help would be greatly appreciated!

heyoeyo commented 10 months ago

what is the shape of your saved mask file?

In the example, the mask is just being generated randomly and has a shape of (256,256), with no channels (or you could say it has 1 channel, but there isn't a 3rd dimension for the channel data).

ValueError: could not broadcast input array from shape (368,553,4) into shape (368,553).

I'm guessing this occurs on the line: img_bgra[:,:,3] = resized_mask? This would happen because img_bgra[:,:,3] needs a mask with shape: HxW (or 368x553 in this case), but it's getting something with 4 channels on it (368x553x4). The solution would depend on why there are 4 channels in the mask data...

If the mask was originally saved as .png file with an alpha channel, then that would explain it having 4 channels. In this case, you'd need to convert it to be a single channel image before using it as the alpha channel of the image. You could replace the original line with something like:

img_bgra[:,:,3] = cv2.cvtColor(resized_mask, cv2.COLOR_BGRA2GRAY)

Alternatively, (and more likely?) it could just be that there are 4 separate masks, and that explains the 4 channels. In that case, you'd need to choose which mask you want to use as the alpha channel for the image with something like:

mask_index = 0
img_bgra[:,:,3] = resized_mask[:,:,mask_index]

If you want to save every mask, you can do it in a loop:

for mask_index in range(resized_mask.shape[-1]):
    img_bgra[:,:,3] = resized_mask[:,:,mask_index]
    cv2.imwrite("path/to/saved/result_{}.png".format(mask_index), img_bgra)

cv2.error in function 'resize', src data type = 0 is not supported

This seems to be due to using a data type that isn't supported by the resize function in opencv (i.e. src data type = 0 is not supported). I'm not sure what data type your mask is in ('data type = 0'), but the rest of the code is assuming a numpy uint8 (unsigned 8-bit integer) array, with values between 0 (outside the mask) and 255 (inside the mask). So if you can convert your masks to that type first, the resizing should work.
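For example, if your mask is coming from SAM as a boolean (True/False) numpy array, the conversion could look like this (a small sketch, assuming a variable named sam_mask):

# Sketch: 'sam_mask' is assumed to be a boolean HxW numpy array from SAM
sam_mask_uint8 = sam_mask.astype(np.uint8) * 255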


I'd like to apply the obtained mask to the original image (black in all areas except the object of interest)

The original example code was for using the mask as an alpha channel to make the non-segmented part of the image transparent. If you want the non-segmented parts to be black instead, you can instead use something like:

mask_1ch = ... # Convert mask data to have shape: HxW, without a channel dimension
mask_3ch = cv2.cvtColor(mask_1ch, cv2.COLOR_GRAY2BGR)
blacked_out_image = cv2.bitwise_and(image_bgr, mask_3ch)

This still requires converting the mask to have only 1 channel (i.e. mask_1ch) and needs to be in uint8 format, with values of 0 (outside the mask) and 255 (inside the mask) to work properly.

momarms commented 8 months ago

Hi,

On the same topic, I wanted to save a cutout just like the one in the online demo.

I thought of cropping that part of the image out and then running segmentation but the problem is I get a lot of masks which I don't want to have.

In the online demo, if I use a bounding box, it only gives me a single mask (red in the image), which is exactly what I need. I assume I can get rid of the outermost mask in my image if I also use a bounding box instead of cropping, but how can I tune my model to get the same results as the demo? Can anyone help me with the parameters I need to change?

The other way I can think of is to get masks using a bounding box, sort them by area, and then only keep the largest mask?

(image: Segmented_Chart)

This is basically what I want to save as an end result.

(image: Chart)

heyoeyo commented 8 months ago

If you're using the amg.py script, then you might want to try using the min_mask_region_area flag and setting a large value. The idea is that if the minimum mask area is set high enough, you can avoid generating masks for all of the small details (e.g. letters/symbols). That still might give issues with the larger light-gray outline, but you might be able to filter that out by not taking the largest mask, or by dropping masks above a certain area (though this is something that would need to be added manually, since it doesn't seem to be supported in the existing code).
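If you're calling the generator from python rather than the amg.py script, the equivalent is the min_mask_region_area argument. A sketch (the model type and checkpoint/image paths here are placeholders, use whichever checkpoint you have):

import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Placeholder model type & checkpoint path
sam = sam_model_registry["vit_b"](checkpoint="path/to/sam_vit_b_01ec64.pth")

# min_mask_region_area > 0 post-processes masks to remove small
# disconnected regions and holes (this step needs opencv installed)
mask_generator = SamAutomaticMaskGenerator(sam, min_mask_region_area=10000)

# The generator expects an RGB image (opencv loads BGR, so convert)
image_bgr = cv2.imread("path/to/original/image.jpg")
masks = mask_generator.generate(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))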

Alternatively, if this doesn't need to be completely automated (i.e. if you're willing to crop the images before), then you can maybe just use the SAM demo site? If you haven't tried it, after you box-select something, you can click the Cut-out object button on the left menu to get a listing of selected images. If you right-click an object and choose Save image as... it'll save the cropped/cut-out version of the image.

If you want to run everything locally, there seem to be user-made UIs that include support for the box-select interface (which seems like the best way to segment your image), like this one, or this one. That might be the easiest (local) way to get similar segmentation results to the demo site.

momarms commented 8 months ago

The first image was from the automatic mask generator. The second one was made using the online demo like you suggested, which I need to automate in some form and run locally, preferably without any mouse input.

I played around a little with points_per_side and min_mask_region_area but couldn't get exactly what I wanted. I'll try tweaking the parameters more. I also looked up another thread, but I don't suppose we have something like max_mask_region_area. Any other ideas are highly appreciated :)

Thanks!

heyoeyo commented 8 months ago

For filtering out large masks, you can do something like:

masks = mask_generator.generate(image) # ... result from auto-mask generator

max_mask_area = 10000
small_masks = list(filter(lambda m: m["area"] < max_mask_area, masks))

It's not quite the same as how the built-in min-area filtering works (since it's happening at the end, instead of before all the filtering/merging steps), but maybe good enough?

If the images are very similar, it might be possible to run the predictor only (i.e. not the auto mask generator) using some hard-coded bounding box that roughly fits around where the 'target' is expected to be. Or it may even be possible to use simpler 'traditional' (i.e. non-AI) computer vision techniques if the images are all similar to the one you posted (thresholding alone might come very close to getting good masks here).
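As a rough sketch of the predictor-only approach (the checkpoint path, image path, and box coordinates here are all placeholders):

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder model type & checkpoint path
sam = sam_model_registry["vit_b"](checkpoint="path/to/sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# The predictor expects RGB ordering (opencv loads BGR, so convert)
image_bgr = cv2.imread("path/to/original/image.jpg")
predictor.set_image(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))

# Hard-coded box in (x_min, y_min, x_max, y_max) pixel coordinates (placeholder values)
target_box = np.array([100, 100, 500, 400])
masks, scores, logits = predictor.predict(box=target_box, multimask_output=False)
# 'masks' is a 1xHxW boolean array when multimask_output=False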

JayJayZi commented 7 months ago

You can use the segmentation output array, which holds true or false values. Simply find the false values and set the corresponding pixels of the original image to black, for instance. Save the output as .png.

height, width, channels = orgImage.shape
cutoutImage = orgImage.copy()  # copy so the original image isn't modified
for y in range(height):
    for x in range(width):
        if masks[0]['segmentation'][y][x] == False:
            cutoutImage[y, x] = (0, 0, 0)
cv2.imwrite("result.png", cutoutImage)
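As a side note, the per-pixel loop can be replaced with numpy boolean indexing, which gives the same result much faster (a sketch using the same orgImage/masks variables):

# Equivalent to the loop above: zero out every pixel outside the mask
cutoutImage = orgImage.copy()
cutoutImage[~masks[0]['segmentation']] = (0, 0, 0)
cv2.imwrite("result.png", cutoutImage)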