facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

How to infer mask for multiple images? #492

Open YuhsiHu opened 1 year ago

YuhsiHu commented 1 year ago

Thanks for your great work!

I would like to use the mask generator (see notebooks/automatic_mask_generator_example.ipynb) to get masks for multiple images. The input would therefore be 4-dimensional (batch, channel, height, width). It seems that only the predictor can do that?

heyoeyo commented 1 year ago

I think the easiest option would just be to process each of them separately (in a loop for example), unless that's a problem for your use case?

Something like:

```python
import cv2

image_1 = cv2.imread('images/dog.jpg')
image_2 = cv2.imread('images/someotherpicture.jpg')
image_3 = cv2.imread('images/yetanotherimage.jpg')
...
# Note: cv2.imread returns BGR; the example notebooks convert to RGB
# with cv2.cvtColor(image, cv2.COLOR_BGR2RGB) before calling SAM.

images_list = [image_1, image_2, image_3, ...]  # however many images you have
results_list = [mask_generator.generate(image) for image in images_list]
```

The results would end up in a list which you'd have to index into to get the results for a specific input image (e.g. results_list[0] gives the list of mask records for image_1 in this case).
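For reference, each entry in results_list is itself a list of per-mask dictionaries (with keys like "segmentation", "area", "bbox", "predicted_iou", per the repo's automatic mask generator documentation). A minimal sketch of indexing into that structure, using mock data in place of real model output:

```python
# Mock data mimicking SamAutomaticMaskGenerator.generate() output
# (real "segmentation" values are boolean HxW numpy arrays; stubbed
# here with strings so this runs without a model or checkpoint).
results_list = [
    [  # masks found in image_1
        {"segmentation": "mask_a", "area": 120, "bbox": [0, 0, 10, 12]},
        {"segmentation": "mask_b", "area": 40, "bbox": [5, 5, 8, 5]},
    ],
    [  # masks found in image_2
        {"segmentation": "mask_c", "area": 77, "bbox": [2, 3, 7, 11]},
    ],
]

masks_image_1 = results_list[0]  # all mask records for the first image
largest = max(masks_image_1, key=lambda m: m["area"])  # biggest region
```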

YuhsiHu commented 1 year ago

Thank you for your reply. Yes, the most intuitive way is to process them one by one.

I opened this issue because there is an example of processing a batch in the predictor_example.ipynb, but not in the mask generator. So I was wondering if the team did not implement this function.

heyoeyo commented 1 year ago

As far as I can tell, the automatic mask generator does still use the same underlying (predictor) code as in the predictor example. For example, the .set_image(...) call is here, while the prompt/mask decoder call is here. Both of these calls should still support batched inputs (it seems like the grid of points used by the generator is handled as a batch internally?).

The reason for not having it be batched may be due to the outputs not being directly batch compatible (i.e. the output of the auto mask generator is a list of dictionaries, instead of a tensor), as well as the potential cropping step of the mask generator, which is implemented as a loop.

That being said, with a bit of work, it should be possible to convert more parts of the mask generator to use batched inputs, if you wanted the extra speed up. Though it doesn't seem to support this as-is.
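Short of modifying the generator internals, a thin wrapper can loop internally and stack each image's boolean masks into one array, which sidesteps the dict-vs-tensor mismatch at the output side. A minimal sketch, using a hypothetical stub in place of SamAutomaticMaskGenerator so it runs without a checkpoint:

```python
import numpy as np

class LoopingBatchMaskGenerator:
    """Hypothetical wrapper (not part of the SAM API): runs a non-batched
    mask generator on each image and stacks that image's "segmentation"
    masks into a single (num_masks, H, W) boolean array."""

    def __init__(self, mask_generator):
        self.mask_generator = mask_generator

    def generate_batch(self, images):
        batched = []
        for image in images:
            anns = self.mask_generator.generate(image)  # list of dicts
            if anns:
                masks = np.stack([a["segmentation"] for a in anns])
            else:
                masks = np.zeros((0,) + image.shape[:2], dtype=bool)
            batched.append(masks)
        # A list (not one big tensor), since num_masks varies per image.
        return batched

# Stub standing in for SamAutomaticMaskGenerator (assumption for the demo):
# it "finds" one mask covering the top half of each image.
class _StubGenerator:
    def generate(self, image):
        h, w = image.shape[:2]
        mask = np.zeros((h, w), dtype=bool)
        mask[: h // 2] = True
        return [{"segmentation": mask, "area": int(mask.sum())}]

gen = LoopingBatchMaskGenerator(_StubGenerator())
out = gen.generate_batch([np.zeros((4, 6, 3)), np.zeros((2, 2, 3))])
```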

ByungKwanLee commented 11 months ago

https://github.com/ByungKwanLee/Full-Segment-Anything addresses the critical issues of SAM: it supports batch input for the full-grid prompt (automatic mask generation), with post-processing (removing duplicated or small regions and holes), under flexible input image sizes.