Open YuhsiHu opened 1 year ago
I think the easiest option would just be to process each of them separately (in a loop for example), unless that's a problem for your use case?
Something like:
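import cv2

# cv2.imread returns BGR; the SAM example notebooks convert to RGB before use
image_1 = cv2.cvtColor(cv2.imread('images/dog.jpg'), cv2.COLOR_BGR2RGB)
image_2 = cv2.cvtColor(cv2.imread('images/someotherpicture.jpg'), cv2.COLOR_BGR2RGB)
image_3 = cv2.cvtColor(cv2.imread('images/yetanotherimage.jpg'), cv2.COLOR_BGR2RGB)
# ... load however many images you have
images_list = [image_1, image_2, image_3]
# mask_generator is a SamAutomaticMaskGenerator, set up as in the example notebook
results_list = [mask_generator.generate(image) for image in images_list]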
The results end up in a list, which you index to get the output for a specific input image (e.g. results_list[0] gives the list of mask dictionaries for image_1 in this case).
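If keeping track of list indices gets awkward, the same loop can be wrapped in a small helper that keys the results by file path. This is only a sketch: generate_masks_per_image and load_image are hypothetical names, not part of segment_anything, and the only thing assumed about mask_generator is that it has a generate(image) method.

```python
from typing import Any, Callable, Dict, Iterable, List


def generate_masks_per_image(
    mask_generator: Any,
    image_paths: Iterable[str],
    load_image: Callable[[str], Any],
) -> Dict[str, List[dict]]:
    """Call mask_generator.generate once per image and key the results by path.

    load_image is a caller-supplied (hypothetical) function that returns the
    image in whatever format the generator expects.
    """
    results: Dict[str, List[dict]] = {}
    for path in image_paths:
        image = load_image(path)
        if image is None:
            # A loader such as cv2.imread returns None for unreadable files;
            # skip those instead of passing None to the generator.
            continue
        results[path] = mask_generator.generate(image)
    return results
```

With OpenCV, load_image could be a small function that calls cv2.imread, returns None if the read failed, and otherwise converts the result with cv2.cvtColor(image, cv2.COLOR_BGR2RGB). The images are still processed one at a time, so this is a convenience and not a speed-up.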
Thank you for your reply. Yes, the most intuitive way is to process them one by one.
I opened this issue because there is an example of processing a batch in the predictor_example.ipynb, but not in the mask generator. So I was wondering if the team did not implement this function.
As far as I can tell, the automatic mask generator does still use the same underlying (predictor) code as in the predictor example. For example, the .set_image(...) call is here, while the prompt/mask decoder call is here. Both of these calls should still support batched inputs (it seems like the grid of points used by the generator is handled as a batch internally?).
The reason it isn't batched may be that the outputs aren't directly batch compatible (i.e. the output of the auto mask generator is a list of dictionaries per image, instead of a tensor), plus the optional cropping step of the mask generator, which is implemented as a loop.
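To illustrate the output-shape problem: each image produces a different number of mask records, so the per-image results can't simply be stacked along a batch dimension. One workaround is to flatten everything into a single list and tag each record with the image it came from. A rough sketch, where flatten_mask_results and the image_index key are made-up names for illustration and not part of the SAM output format:

```python
from typing import List


def flatten_mask_results(results_list: List[List[dict]]) -> List[dict]:
    """Merge per-image mask records into one flat list.

    Each record is copied and given an extra "image_index" key (a name chosen
    for this sketch) pointing back to its position in results_list.
    """
    flat: List[dict] = []
    for image_index, masks in enumerate(results_list):
        for mask in masks:
            record = dict(mask)  # shallow copy so the input is left untouched
            record["image_index"] = image_index
            flat.append(record)
    return flat
```

An image with no masks simply contributes no records, which is the kind of ragged result a fixed-shape tensor can't represent directly.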
That being said, with a bit of work, it should be possible to convert more parts of the mask generator to use batched inputs if you want the extra speed-up, though it doesn't seem to support this as-is.
https://github.com/ByungKwanLee/Full-Segment-Anything addresses the critical issues of SAM: it supports batched input on the full-grid prompt (automatic mask generation) with post-processing (removing duplicated or small regions and holes) under flexible input image sizes.
Thanks for your great work!
I would like to use the mask generator (see notebooks/automatic_mask_generator_example.ipynb) to get masks for multiple images. The input would therefore have 4 dimensions (batch, channel, height, width). It seems that only the predictor can do that?