hanjoonwon opened 7 months ago
It's generally tricky to get something like 'the most accurate mask', since that's subjective. Taken very literally, you could modify scripts/amg.py by adding a line right after the call to generator.generate on line 221:
masks = generator.generate(image)
masks = [max(masks, key=lambda m: m["predicted_iou"])]
This should give you a single mask: the one with the highest IoU prediction, and therefore the one the model itself considers 'most accurate'. However, this may not match your idea of which mask is best or most accurate. If you're specifically looking for foreground elements, you may prefer the largest mask, which you can get by changing the code to something like:
masks = generator.generate(image)
masks = [max(masks, key=lambda m: m["area"])]
Again though, this may not match up with your preferences.
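Both one-liners raise ValueError if generate() happens to return an empty list. Below is a minimal sketch that wraps the two strategies with that guard. It assumes each element of masks is one of the dicts returned by SamAutomaticMaskGenerator.generate(), which carry "predicted_iou" and "area" keys. The min_iou parameter of keep_largest is a hypothetical addition, not part of amg.py: the largest mask is sometimes a low-confidence background region, so it optionally skips masks below a confidence floor (the value is a guess you would need to tune).

```python
from typing import Any, Dict, List

# Assumed shape: one dict per mask, as returned by SamAutomaticMaskGenerator.generate().
MaskRecord = Dict[str, Any]


def keep_highest_iou(masks: List[MaskRecord]) -> List[MaskRecord]:
    """One-element list holding the mask with the highest predicted IoU (empty if none)."""
    if not masks:  # max() raises ValueError on an empty sequence
        return []
    return [max(masks, key=lambda m: m["predicted_iou"])]


def keep_largest(masks: List[MaskRecord], min_iou: float = 0.0) -> List[MaskRecord]:
    """One-element list holding the largest mask whose predicted IoU is >= min_iou.

    min_iou is a hypothetical knob (not in amg.py); 0.0 keeps every mask as a candidate.
    """
    candidates = [m for m in masks if m["predicted_iou"] >= min_iou]
    if not candidates:
        return []
    return [max(candidates, key=lambda m: m["area"])]
```

Because both helpers still return a list of mask dicts, a line such as masks = keep_highest_iou(generator.generate(image)) should slot into amg.py without changing the code that writes the masks out.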
For 52 images, I think it would only take a few minutes to manually generate good masks by giving box/point prompts (as opposed to using the automatic mask generator) using a UI (like this one? I haven't actually tried it, but it looks like it could work).
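If you would rather script the prompting than click through a UI, the segment_anything package exposes box prompts through SamPredictor. The sketch below is an illustration under assumptions: the helper names build_predictor and segment_with_box are hypothetical, the box coordinates would come from you (for example, read off each image once), and the image is expected as an HxWx3 uint8 RGB array. Passing multimask_output=False asks SAM for a single mask instead of three candidates.

```python
import numpy as np


def build_predictor(checkpoint: str, model_type: str = "vit_h", device: str = "cuda"):
    """Load SAM once and wrap it in a SamPredictor that can be reused for every image."""
    # Imported lazily so segment_with_box can be used without segment_anything installed.
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint)
    sam.to(device=device)
    return SamPredictor(sam)


def segment_with_box(predictor, image_rgb: np.ndarray, box_xyxy) -> np.ndarray:
    """Return one HxW boolean mask for the object inside box_xyxy = [x0, y0, x1, y1]."""
    predictor.set_image(image_rgb)  # computes the image embedding (HxWx3 uint8 RGB)
    masks, _scores, _logits = predictor.predict(
        box=np.asarray(box_xyxy, dtype=np.float32),
        multimask_output=False,  # a single mask rather than three candidates
    )
    return masks[0]  # masks has shape (1, H, W) when multimask_output=False


# Usage sketch (paths and box values are placeholders):
# predictor = build_predictor("/home/joonwon/segment-anything/checkpoint/sam_vit_h_4b8939.pth")
# mask = segment_with_box(predictor, image_rgb, [50, 40, 400, 380])
```

Loading the checkpoint once and only calling set_image per image avoids reloading the vit_h weights 52 times.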
If it needs to be automated, it might be better to try something like Grounded-SAM, which lets you provide a text prompt to specify what you want segmented. Otherwise, for foreground content specifically, a depth-prediction model like MiDaS or ZoeDepth could work: threshold the predicted depth map to keep only the elements 'close to the camera'.
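A minimal sketch of the thresholding step, assuming you already have a depth map as a 2-D NumPy array from one of those models. Note the direction differs: MiDaS predicts relative inverse depth (larger values are closer), while ZoeDepth predicts metric depth (smaller values are closer). Keeping a fixed fraction of the nearest pixels via a quantile is an assumption made here for illustration; a fixed cutoff or an automatic method such as Otsu's threshold would be equally reasonable, and the keep_fraction default is a guess.

```python
import numpy as np


def foreground_from_depth(
    depth: np.ndarray, keep_fraction: float = 0.3, larger_is_closer: bool = True
) -> np.ndarray:
    """Boolean mask of the pixels judged closest to the camera.

    keep_fraction: share of pixels (ranked by depth) to label as foreground; hypothetical default.
    larger_is_closer: True for inverse-depth maps (e.g. MiDaS), False for metric depth (e.g. ZoeDepth).
    """
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must be strictly between 0 and 1")
    if larger_is_closer:
        # Keep the top keep_fraction of values (largest inverse depth = nearest).
        return depth >= np.quantile(depth, 1.0 - keep_fraction)
    # Keep the bottom keep_fraction of values (smallest metric depth = nearest).
    return depth <= np.quantile(depth, keep_fraction)
```

The resulting mask is only as clean as the depth map; it could also be intersected with a SAM mask, or used to pick the SAM mask that overlaps it most.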
I want to use this script:
python scripts/amg.py --checkpoint /home/joonwon/segment-anything/checkpoint/sam_vit_h_4b8939.pth --model-type vit_h
I want to get a foreground mask for each of my 52 images, but when I run the existing code in my Anaconda environment on Ubuntu, it generates too many masks and they are hard to combine into one. How can I get the single most accurate mask per image?