facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Does sam2 have any parameters to adjust the inference result? #299

Open luoshuiyue opened 1 week ago

luoshuiyue commented 1 week ago

The following is the result I predicted; is there any way to improve it? I have adjusted mask_threshold to -1.0, -0.5, and -0.2, and max_hole_area to 1 and 20. None of these changes helped. [screenshots of the prediction results attached]

heyoeyo commented 1 week ago

One thing to try, if you haven't already, is using the different models (e.g. large vs. base), since they behave differently and one may work better than the other in some cases (large isn't always the best). It's also worth checking the different mask outputs (from multimask_output), since sometimes there's one good mask even if the rest aren't great.
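For reference, here's a minimal sketch of comparing all of the multimask outputs, loosely following the image predictor example notebook (the checkpoint/config names, image path, and point coordinates are placeholders you'd swap for your own):

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder paths -- swap in whichever model you're testing (large / base plus / small / tiny)
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt"))

image = np.array(Image.open("test_image.jpg").convert("RGB"))
predictor.set_image(image)

# Ask for all candidate masks instead of just one
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # placeholder point prompt (x, y)
    point_labels=np.array([1]),           # 1 = foreground point
    multimask_output=True,
)

# Check every candidate -- the highest-scoring one isn't always the best visually
for i, (mask, score) in enumerate(zip(masks, scores)):
    print(f"mask {i}: predicted IoU {score:.3f}, area {int(mask.sum())} px")
```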

I'd also recommend trying to use as few prompts as possible. From what I've seen, the quality of the output really starts to drop once there are lots of prompts. In the worst case, where the masking isn't getting everything needed, you could try masking different pieces separately (using just 1 or 2 prompts) and combining the masks afterwards if that works for your use case (though it is inconvenient...).
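If you do go the route of masking pieces separately, combining the results afterwards is just a logical OR of the binary masks. A rough sketch, re-using the `predictor` from above (the point coordinates are made-up placeholders):

```python
import numpy as np

# One prompt on the upper body, another on the legs (placeholder coordinates)
masks_a, scores_a, _ = predictor.predict(
    point_coords=np.array([[420, 200]]), point_labels=np.array([1]), multimask_output=True)
masks_b, scores_b, _ = predictor.predict(
    point_coords=np.array([[430, 520]]), point_labels=np.array([1]), multimask_output=True)

# Take the best-scoring candidate from each run and OR them together
best_a = masks_a[np.argmax(scores_a)].astype(bool)
best_b = masks_b[np.argmax(scores_b)].astype(bool)
combined_mask = np.logical_or(best_a, best_b)
```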

And lastly, if you haven't already tried it, box prompts sometimes work well for objects that have lots of distinct areas, like the person in the picture (i.e. legs + shorts + shirt + arms etc.). For example, one box prompt (using the large model) does fairly well on the last picture at least (see the attached example).
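In code, a box prompt is just the `box` argument to `predict` (the coordinates below are placeholders, in XYXY pixel format):

```python
import numpy as np

# Placeholder box around the person: [x0, y0, x1, y1]
box = np.array([150, 80, 620, 900])

masks, scores, _ = predictor.predict(
    box=box,
    multimask_output=False,  # a single output is usually fine for a box prompt
)
person_mask = masks[0].astype(bool)
```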

luoshuiyue commented 6 days ago

Thanks. I changed to the base plus model and the results didn't get better. I used the bbox setting just by copying the code from the Jupyter notebook and putting it in a for loop, and the improvement is very small. So, I want to ask:

  1. How do I get a result like the one you show in the GIF in your previous reply?
  2. How do I use the result of automatic_mask_generator_example.ipynb? I want to get the mask of the person in the middle. [screenshots attached]
heyoeyo commented 6 days ago

> How do I get a result like the one you show in the GIF in your previous reply?

That GIF is a screen capture from using this script.

> How do I use the result of automatic_mask_generator_example.ipynb? I want to get the mask of the person in the middle

I think it would be tricky to do with the auto mask generator alone. The default point grid covers the whole image and will pick up loads of stuff in the background that's hard to deal with, so you could try using a custom point_grids that is limited to the center of the image. You could also try adjusting the min_mask_region_area setting, to see if that helps filter out 'small' masks.
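A rough sketch of both ideas (the grid extent, checkpoint/config names, image path, and min_mask_region_area value are all just example values):

```python
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

sam2 = build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")

# Point grid limited to the middle of the image (coords are normalized to [0, 1])
xs, ys = np.meshgrid(np.linspace(0.3, 0.7, 12), np.linspace(0.3, 0.7, 12))
center_grid = np.stack([xs.ravel(), ys.ravel()], axis=1)

mask_generator = SAM2AutomaticMaskGenerator(
    sam2,
    points_per_side=None,       # must be None when passing point_grids directly
    point_grids=[center_grid],  # one grid per crop layer (crop_n_layers=0 -> just one)
    min_mask_region_area=500,   # drop tiny disconnected regions/holes (needs opencv)
)

image = np.array(Image.open("test_image.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', 'bbox', ...
```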

If you don't mind bringing in other models, you could also try using an object (person) detector to at least get a bounding box around the person and use that to ignore all the masks outside. Or similarly, you could maybe use a depth prediction model to ignore any masks that come from parts of the image that are 'too far away' to be the person. Otherwise I think it's difficult to target specific objects with the auto mask generator, since the SAM models alone don't have a way to classify the segmentation results.
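For example, if a detector gave you a rough person box, filtering the auto-generated masks could be as simple as keeping only the ones whose bounding box falls inside it (re-using the `masks` list from the generator sketch above; the box values and tolerance are placeholders):

```python
def box_contains(outer_xyxy, inner_xywh, tol=10):
    """Check that an XYWH box lies (roughly) inside an XYXY box."""
    x, y, w, h = inner_xywh
    x0, y0, x1, y1 = outer_xyxy
    return (x >= x0 - tol and y >= y0 - tol and
            x + w <= x1 + tol and y + h <= y1 + tol)

# Placeholder person box from some external detector, in XYXY pixel format
person_box = (200, 50, 700, 950)

# The auto mask generator reports each mask's 'bbox' in XYWH format
person_masks = [m for m in masks if box_contains(person_box, m["bbox"])]
```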