facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

SAM2 degraded results compared to SAM #93

Open omrastogi opened 3 months ago

omrastogi commented 3 months ago

SAM

(attached images: viz_7_sam, viz_13_sam)

SAM2

(attached images: viz_7_sam2, viz_13_sam2)

heyoeyo commented 3 months ago

It may be that the box is defined backwards, i.e. the top-left/bottom-right coordinates are reversed? That might explain why the mask looks reversed. It might also be worth checking the other masks (from the multi-mask output), since it may just be that one of them is giving this odd-looking result; there's a quick sketch of both checks below.
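If it's useful, here's a minimal sketch of both checks using `SAM2ImagePredictor`. The config/checkpoint/image paths and the box coordinates are placeholders you'd swap for your own:

```python
import numpy as np
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder config/checkpoint/image; substitute your own
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt"))
predictor.set_image(np.array(Image.open("example.jpg").convert("RGB")))

# SAM 2 expects boxes as (x1, y1, x2, y2) = top-left then bottom-right.
# If the box came from another tool, check it isn't (y1, x1, y2, x2)
# or (x2, y2, x1, y1) before passing it in.
box = np.array([100, 150, 400, 500])  # placeholder coordinates
assert box[0] < box[2] and box[1] < box[3], "box is not in XYXY order"

# Request all mask candidates so an odd-looking one can be compared
# against the alternatives.
masks, scores, _ = predictor.predict(box=box, multimask_output=True)
for i, (m, s) in enumerate(zip(masks, scores)):
    print(f"mask {i}: score={s:.3f}, area={int(m.sum())} px")
```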

From what I've seen, the results from v2 are generally similar to v1, but a bit more prone to weird artifacts. However, the new models scale to larger image sizes using a lot less VRAM than the v1 models, so they can give cleaner/smoother outlines.

WaterKnight1998 commented 3 months ago

I am also seeing worse performance with point prompts

heyoeyo commented 3 months ago

> I am also seeing worse performance with point prompts

From what I've seen, between the different-sized SAMv2 models there can be significant differences in which masks (i.e. whole object, sub-components of the object, etc.) end up at the different indexes of the multi-mask output. For example, the 0-th index mask of the large model tends to pick the smallest sub-component around the point prompt, while the same 0-th mask of the base-plus model tends to pick the 'whole' object. So you might be able to get a better result by picking a different mask output, as in the sketch below.
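Rough sketch of that idea, assuming a `predictor` already set up on an image as in the earlier snippet (the click coordinates are placeholders):

```python
import numpy as np

point = np.array([[250, 300]])  # placeholder (x, y) click
label = np.array([1])           # 1 = foreground point

masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=label,
    multimask_output=True,  # return all candidates, not just index 0
)

# Different model sizes may rank whole-object vs. sub-component masks
# differently, so inspect all candidates rather than trusting index 0.
for i, (m, s) in enumerate(zip(masks, scores)):
    print(f"mask {i}: score={s:.3f}, area={int(m.sum())} px")

best = masks[np.argmax(scores)]  # highest predicted-IoU candidate
```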