Open omrastogi opened 3 months ago
It may be that the box is defined backwards, i.e. the top-left and bottom-right coordinates are swapped. That might explain why the mask looks inverted. It's also worth checking the other masks from the multi-mask output, since it may be that only one of them gives this odd-looking result.
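A quick way to rule out the reversed-box theory is to normalize the corners before passing the box to the predictor. This is only a sketch: `normalize_box` is a hypothetical helper, not part of SAM or SAM2, and it assumes the box prompt is expected as `[x_min, y_min, x_max, y_max]` in pixel coordinates. The example coordinates are made up.

```python
def normalize_box(box):
    """Return the box as [x_min, y_min, x_max, y_max], whatever corner order it came in.

    Hypothetical helper, assuming an XYXY pixel-coordinate box prompt.
    """
    x_a, y_a, x_b, y_b = (float(v) for v in box)
    return [min(x_a, x_b), min(y_a, y_b), max(x_a, x_b), max(y_a, y_b)]


# Illustrative box given as bottom-right first, then top-left
reversed_box = [420, 310, 75, 60]
fixed_box = normalize_box(reversed_box)
was_reversed = fixed_box != [float(v) for v in reversed_box]
```

If `was_reversed` comes out true for the box in question, re-running the prediction with `fixed_box` should show whether the corner order was the cause.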
From what I've seen, the results from v2 are generally similar to v1, but a bit more prone to weird artifacts. However, the new models scale to larger image sizes using a lot less VRAM than the v1 models, so they can give cleaner/smoother outlines.
I am also seeing worse performance with point prompts.
From what I've seen, between the different sized SAMv2 models, there can be significant differences in which masks (i.e. whole object, sub-components of object etc.) end up in the different indexes of the multi-mask output. For example, the 0-th index mask of the large model tends to pick the smallest sub-component around the point prompt, while the same 0-th mask of the base-plus model tends to pick the 'whole' object. So you might be able to get a better result by picking a different mask output.
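To compare the candidates rather than always taking index 0, something like the following could help. It is a rough sketch: `mask_area` and `rank_masks` are hypothetical helpers, and it assumes you already have the multi-mask output as a sequence of 2D binary masks plus one quality score per mask. The toy masks and scores below are made up for illustration, not real model output.

```python
def mask_area(mask):
    """Count foreground pixels in a 2D mask (nested lists or a 2D array of 0/1 or bool)."""
    return sum(1 for row in mask for value in row if value)


def rank_masks(masks, scores):
    """Summarize each candidate mask and return the summaries, highest score first."""
    summary = [
        {"index": i, "score": float(score), "area": mask_area(mask)}
        for i, (mask, score) in enumerate(zip(masks, scores))
    ]
    return sorted(summary, key=lambda item: item["score"], reverse=True)


# Toy stand-ins for a three-mask output: sub-component, partial object, whole object
masks = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
scores = [0.71, 0.93, 0.88]  # made-up quality scores

ranked = rank_masks(masks, scores)
best_index = ranked[0]["index"]  # candidate with the highest score
largest_index = max(ranked, key=lambda item: item["area"])["index"]  # 'whole object' candidate
```

Picking by score and picking by area can disagree, which is the point: looking at both makes it easier to see which index holds the 'whole' object for a given model size.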
SAM | vit_l | box input | True
SAM2 | large | box input | True