idilsulo opened 2 months ago
The first SAM model is composed of an image encoder, a prompt encoder & a mask decoder. The new version adds some memory encoding/attention components as well. The mask + prompt components are only about 16MB (across all model sizes for both SAM v1 & v2), and the new memory components add about 30MB (across all SAMv2 sizes). The remaining model size comes from the image encoder. For example, the image encoder in the base SAM v1 model is about 360MB, whereas the base SAM v2 image encoder seems to be about 110MB.
So even though SAMv2 has extra model components, its smaller overall size comes from a change to the image encoder. V2 uses a very different image encoder called Hiera (the original SAM used a model based on this paper), which seems to be much smaller for the same or better image-encoding performance.
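For reference, this kind of per-component breakdown can be computed by grouping a checkpoint's parameter sizes by their top-level key prefix. The sketch below is a minimal illustration using a toy state dict; the prefix names (`image_encoder`, `prompt_encoder`, `mask_decoder`) are assumptions for illustration, not the actual SAM/SAM2 key names.

```python
import numpy as np

def component_sizes_mb(state_dict):
    """Sum parameter sizes (in MB) grouped by top-level key prefix.

    Works on any dict mapping dotted parameter names to arrays,
    e.g. the flat state dicts that PyTorch checkpoints expose.
    """
    sizes = {}
    for name, tensor in state_dict.items():
        component = name.split(".")[0]
        sizes[component] = sizes.get(component, 0.0) + tensor.nbytes / 1e6
    return sizes

# Toy stand-in for a real checkpoint (float32 weights, made-up shapes).
toy = {
    "image_encoder.layer0.weight": np.zeros((1024, 1024), dtype=np.float32),
    "prompt_encoder.embed.weight": np.zeros((256, 256), dtype=np.float32),
    "mask_decoder.head.weight":    np.zeros((256, 256), dtype=np.float32),
}
print(component_sizes_mb(toy))
```

Running the same grouping over a real SAM checkpoint's state dict is how you would verify which component dominates the file size.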
Ah, this makes sense! Thanks a lot!
Hello, I would like to ask: will there be a Huge version of SAM2 as well?
If not, why didn't you train a large base model the way SAM1 did (Huge)?
Hello all! I have a question regarding the comparison of SAM vs. SAM2. Table 6 of the paper compares both models across 37 datasets.
Does this comparison pit the SAM-H checkpoint against the SAM2 large checkpoint? It is interesting to me that the biggest checkpoint for SAM is ~2.4GB while the biggest one for SAM2 is much smaller, yet shows superior results.
What is the main reason behind this difference? Will there be another huge checkpoint released for SAM2?
Thanks in advance!