Nastu-Ho opened this issue 1 year ago
Thanks for using MobileSAM. Here is an answer to your question.
MobileSAM was trained on the auto-generated annotations of the Segment Anything 1 Billion (SA-1B) dataset, so the predicted mask tends to be large when you click a single foreground point. However, if you click multiple foreground/background points, the accuracy is almost the same as that of ViT-Base SAM.
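The multi-point prompting described above follows the Segment Anything predictor API, which MobileSAM mirrors. A minimal sketch (the click coordinates, image, and checkpoint path are hypothetical placeholders, not values from this thread):

```python
# Hypothetical clicks: two foreground points (label 1) and one background point (label 0).
point_coords = [[250, 187], [400, 300], [120, 80]]  # (x, y) pixel coordinates
point_labels = [1, 1, 0]                            # 1 = foreground, 0 = background

# With the mobile_sam package installed and a checkpoint downloaded
# (paths/registry key assumed from the MobileSAM repo), prediction
# follows the standard Segment Anything API, e.g.:
#
#   import numpy as np
#   from mobile_sam import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image)  # HWC uint8 RGB array
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array(point_coords, dtype=np.float32),
#       point_labels=np.array(point_labels, dtype=np.float32),
#       multimask_output=False,
#   )
print(len(point_coords), len(point_labels))
```

Adding background clicks is what shrinks the over-large single-click mask: each label-0 point explicitly excludes a region from the prediction.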
MobileSAM's model is smaller, so inference is faster. We used the quantized ViT-Base SAM model for comparison.
MobileSAM:
- mobile_sam_preprocess.onnx: 28.1 MB
- mobile_sam.onnx: 16.5 MB

ViT-Base SAM:
- sam_vit_b_01ec64_preprocess.onnx: 108.9 MB
- sam_vit_b_01ec64.onnx: 8.8 MB
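As a quick sanity check on the size gap, the two ONNX files for each model can be summed (the totals are computed here, not stated in the thread):

```python
# File sizes in MB, taken from the listing above.
mobile_sam = {"mobile_sam_preprocess.onnx": 28.1, "mobile_sam.onnx": 16.5}
vit_b_sam = {"sam_vit_b_01ec64_preprocess.onnx": 108.9, "sam_vit_b_01ec64.onnx": 8.8}

mobile_total = sum(mobile_sam.values())  # 44.6 MB
vit_b_total = sum(vit_b_sam.values())    # 117.7 MB
print(f"MobileSAM: {mobile_total:.1f} MB, ViT-Base SAM: {vit_b_total:.1f} MB")
```

So MobileSAM's combined ONNX footprint is roughly 2.6x smaller, with most of the difference in the preprocess (image encoder) model.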
When clicking a foreground point using MobileSAM:
When clicking a foreground point using ViT-Base SAM:
Will MobileSAM have better segmentation accuracy than SAM-Base?