Nastu-Ho opened this issue 1 year ago
Thanks for using MobileSAM. Here is an answer to your question.
MobileSAM was trained on the auto-generated annotations of the Segment Anything 1 Billion (SA-1B) dataset, so the predicted mask tends to be large when you click a single foreground point. However, if you click multiple foreground/background points, the accuracy is almost the same as that of ViT-Base SAM.
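The multi-point prompting described above follows the Segment Anything predictor API, which MobileSAM mirrors. A minimal sketch (the click coordinates, image, and checkpoint path are hypothetical placeholders, not values from this thread):

```python
# Hypothetical clicks: two foreground points (label 1) and one background point (label 0).
point_coords = [[250, 187], [400, 300], [120, 80]]  # (x, y) pixel coordinates
point_labels = [1, 1, 0]                            # 1 = foreground, 0 = background

# With the mobile_sam package installed and a checkpoint downloaded
# (paths/registry key assumed from the MobileSAM repo), prediction
# follows the standard Segment Anything API, e.g.:
#
#   import numpy as np
#   from mobile_sam import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image)  # HWC uint8 RGB array
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array(point_coords, dtype=np.float32),
#       point_labels=np.array(point_labels, dtype=np.float32),
#       multimask_output=False,
#   )
print(len(point_coords), len(point_labels))
```

Adding background clicks is what shrinks the over-large single-click mask: each label-0 point explicitly excludes a region from the prediction.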
MobileSAM's model is smaller, so inference is faster. We used the quantized ViT-Base SAM model for comparison.
MobileSAM:
- mobile_sam_preprocess.onnx: 28.1 MB
- mobile_sam.onnx: 16.5 MB

ViT-Base SAM:
- sam_vit_b_01ec64_preprocess.onnx: 108.9 MB
- sam_vit_b_01ec64.onnx: 8.8 MB
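As a quick sanity check on the size gap, the two ONNX files for each model can be summed (the totals are computed here, not stated in the thread):

```python
# File sizes in MB, taken from the listing above.
mobile_sam = {"mobile_sam_preprocess.onnx": 28.1, "mobile_sam.onnx": 16.5}
vit_b_sam = {"sam_vit_b_01ec64_preprocess.onnx": 108.9, "sam_vit_b_01ec64.onnx": 8.8}

mobile_total = sum(mobile_sam.values())  # 44.6 MB
vit_b_total = sum(vit_b_sam.values())    # 117.7 MB
print(f"MobileSAM: {mobile_total:.1f} MB, ViT-Base SAM: {vit_b_total:.1f} MB")
```

So MobileSAM's combined ONNX footprint is roughly 2.6x smaller, with most of the difference in the preprocess (image encoder) model.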
When clicking a foreground point using MobileSAM:
When clicking a foreground point using ViT-Base SAM:
Will MobileSAM have better segmentation accuracy than SAM-Base?