anuragxel / salt

Segment Anything Labelling Tool
MIT License

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference #47

Open mdimtiazh opened 1 year ago

mdimtiazh commented 1 year ago

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline, except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[images: whole-pipeline comparison tables]

Best Wishes,

Qiao

ryouchinsa commented 10 months ago

Thanks for your great tool, salt. We would like to share our experience implementing MobileSAM in our image annotation tool, RectLabel.

MobileSAM was trained on automatically generated annotations from the Segment Anything 1 Billion (SA-1B) dataset, so the mask produced by a single foreground click tends to be large. However, if you click multiple foreground/background points, the accuracy is almost the same as ViT-Base SAM.
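As a sketch of how those multiple foreground/background clicks are packed for inference: the SAM (and MobileSAM) ONNX decoder takes `point_coords` and `point_labels` arrays, where label 1 marks a foreground click, 0 a background click, and a (0, 0) padding point with label -1 is appended when no box prompt is given (this convention comes from the segment-anything ONNX example; the helper name below is our own).

```python
import numpy as np

def build_point_prompts(foreground, background):
    """Pack clicked points into the ONNX decoder's point_coords/point_labels inputs.

    foreground/background are lists of (x, y) pixel coordinates.
    """
    coords = list(foreground) + list(background)
    labels = [1] * len(foreground) + [0] * len(background)
    # padding point required by the ONNX decoder when there is no box input
    coords.append((0.0, 0.0))
    labels.append(-1)
    point_coords = np.array(coords, dtype=np.float32)[None, :, :]  # shape (1, N, 2)
    point_labels = np.array(labels, dtype=np.float32)[None, :]     # shape (1, N)
    return point_coords, point_labels

# one foreground click plus one background click -> three points after padding
pc, pl = build_point_prompts(foreground=[(200, 150)], background=[(50, 60)])
print(pc.shape, pl.shape)  # (1, 3, 2) (1, 3)
```

Because MobileSAM keeps the original SAM decoder interface, the same prompt arrays work unchanged for both models.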

Because the MobileSAM model is smaller, inference is also faster. We used the quantized ViT-Base SAM model for comparison.

- MobileSAM: 28.1 MB mobile_sam_preprocess.onnx, 16.5 MB mobile_sam.onnx
- Quantized ViT-Base SAM: 108.9 MB sam_vit_b_01ec64_preprocess.onnx, 8.8 MB sam_vit_b_01ec64.onnx
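Summing the preprocess (image encoder) and decoder ONNX file sizes above makes the comparison concrete: the quantized ViT-B decoder is actually the smaller of the two decoders, but the encoder dominates, so MobileSAM's total footprint is still well under half that of quantized ViT-B (values in MB, taken from the listing above).

```python
# Total ONNX footprint per model, from the file sizes listed above (MB).
mobile_sam_mb = 28.1 + 16.5    # mobile_sam_preprocess.onnx + mobile_sam.onnx
vit_b_quant_mb = 108.9 + 8.8   # sam_vit_b_01ec64_preprocess.onnx + sam_vit_b_01ec64.onnx

print(f"MobileSAM total:       {mobile_sam_mb:.1f} MB")   # 44.6 MB
print(f"Quantized ViT-B total: {vit_b_quant_mb:.1f} MB")  # 117.7 MB
print(f"Size ratio:            {vit_b_quant_mb / mobile_sam_mb:.1f}x")  # 2.6x
```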

When clicking a foreground point using MobileSAM:

[screenshot 2023-10-24 4 23 06]

When clicking a foreground point using ViT-Base SAM:

[screenshot 2023-10-24 4 23 25]

Please let us know your opinion.