mdimtiazh opened this issue 1 year ago
hi, nice work and thanks for the suggestion. Can you provide the mask AP results on coco by comparing SAM with the same box prompt input?
SAM-HQ is an awesome work. I'm planning to do that, and also to make SAM-HQ more lightweight. Another idea is to add a language model to increase SAM-HQ's visual scene reasoning capabilities. To my surprise, the demo on Hugging Face already supports open-vocabulary segmentation by text prompt and performs very well. I'd like to know how it works, and I look forward to the opportunity to collaborate.
Reference: https://github.com/ChaoningZhang/MobileSAM
Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.
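To make the "same pipeline, different image encoder" point concrete, here is a minimal schematic sketch (not the real SAM/MobileSAM code; class names, the `encode`/`segment` methods, and the parameter counts in the comments are illustrative assumptions): only the image encoder component is swapped, while the rest of the pipeline and its interface stay identical, which is why integration is a drop-in change.

```python
# Schematic illustration only: in both SAM and MobileSAM the prompt encoder and
# mask decoder are unchanged; the image encoder is the only swapped component.

class HeavyViTEncoder:
    """Stand-in for the original SAM's large ViT image encoder."""
    def encode(self, image):
        return f"embedding({image})"

class TinyViTEncoder:
    """Stand-in for MobileSAM's distilled lightweight image encoder."""
    def encode(self, image):
        return f"embedding({image})"

class SamPipeline:
    """Same downstream pipeline regardless of which encoder is plugged in."""
    def __init__(self, image_encoder):
        self.image_encoder = image_encoder  # the only part that differs

    def segment(self, image, box_prompt):
        # image embedding -> (prompt encoding + mask decoding), unchanged
        emb = self.image_encoder.encode(image)
        return f"mask from {emb} with box {box_prompt}"

original = SamPipeline(HeavyViTEncoder())
mobile = SamPipeline(TinyViTEncoder())

# Identical interface and identical downstream behavior: callers cannot tell
# the encoders apart, so swapping is a one-line change in existing projects.
print(original.segment("img.png", (0, 0, 10, 10)))
print(mobile.segment("img.png", (0, 0, 10, 10)))
```

In the real repositories the swap is similarly small: code written against the segment-anything-style predictor interface keeps working after replacing the model weights and encoder type.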
MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:
Best Wishes,
Qiao