Closed by shallweiwei 1 year ago
I just found that the inference time is even longer than SAM's when SAM uses 'vit_b'. I added timing code around this line of code.
MobileSAM uses exactly the same decoder as the original SAM, so ideally we only need to compare the inference time of the image encoder. Note that DeiT has the same architecture as ViT. As shown in the Tiny-ViT paper, Tiny-ViT is significantly faster than DeiT-B (with 86M parameters), even though Tiny-ViT is designed primarily to be lightweight (small model size) rather than fast. I suggest you use the code in https://github.com/microsoft/Cream/tree/main/TinyViT to double-check. You could also contact the authors of Tiny-ViT to ask how they measure their inference time. I hope this helps you debug your issue.
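If it helps, here is a minimal sketch of how one might time only the image encoder, with warm-up runs excluded and an optional synchronization hook so that asynchronous GPU work is not under-counted. The `benchmark` helper and its parameters are hypothetical names made up for illustration, not part of SAM, MobileSAM, or Tiny-ViT.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              warmup: int = 3,
              runs: int = 10,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time fn() over several runs, after warm-up calls that are not counted.

    `sync` is an optional hook called before each clock read. On a GPU it
    should block until queued kernels have finished (with PyTorch, for
    example, torch.cuda.synchronize); on CPU it can stay None.
    """
    def _sync() -> None:
        if sync is not None:
            sync()

    # Warm-up: the first calls often include one-time setup costs.
    for _ in range(warmup):
        fn()
    _sync()

    timings = []
    for _ in range(runs):
        _sync()                       # make sure earlier work is done
        start = time.perf_counter()
        fn()
        _sync()                       # wait for this call to really finish
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "runs": runs,
    }


# Assumed usage (not run here): `model` and `x` are placeholders for a loaded
# model and a preprocessed input batch.
#   stats = benchmark(lambda: model.image_encoder(x), sync=torch.cuda.synchronize)
```

Running the same helper on both encoders with the same input size and device, and comparing the median rather than a single run, should make it clearer whether the slowdown comes from the encoder itself or from the way the time was measured.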