ChaoningZhang / MobileSAM

This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
Apache License 2.0

Long inference time for single prompt (2s for encoding) #61

Open shen-hhao opened 1 year ago

shen-hhao commented 1 year ago

Here is my code; I only added timing checks to the demo code. The encoding step seems far more time-consuming than expected.

from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch
import cv2
import numpy as np
import time

model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"

device = "cuda:1" if torch.cuda.is_available() else "cpu"
print(device)

mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()

predictor = SamPredictor(mobile_sam)

image = cv2.imread('./0000_color.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print(image.shape)
box = np.array([100, 100, 400, 600])  # box prompt in XYXY format
time_s = time.time()
predictor.set_image(image=image)           # image encoding
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)   # prompt decoding
time_e2 = time.time()
print('encoding time:', time_e1 - time_s)
print('decoding time:', time_e2 - time_e1)

When using a GPU (RTX 3090), the output is:

cuda:1
(480, 640, 3)
encoding time: 2.325988531112671
decoding time: 0.018665313720703125

When using the CPU, the output is:

cpu
(480, 640, 3)
encoding time: 0.8602027893066406
decoding time: 0.08456754684448242

The CPU is even faster than the GPU at encoding.
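One likely contributor to the surprising GPU number is the measurement itself: CUDA kernels are launched asynchronously, and the first set_image call also pays one-off costs (CUDA context creation, cuDNN initialization, moving data to the device), all of which land inside the timed "encoding" span. Below is a minimal sketch of the same benchmark with a warm-up pass and torch.cuda.synchronize() around each timed region; it reuses the checkpoint, image path, and cuda:1 device from the snippet above and only illustrates the timing methodology, not a change to MobileSAM itself.

from mobile_sam import sam_model_registry, SamPredictor
import torch
import cv2
import numpy as np
import time

model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"
device = "cuda:1" if torch.cuda.is_available() else "cpu"

mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()
predictor = SamPredictor(mobile_sam)

image = cv2.cvtColor(cv2.imread('./0000_color.png'), cv2.COLOR_BGR2RGB)
box = np.array([100, 100, 400, 600])

def sync():
    # Block until all queued GPU work has finished, so time.time() sees it.
    if device != "cpu":
        torch.cuda.synchronize(torch.device(device))

# Warm-up: the first encoder pass triggers one-off setup that should not
# be attributed to steady-state encoding time.
predictor.set_image(image)
sync()

time_s = time.time()
predictor.set_image(image)                 # image encoding
sync()
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)   # prompt decoding
sync()
time_e2 = time.time()

print('encoding time:', time_e1 - time_s)
print('decoding time:', time_e2 - time_e1)

With a warm-up pass and explicit synchronization, the encoding time reported on the GPU should reflect only the ViT-Tiny encoder forward pass rather than initialization overhead; whether that fully explains the 2 s figure would need to be confirmed by rerunning the benchmark.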

YounkHo commented 4 months ago

I ran into the same problem on a single RTX 3090. Have you solved it?

shen-hhao commented 4 months ago

> I ran into the same problem on a single RTX 3090. Have you solved it?

I didn't pursue this code further because I found some alternatives. If you want to use MobileSAM, the implementation at https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-mobilesam-demo is fast; I tried it on a single RTX 4090.