ChaoningZhang / MobileSAM

This is the official code for the MobileSAM project, which makes SAM lightweight for mobile applications and beyond!
Apache License 2.0

Long inference time for single prompt (2s for encoding) #61

Open shen-hhao opened 1 year ago

shen-hhao commented 1 year ago

Here is my code; I only added timing checks to the demo code. The encoding step seems far more time-consuming than expected.

from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch
import cv2
import numpy as np
import time

model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"

device = "cuda:1" if torch.cuda.is_available() else "cpu"
print(device)

mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()

predictor = SamPredictor(mobile_sam)

image = cv2.imread('./0000_color.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
print(image.shape)
box = np.array([100, 100, 400, 600])  # box prompt in XYXY format
time_s = time.time()
predictor.set_image(image=image)           # image encoding
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)   # prompt decoding
time_e2 = time.time()
print('encoding time:', time_e1 - time_s)
print('decoding time:', time_e2 - time_e1)

When using a GPU (RTX 3090), the output is:

cuda:1
(480, 640, 3)
encoding time: 2.325988531112671
decoding time: 0.018665313720703125

When using the CPU, the output is:

cpu
(480, 640, 3)
encoding time: 0.8602027893066406
decoding time: 0.08456754684448242

The CPU is even faster than the GPU at encoding.
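One likely contributor to the surprising GPU number is the measurement itself: CUDA kernels are launched asynchronously, and the first set_image call also pays one-off costs (CUDA context creation, cuDNN initialization, moving data to the device), all of which land inside the timed "encoding" span. Below is a minimal sketch of the same benchmark with a warm-up pass and torch.cuda.synchronize() around each timed region; it reuses the checkpoint, image path, and cuda:1 device from the snippet above and only illustrates the timing methodology, not a change to MobileSAM itself.

from mobile_sam import sam_model_registry, SamPredictor
import torch
import cv2
import numpy as np
import time

model_type = "vit_t"
sam_checkpoint = "./weights/mobile_sam.pt"
device = "cuda:1" if torch.cuda.is_available() else "cpu"

mobile_sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
mobile_sam.to(device=device)
mobile_sam.eval()
predictor = SamPredictor(mobile_sam)

image = cv2.cvtColor(cv2.imread('./0000_color.png'), cv2.COLOR_BGR2RGB)
box = np.array([100, 100, 400, 600])

def sync():
    # Block until all queued GPU work has finished, so time.time() sees it.
    if device != "cpu":
        torch.cuda.synchronize(torch.device(device))

# Warm-up: the first encoder pass triggers one-off setup that should not
# be attributed to steady-state encoding time.
predictor.set_image(image)
sync()

time_s = time.time()
predictor.set_image(image)                 # image encoding
sync()
time_e1 = time.time()
masks, _, _ = predictor.predict(box=box)   # prompt decoding
sync()
time_e2 = time.time()

print('encoding time:', time_e1 - time_s)
print('decoding time:', time_e2 - time_e1)

With a warm-up pass and explicit synchronization, the encoding time reported on the GPU should reflect only the ViT-Tiny encoder forward pass rather than initialization overhead; whether that fully explains the 2 s figure would need to be confirmed by rerunning the benchmark.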

YounkHo commented 4 months ago

I ran into the same problem on a single RTX 3090. Have you solved it?

shen-hhao commented 4 months ago

> I ran into the same problem on a single RTX 3090. Have you solved it?

I didn't pursue this code further because I found some alternatives. If you want to use MobileSAM, the implementation at https://github.com/IDEA-Research/Grounded-Segment-Anything/tree/main/EfficientSAM#run-grounded-mobilesam-demo is fast; I tried it on a single RTX 4090.