ChaoningZhang / MobileSAM

This is the official code for MobileSAM project that makes SAM lightweight for mobile applications and beyond!

It took 3000ms on my computer #24

Closed chenzx2 closed 1 year ago

chenzx2 commented 1 year ago

It took 3000ms on my computer; I don't know what is wrong.

garbe-github-support commented 1 year ago

Me too, it's even slower than SAM. My code:

import cv2
import numpy as np
import torch
import matplotlib.pyplot as plt
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def show_anns(anns):
    # Overlay each mask in a random color on a transparent canvas.
    if len(anns) == 0:
        return
    sorted_anns = sorted(anns, key=(lambda x: x['area']), reverse=True)
    ax = plt.gca()
    ax.set_autoscale_on(False)

    img = np.ones((sorted_anns[0]['segmentation'].shape[0], sorted_anns[0]['segmentation'].shape[1], 4))
    img[:, :, 3] = 0  # start fully transparent

    for ann in sorted_anns:
        m = ann['segmentation']
        color_mask = np.concatenate([np.random.random(3), [1]])
        img[m] = color_mask

    ax.imshow(img)

def runSam(path):
    sam = sam_model_registry["vit_h"](checkpoint=r"E:\model_dataset\sam_vit_h_4b8939.pth")
    device = "cuda"
    sam.to(device)  # SAM is moved to the GPU here
    mask_generator = SamAutomaticMaskGenerator(sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    masks = mask_generator.generate(img)

    return masks, img

def runMobileSam(path):
    from mobile_encoder.setup_mobile_sam import setup_model
    checkpoint = torch.load(r'D:\tools\MobileSAM\weights\mobile_sam.pt')
    mobile_sam = setup_model()
    mobile_sam.load_state_dict(checkpoint, strict=True)
    # Note: mobile_sam is never moved to the GPU here, so it runs on the CPU
    # (this turns out to be the cause of the slowdown discussed below).

    mask_generator = SamAutomaticMaskGenerator(mobile_sam)
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(img)
    return masks, img

def showRet(masks, img):
    print(len(masks))
    print(masks[0].keys())

    plt.figure(figsize=(20, 20))
    plt.imshow(img)
    show_anns(masks)
    plt.axis('off')
    plt.show()

if __name__ == '__main__':
    path = r'C:\Users\Admin\Desktop\test_img\2033CD8A29F6C011006F8452C53A4D89.jpg'
    masks, img = runSam(path)
    # masks, img = runMobileSam(path)
    showRet(masks, img)

My environment: Windows, PyTorch 2.0.1, CUDA 11.7, RTX 4070.
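
One thing worth checking when comparing timings on a GPU: synchronize before reading the clock, and exclude the first call, which pays CUDA initialization costs. A minimal sketch, assuming the mask_generator and img from the code above (time_generate is a hypothetical helper, not part of either library):

import time
import torch

def time_generate(mask_generator, img, warmup=1, iters=5):
    # Hypothetical helper: average generation time, excluding warm-up.
    for _ in range(warmup):
        mask_generator.generate(img)  # first call pays CUDA init / caching costs
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain queued GPU work before starting the clock
    start = time.time()
    for _ in range(iters):
        masks = mask_generator.generate(img)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # ensure all GPU work finished before stopping
    print(f"avg generate() time: {(time.time() - start) / iters:.3f} s")
    return masks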

newcoder0531 commented 1 year ago

> Me too, it's even slower than SAM. My code: [code omitted; identical to the snippet above]
>
> My environment: Windows, PyTorch 2.0.1, CUDA 11.7, RTX 4070.

It seems that your MobileSAM does not use CUDA, but SAM does.
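
For reference, the likely fix is to move the MobileSAM model to the GPU before building the generator, mirroring what runSam does for SAM; a minimal sketch against the snippet above:

device = "cuda" if torch.cuda.is_available() else "cpu"
mobile_sam.to(device)  # without this, the encoder and decoder run on the CPU
mobile_sam.eval()
mask_generator = SamAutomaticMaskGenerator(mobile_sam)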

ChaoningZhang commented 1 year ago

> It took 3000ms on my computer; I don't know what is wrong.

Without more details, it is difficult for us to help you debug.

garbe-github-support commented 1 year ago

> Me too, it's even slower than SAM. My code: [code omitted; identical to the snippet above]
>
> It seems that your MobileSAM does not use CUDA, but SAM does.

Thank you, you are right. Now my time is half that of SAM.

chenzx2 commented 1 year ago

> It took 3000ms on my computer; I don't know what is wrong.
>
> Without more details, it is difficult for us to help you debug.

Here is my code. My environment: Ubuntu 18, torch 2.0.0+cu117. [screenshot attached: 企业微信截图_16880289877752]

ChaoningZhang commented 1 year ago

> [earlier exchange quoted; see above]
>
> Thank you, you are right. Now my time is half that of SAM.

It seems that your issues are addressed. Thanks for your interest in our work.

ChaoningZhang commented 1 year ago

> Here is my code. My environment: Ubuntu 18, torch 2.0.0+cu117. [screenshot attached]

May I ask whether you are using the anything mode or the everything mode?

SongYii commented 1 year ago

Even after adding the following to the code:

device = "cuda"
mobile_sam.to(device=device)

MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and much slower than FastSAM. I don't know what the problem is.

SAM time: 2.2856764793395996 s
150 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])
LR SCALES: [0.08589934592000005, 0.10737418240000006, 0.13421772800000006, 0.1677721600000001, 0.20971520000000007, 0.2621440000000001, 0.3276800000000001, 0.4096000000000001, 0.5120000000000001, 0.6400000000000001, 0.8, 1.0]
MobileSAM time: 1.4033191204071045 s
97 dict_keys(['segmentation', 'area', 'bbox', 'predicted_iou', 'point_coords', 'stability_score', 'crop_box'])

chenzx2 commented 1 year ago

FastSAM is fine; I ran the notebook code.

ChaoningZhang commented 1 year ago

> Even after adding device = "cuda" and mobile_sam.to(device=device), MobileSAM takes half the time of SAM, which is quite different from the speed claimed in the paper, and much slower than FastSAM. [timings quoted above]

May I ask whether you are using the anything mode or the everything mode?

SongYii commented 1 year ago

> May I ask whether you are using the anything mode or the everything mode?

I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds, MobileSAM took 1.4033191 seconds.

ChaoningZhang commented 1 year ago

> I re-read the paper and the code, and I'm running in 'segment everything' mode. FastSAM took 0.0546329 seconds, MobileSAM took 1.4033191 seconds.

Thanks for your interest in our work. Note that MobileSAM makes the image encoder lightweight without changing the decoder (roughly 8ms on the encoder and 4ms on the decoder). We mainly target the anything mode (one pass through the image encoder and one pass through the decoder) rather than the everything mode (one pass through the image encoder and 32x32 passes through the decoder); see the paper for the difference in definitions (anything mode is the foundation task, while everything mode is just a downstream task, as indicated in the original SAM paper).

For everything mode, even though our encoder is much faster than that of the original SAM (which takes close to 500ms), it cannot save much time for the whole pipeline, since most of the time is spent on the 32x32 decoder passes. One way to mitigate this is to use a smaller grid of prompt points (like 10x10 or 5x5) so that the decoder consumes less time, since many redundant masks are generated with a 32x32 grid.

I hope this addresses your issue; otherwise, please kindly let us know. We are also currently trying to make the image decoder more lightweight by distilling it into a smaller one, as we did for the image encoder. Stay tuned for our progress. If you have more issues, please let us know; we might not be able to respond in a timely manner, but we will try our best.
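
A minimal sketch of the smaller-grid mitigation, assuming the stock SamAutomaticMaskGenerator API (its points_per_side argument controls the prompt grid and defaults to 32):

from segment_anything import SamAutomaticMaskGenerator

# A 10x10 prompt grid means 100 decoder passes instead of 32*32 = 1024,
# trading some mask coverage for a much cheaper everything-mode pipeline.
mask_generator = SamAutomaticMaskGenerator(mobile_sam, points_per_side=10)
masks = mask_generator.generate(img)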

fujianhai commented 1 year ago

This work is really great: the inference time for a point prompt is about 10ms+, but the time for a full image is not much faster. On our GPU, a full image still takes about 2s~3s. After all, the decoder network has not changed, so the whole-image case cannot be significantly improved.

ChaoningZhang commented 1 year ago

> This work is really great: the inference time for a point prompt is about 10ms+, but the time for a full image is not much faster. [full comment quoted above]

Thanks for your interest in our work. Please check our replies to others on how to mitigate this issue. Yet another way to speed it up on a GPU is to run the decoder in batches over the 32x32 grid of prompt points. You are welcome to try implementing it and open a pull request here if you complete it. We will also implement it ourselves, but it may take a while.
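
For what it's worth, the stock SamAutomaticMaskGenerator already batches prompts through the decoder via its points_per_batch argument (64 by default); raising it is a rough, memory-bound approximation of the batching suggested above:

# Larger prompt batches amortize per-call overhead across the 32*32 grid;
# increase points_per_batch until GPU memory becomes the limit.
mask_generator = SamAutomaticMaskGenerator(mobile_sam, points_per_batch=256)
masks = mask_generator.generate(img)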