IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
https://arxiv.org/abs/2401.14159
Apache License 2.0
14.85k stars · 1.37k forks

grounded_sam_simple_demo.py no CUDA accelerated #307

Status: Open · danigarciaoca opened this issue 1 year ago

danigarciaoca commented 1 year ago

Hi everyone!

I was testing `grounded_sam_simple_demo.py` and was surprised by the `set_image` computing time. After checking, the script is missing a line that sends the model to the CUDA device. After line 25 there should be something like this:

sam.to(device=DEVICE)
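For context, a minimal sketch of the pattern (the SAM-specific lines are shown only as comments, since the exact demo code isn't reproduced here; the stand-in `nn.Module` demonstrates the same `.to()` call):

```python
import torch

# Pick CUDA when available, otherwise fall back to CPU.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# In grounded_sam_simple_demo.py the model is SAM, loaded roughly like:
#   sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH)
#   sam.to(device=DEVICE)   # <-- the missing line
# A stand-in nn.Module shows the same pattern end to end:
model = torch.nn.Linear(4, 2)
model.to(device=DEVICE)

# After .to(), all parameters live on DEVICE.
print(next(model.parameters()).device.type)
```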

Hope it is useful!

rentainhe commented 1 year ago

> Hi everyone!
>
> I was testing `grounded_sam_simple_demo.py` and was surprised by the `set_image` computing time. After checking, the script is missing a line that sends the model to the CUDA device. After line 25 there should be something like this:
>
> sam.to(device=DEVICE)
>
> Hope it is useful!

Thanks for fixing this!

HripsimeS commented 1 year ago

@dankresio @rentainhe Hello!

I added `sam.to(device=DEVICE)` a few days ago, and now I get the mask for one image in 2.5-2.7 seconds. Then I tried `grounding_dino_model.to(device=DEVICE)` to accelerate the prediction as well, but I got this error: `AttributeError: 'Model' object has no attribute 'to'`

Detecting objects with a test prompt using `grounding_dino_model` now takes 10-11 seconds per image. Is there some way to accelerate the prediction and reduce the inference time of the GroundingDINO model?

danigarciaoca commented 1 year ago

Hi @HripsimeS,

The `to` method is only available when the class inherits from `nn.Module`, as the `Sam` class does here.

If you check the Grounding DINO initialization, you will see that:

  1. it doesn't inherit from `nn.Module`
  2. its `__init__` method includes a `device` attribute that defaults to `"cuda"`

Hope it helps!
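The distinction can be shown with a tiny sketch (this `Model` class is a simplified stand-in for the repo's GroundingDINO wrapper, not its real code):

```python
import torch

class Model:
    """Plain wrapper class (like the G-DINO wrapper): no nn.Module parent."""
    def __init__(self, device: str = "cuda"):
        self.device = device  # device is chosen at construction time instead

wrapper = Model(device="cpu")
module = torch.nn.Linear(2, 2)

print(hasattr(wrapper, "to"))  # False: a plain class has no .to() method
print(hasattr(module, "to"))   # True: nn.Module subclasses inherit .to()
```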

HripsimeS commented 1 year ago

@dankresio thank you very much for your reply! So it is already set up on the GPU device. But is there some way to make GroundingDINO inference faster? It currently takes 10-11 seconds per image on an A100 GPU. Can you please advise me on how to reduce the inference time for the GroundingDINO model? Thanks a lot in advance!

danigarciaoca commented 1 year ago

@HripsimeS check whether the model is actually loading on the GPU, because that inference time on an A100 is huge. I have a GTX 1070 and G-DINO inference time is around 1 second (excluding model loading, just inference on the image).

PS: check RAM vs. GPU usage while running G-DINO inference, and there you will find your answer.
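A quick sanity check (generic PyTorch, not specific to this repo) is to look at where the model's parameters actually live, and at how much GPU memory is allocated after loading:

```python
import torch

def model_device(module: torch.nn.Module) -> torch.device:
    # The device of the first parameter tells you where the model lives.
    return next(module.parameters()).device

m = torch.nn.Linear(8, 8)  # stand-in model; on CPU by default
print(model_device(m))     # prints "cpu" unless you called .to("cuda")

if torch.cuda.is_available():
    # Nonzero allocated memory after loading the model is a good sign
    # that the weights really ended up on the GPU.
    print(torch.cuda.memory_allocated())
```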

HripsimeS commented 1 year ago

@dankresio thanks for your advice. I used a Google Colab GPU, and below you can find the results for one image. I would love to reduce the GroundingDINO inference time to 1-2 seconds, but I don't know how to do that on Google Colab.

[screenshot: GroundingDINO inference time on Colab]

HripsimeS commented 1 year ago

And this is for the SAM model, which is much faster than the GroundingDINO model:

[screenshot: SAM inference time on Colab]

danigarciaoca commented 1 year ago

Hi @HripsimeS,

I think it is an optimization issue in the wrapper class for G-DINO.

Try running `grounded_sam_demo.py` for just the G-DINO inference part. On my machine, it reduces inference time by about 3x:

Inference G-DINO: 1.115 seconds

Run it and tell me if this improves execution time :smiley:
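When comparing the two scripts, it helps to time only the inference call itself, synchronizing CUDA before and after so asynchronous kernel launches don't skew the numbers (a generic timing helper, not code from the repo):

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds), CUDA-synchronized."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for pending GPU work before timing
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure the timed work has finished
    return result, time.perf_counter() - start

# Usage with a stand-in workload; with G-DINO you would time the predict call.
out, elapsed = timed(torch.matmul, torch.ones(64, 64), torch.ones(64, 64))
print(f"Inference: {elapsed:.3f} seconds")
```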

HripsimeS commented 1 year ago

@danigarciaoca thanks a lot for your advice. Using `grounded_sam_demo.py` indeed made a fantastic difference for both the GroundingDINO and SAM models. You can see the results below 🥇 👍

[screenshot: runtime results]