apple / ml-depth-pro

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Very slow inference on 3080 #64

Open jbrownkramer opened 1 week ago

jbrownkramer commented 1 week ago

I would expect the 3080 to be, say, 30 to 50% slower than a V100. But it takes 45 seconds to run inference. The GPU utilization is very low (10% or so). Any idea how this could be fixed?

jbrownkramer commented 1 week ago

I did model = model.cuda() and image = image.cuda(), and I got higher GPU usage and inference time of more like 5 to 8 seconds.

Running it again (without reloading the model or the image) caused massive RAM usage and a much longer inference time.
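For reference, here is roughly what that looks like end to end. This is just a sketch: load_rgb and infer follow the usage example in the repository README, "example.jpg" is a placeholder path, and the no_grad guard is my guess at the repeated-run memory growth rather than a confirmed cause (infer may already disable gradients internally).

import torch
import depth_pro

# Load on the CPU (the default), then move the weights to the GPU.
model, transform = depth_pro.create_model_and_transforms()
model = model.cuda()
model.eval()

# Load and preprocess an image, then move it to the GPU as well.
image, _, f_px = depth_pro.load_rgb("example.jpg")  # placeholder path
image = transform(image).cuda()

# Disabling autograd keeps repeated runs from retaining extra memory.
with torch.no_grad():
    prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]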

jbrownkramer commented 1 week ago

Setting precision=torch.float16 seems to fix all of this. The initial inference takes about 1.5s, and subsequent inferences take about 0.6s.

You could consider updating the model-loading part of the Python code snippet to:

# Load model and preprocessing transform on the GPU in half precision
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()
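If you time this yourself, note that CUDA kernels launch asynchronously, so a plain wall-clock timer around infer() can be misleading. A small sketch of one way to measure it (this continues from the loading snippet above; the synchronize calls are standard PyTorch, and image/f_px come from depth_pro.load_rgb as in the README):

import time
import torch

torch.cuda.synchronize()  # wait for any pending GPU work before starting the clock
start = time.perf_counter()
prediction = model.infer(image, f_px=f_px)
torch.cuda.synchronize()  # wait for the inference kernels to actually finish
print(f"inference took {time.perf_counter() - start:.2f}s")

The first call is slower because of CUDA context and kernel warm-up, which is presumably why the initial inference above takes longer than subsequent ones.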
Itachi-6 commented 1 week ago

@jbrownkramer I have the same problem of long inference time for a single image: it takes around 30 to 40 seconds to process one image. I tried the settings you gave, but now I'm getting an OutOfMemoryError. Could you please advise what to do?

I have an Nvidia GTX 1650 GPU.

jbrownkramer commented 6 days ago

I don't have a GTX 1650 at my disposal. You could try setting the precision even lower.
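If float16 still runs out of memory in the 1650's 4 GB, one fallback (my own suggestion, not something from this repo) is to check the free GPU memory and drop back to the CPU when the model won't fit:

import torch
import depth_pro

use_cuda = torch.cuda.is_available()
if use_cuda:
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"free GPU memory: {free_bytes / 1e9:.1f} of {total_bytes / 1e9:.1f} GB")

# If the model does not fit even at float16, running on the CPU is slow
# but avoids the OutOfMemoryError.
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda" if use_cuda else "cpu"),
    precision=torch.float16 if use_cuda else torch.float32,
)
model.eval()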

csyhping commented 1 day ago

Hi @jbrownkramer, I tried your suggestion on a 3090.

If I set float16, it takes ~17s; with the default, it takes ~18s.

And it seems they already set it to half precision in #17?

Do you have any idea about this? Thanks!

jbrownkramer commented 17 hours ago


@csyhping in issue #17 the user is running through run.py, which calls create_model_and_transforms on the GPU (if you have one) and with half precision. However, if you look at the definition of create_model_and_transforms:

def create_model_and_transforms(
    config: DepthProConfig = DEFAULT_MONODEPTH_CONFIG_DICT,
    device: torch.device = torch.device("cpu"),
    precision: torch.dtype = torch.float32,
) -> Tuple[DepthPro, Compose]:

You can see that the defaults are float32 and the CPU, so make sure you're doing:

model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()

And verify that your GPU is being well utilized when you run inference.
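A quick way to confirm that from Python (standard PyTorch calls, nothing specific to this repo; model is the object loaded above):

import torch

# After loading with device="cuda" and precision=torch.float16, the parameters
# should report a CUDA device and float16 dtype; "cpu" here means the device
# argument never took effect.
param = next(model.parameters())
print(param.device, param.dtype)

# Rough sanity check on how much GPU memory the weights occupy.
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

Watching nvidia-smi in another terminal while infer() runs also shows the utilization directly.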

csyhping commented 9 hours ago


@jbrownkramer, thanks for your reply. I rechecked my code and now it works.

With the default it takes ~1.5s; with torch.float16 it takes ~0.15s.

BTW: loading the model takes ~16s.
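Since the model load is a one-time cost, it helps to load once and reuse the model across images. A rough sketch (the load_rgb/infer usage follows the repository README, and the image paths are placeholders):

import torch
import depth_pro

# Pay the ~16s load cost once, then amortize it over many images.
model, transform = depth_pro.create_model_and_transforms(
    device=torch.device("cuda"), precision=torch.float16
)
model.eval()

for path in ["img1.jpg", "img2.jpg"]:  # placeholder paths
    image, _, f_px = depth_pro.load_rgb(path)
    image = transform(image)
    prediction = model.infer(image, f_px=f_px)
    depth = prediction["depth"]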

Thanks for your help.