Open RemyNtshaykolo opened 10 months ago
PyTorch is not able to use the GPU.
The pytorch/torchserve:latest-gpu image (0.9.0-gpu at this date) used by the Dockerfile-gpu file requires a CUDA version greater than 12.0. CUDA toolkit 12.0 needs at least a >=525.60.13 Linux x86_64 driver version. However, the p2.xlarge AWS EC2 instance uses a Tesla K80, and its latest compatible drivers are the 470.x series (https://forums.developer.nvidia.com/t/in-what-step-is-nvidia-smi-supposed-to-be-installed/59701/13#:~:text=Drivers%20after%20R470%20(specifically%20R495%20and%20later%2C%20at%20least)%20do%20not%20support%20Kepler%20GPUs.)
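The driver constraint above comes down to a simple version comparison. A minimal sketch (the helper name and the sample driver versions are illustrative, not from the thread):

```python
# Hypothetical helper: compare an NVIDIA driver version string against a
# minimum requirement by tuple-comparing its numeric components.
def driver_at_least(installed: str, minimum: str) -> bool:
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# The K80's newest driver branch (470.x) is below what CUDA 12.0 requires.
print(driver_at_least("470.223.02", "525.60.13"))  # False
print(driver_at_least("535.104.05", "525.60.13"))  # True
```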
Use pytorch/torchserve:0.8.2-gpu, which requires CUDA >= 11.8, given that CUDA 11.8 is compatible with the 470.x drivers.
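Assuming the repo's Dockerfile-gpu derives from the TorchServe base image, the workaround is a one-line base-image pin (a sketch, not the repo's exact file):

```dockerfile
# Pin the base image instead of latest-gpu (0.9.0-gpu builds against CUDA 12.x,
# which needs a >=525.60.13 driver). 0.8.2-gpu targets CUDA 11.8, which the
# Tesla K80's 470.x driver branch still supports.
FROM pytorch/torchserve:0.8.2-gpu
```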
Hi @RemyNtshaykolo, correct: you'll need to align the CUDA driver with the GPU architecture if you are using an older GPU. I tested this on a 1080 Ti and a 3090.
I'll add a note to the README that only GPUs supporting Linux x86_64 drivers >= 525.60.13 are supported, and point to this issue as an example for those who want to use older GPUs.
Actually, it turns out the latest GPU Docker image doesn't work on my 3090 either! Downgrading to 0.8.2-gpu fixes the issue on my 3090 as well. Thanks for pointing this out @RemyNtshaykolo. I'll pin the TorchServe versions in both containers.
Hello, thank you for the repo, it is very complete. I was able to launch a model on a p2.xlarge EC2 instance on AWS.
I'm having performance problems. I have the impression that the GPU is not being used, because I get inference times similar to those I see when running the model on my Mac, which has no GPU. The image-encoding command
curl http://127.0.0.1:8080/predictions/sam_vit_h_encode -T slick_example.png
takes around 2 minutes to run, as you can see in the following logs. I was expecting millisecond-scale performance.
Also, when investigating the logs, I see
pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:100.0|#Level:Host,DeviceId:0|#hostname:ac47803e69a1,timestamp:1699394016
so it looks like the GPU is being used.
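For reference, the metric portion of that log line can be pulled apart mechanically. A small sketch, assuming the StatsD-style `Name.Unit:value|#dims|#hostname:...,timestamp:...` layout shown above (the function name is mine, not a TorchServe API):

```python
# Parse the metric part of a TorchServe TS_METRICS log line into its fields.
def parse_ts_metric(line: str) -> dict:
    head, dims, tail = line.split("|#")          # three pipe-hash sections
    name, value = head.rsplit(":", 1)            # metric name and numeric value
    dimensions = dict(kv.split(":", 1) for kv in dims.split(","))
    meta = dict(kv.split(":", 1) for kv in tail.split(","))
    return {"name": name, "value": float(value), "dims": dimensions, **meta}

m = parse_ts_metric(
    "GPUUtilization.Percent:100.0|#Level:Host,DeviceId:0"
    "|#hostname:ac47803e69a1,timestamp:1699394016"
)
print(m["name"], m["value"])  # GPUUtilization.Percent 100.0
```

Note that 100% GPU utilization only means a kernel is resident on the device; on a Kepler-era K80 running a ViT-H encoder it is still compatible with very slow wall-clock times.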