Open RemyNtshaykolo opened 10 months ago
PyTorch is not able to use the GPU.
The pytorch/torchserve:latest-gpu image (0.9.0-gpu at this date) used by the Dockerfile-gpu file requires a CUDA version greater than 12.0. CUDA toolkit 12.0 needs at least a >=525.60.13 Linux x86_64 driver version. However, the p2.xlarge AWS EC2 instance uses a Tesla K80, and its latest compatible drivers are the 470.x series (https://forums.developer.nvidia.com/t/in-what-step-is-nvidia-smi-supposed-to-be-installed/59701/13#:~:text=Drivers%20after%20R470%20(specifically%20R495%20and%20later%2C%20at%20least)%20do%20not%20support%20Kepler%20GPUs.)
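The driver constraint above comes down to a simple version comparison. A minimal sketch (the helper name and the sample driver versions are illustrative, not from the thread):

```python
# Hypothetical helper: compare an NVIDIA driver version string against a
# minimum requirement by tuple-comparing its numeric components.
def driver_at_least(installed: str, minimum: str) -> bool:
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# The K80's newest driver branch (470.x) is below what CUDA 12.0 requires.
print(driver_at_least("470.223.02", "525.60.13"))  # False
print(driver_at_least("535.104.05", "525.60.13"))  # True
```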
Use pytorch/torchserve:0.8.2-gpu, which requires CUDA >= 11.8, given that CUDA 11.8 is compatible with the 470.x drivers.
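Assuming the repo's Dockerfile-gpu derives from the TorchServe base image, the workaround is a one-line base-image pin (a sketch, not the repo's exact file):

```dockerfile
# Pin the base image instead of latest-gpu (0.9.0-gpu builds against CUDA 12.x,
# which needs a >=525.60.13 driver). 0.8.2-gpu targets CUDA 11.8, which the
# Tesla K80's 470.x driver branch still supports.
FROM pytorch/torchserve:0.8.2-gpu
```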
Hi @RemyNtshaykolo, correct: you'll need to align the CUDA driver with the GPU architecture if you are using an older GPU. I tested this on a 1080 Ti and a 3090.
I'll add a note to the README that only GPUs supporting Linux x86_64 drivers >= 525.60.13 are supported, and point to this issue as an example for those who want to use older GPUs.
Actually, it turns out the latest GPU Docker image doesn't work on my 3090 either! Downgrading to 0.8.2-gpu fixes the issue on my 3090 as well. Thanks for pointing this out @RemyNtshaykolo. I'll pin the TorchServe versions in both containers.
Hello, thank you for the repo, it is very complete. I was able to launch a model on a p2.xlarge EC2 instance on AWS.
I'm having performance problems. I have the impression that the GPU is not being used, because I get inference times similar to those I see when running the model on my Mac, which has no GPU. The image-encoding command
curl http://127.0.0.1:8080/predictions/sam_vit_h_encode -T slick_example.png
takes around 2 minutes to run, as you can see in the following logs. I was expecting millisecond-scale performance.
Also, when investigating the logs, I see
pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:100.0|#Level:Host,DeviceId:0|#hostname:ac47803e69a1,timestamp:1699394016
so it looks like the GPU is being used.
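For reference, the metric portion of that log line can be pulled apart mechanically. A small sketch, assuming the StatsD-style `Name.Unit:value|#dims|#hostname:...,timestamp:...` layout shown above (the function name is mine, not a TorchServe API):

```python
# Parse the metric part of a TorchServe TS_METRICS log line into its fields.
def parse_ts_metric(line: str) -> dict:
    head, dims, tail = line.split("|#")          # three pipe-hash sections
    name, value = head.rsplit(":", 1)            # metric name and numeric value
    dimensions = dict(kv.split(":", 1) for kv in dims.split(","))
    meta = dict(kv.split(":", 1) for kv in tail.split(","))
    return {"name": name, "value": float(value), "dims": dimensions, **meta}

m = parse_ts_metric(
    "GPUUtilization.Percent:100.0|#Level:Host,DeviceId:0"
    "|#hostname:ac47803e69a1,timestamp:1699394016"
)
print(m["name"], m["value"])  # GPUUtilization.Percent 100.0
```

Note that 100% GPU utilization only means a kernel is resident on the device; on a Kepler-era K80 running a ViT-H encoder it is still compatible with very slow wall-clock times.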