GPU driver too old

jankrepl commented 1 year ago

Hi there,

I followed the tutorial in the README and tried to deploy on ml.g4dn.xlarge (to run GPU inference).

The CloudWatch logs contain the following error:

RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
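
For anyone decoding the number in that message: assuming it follows CUDA's usual integer version encoding (1000 × major + 10 × minor), 11040 corresponds to CUDA 11.4 on the driver side. A minimal sketch of that decoding follows; the function name `decode_cuda_version` is made up for illustration and is not part of PyTorch or BentoML.

```python
def decode_cuda_version(encoded: int) -> str:
    """Turn a CUDA-style integer version (1000*major + 10*minor) into 'major.minor'.

    Assumption: the value reported in the error uses this encoding.
    """
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return f"{major}.{minor}"


if __name__ == "__main__":
    # The value quoted in the CloudWatch error above.
    print(decode_cuda_version(11040))
```

If the decoded driver version is lower than the CUDA version the installed PyTorch build expects, an error like the one above is the likely result.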

Thank you in advance for your help!

parano commented 1 year ago

Hi @jankrepl - have you specified the CUDA version in your bentofile.yaml? https://docs.bentoml.com/en/latest/concepts/bento.html#gpu-support

jankrepl commented 1 year ago

My bad @parano

Modifying the bentofile.yaml worked. I was kind of hoping there would be some under-the-hood magic that would look at the SageMaker instance I am selecting and, if it is a GPU one, add the CUDA-related stuff to the image.

But again, the current behavior is better than what I suggested above.

Thank you for the quick reply.

jankrepl commented 1 year ago

Just for future reference: torch>=2 does not support CUDA 11.6.2, so I had to downgrade to torch<2. That was the cause of the original problem :)
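
A quick way to spot this kind of mismatch from inside the container is to print which CUDA version the installed PyTorch was built with and whether it can see a GPU. This is only a minimal sketch: `torch_cuda_report` is a hypothetical helper name, and the code assumes PyTorch may or may not be installed in the environment where it runs.

```python
def torch_cuda_report() -> dict:
    """Collect the PyTorch version, the CUDA version it was built with, and GPU visibility."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here; report that instead of failing.
        return {"torch": None, "built_with_cuda": None, "gpu_available": False}
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_report())
```

Comparing `built_with_cuda` against the CUDA version set in bentofile.yaml and the version the driver supports should show whether torch needs to be pinned, as it did here.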

bentoml / aws-sagemaker-deploy

GPU driver too old #50