Open adityapatadia opened 3 years ago
@adityapatadia Did you figure out a way to upgrade driver versions? I'm having the same issue.
For anyone who runs into this problem, you can untie your CUDA driver version from your COS version with the following steps:
1. List the available COS versions and look for one newer than the version in your cluster:
    gsutil ls gs://cos-tools/
2. List the GPU driver versions available for that newer COS version:
    gsutil ls gs://cos-tools/<newer COS version>/extensions/gpu
   (Note: if the COS version is 16928.0.0 or newer, this folder does not appear to exist; in that case keep --version=latest in the next step.)
3. In daemonset-preloaded-latest.yaml, change the installer command to:
    command: ['/cos-gpu-installer', 'install', '--allow-unsigned-driver', '--version=<driver version found under extensions/gpu, just the #>', '--gcs-download-prefix=<newer COS version>']
For me, I had COS version 16108.604.3, which pinned CUDA driver 450.119.04; I was able to use the 470.82.01 CUDA driver from COS version 16623.102.4.

Other notes: It may be possible to specify the driver URL directly, using versions found under gs://nvidia-drivers-us-public/tesla/. The command would be (for example):
    command: [ '/cos-gpu-installer', 'install', '--allow-unsigned-driver', '--nvidia-installer-url=https://storage.googleapis.com/nvidia-drivers-us-public/tesla/510.47.03/NVIDIA-Linux-x86_64-510.47.03.run' ]
I tested this once, but there was an issue with the precompiled COS toolchain that gets downloaded. It may be possible to fix this, or the issue may not occur at all with a different COS version than mine. You could also try specifying --gcs-download-prefix for a different COS toolchain version and see if that works. I did not get a chance to confirm, as I timeboxed this driver upgrade to 2 hours.
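
In case it helps anyone templating the DaemonSet, here is a minimal Python sketch that assembles the installer argument list described above. The function name `build_installer_command` and its parameters are my own, not part of cos-gpu-installer; only the flags quoted in this thread are used, and the value passed to `--gcs-download-prefix` is simply whatever you supply (the steps above put the newer COS version there).

```python
import json


def build_installer_command(driver_version="latest",
                            gcs_download_prefix=None,
                            installer_url=None):
    """Assemble the cos-gpu-installer argument list (hypothetical helper).

    Uses only the flags mentioned in this thread. If installer_url is given,
    it is passed via --nvidia-installer-url instead of --version.
    """
    cmd = ["/cos-gpu-installer", "install", "--allow-unsigned-driver"]
    if installer_url is not None:
        cmd.append(f"--nvidia-installer-url={installer_url}")
    else:
        cmd.append(f"--version={driver_version}")
    if gcs_download_prefix is not None:
        cmd.append(f"--gcs-download-prefix={gcs_download_prefix}")
    return cmd


if __name__ == "__main__":
    # Versions taken from the example in this comment.
    cmd = build_installer_command("470.82.01", "16623.102.4")
    # A JSON array is also a valid YAML flow sequence, so this can be
    # pasted after "command:" in the DaemonSet manifest.
    print("command:", json.dumps(cmd))
```

Whether a given driver and COS toolchain combination actually installs still has to be checked on a real node; the sketch only builds the argument list.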
Where can we find information on how to control the driver and CUDA version? This is really challenging, given the lack of information on deploying GPUs in GKE.
We are using this guide to install drivers: https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#installing_drivers
Now, the drivers for COS are locked and it always installs 450.119.04. We want to upgrade the driver to version 460.32.03 because https://github.com/FFmpeg/nv-codec-headers needs driver version 455.28 or newer.

How can we upgrade the driver version?
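
To make the version requirement concrete, here is a small Python sketch that checks whether a driver version meets a minimum. The helper names `parse_version` and `meets_minimum` are illustrative, and it assumes driver versions are dot-separated integers, as in the versions quoted above.

```python
def parse_version(version):
    """Split a dotted driver version such as '450.119.04' into integers."""
    return [int(part) for part in version.split(".")]


def meets_minimum(installed, required):
    """Return True if the installed driver version is >= the required one."""
    a, b = parse_version(installed), parse_version(required)
    # Pad the shorter version with zeros so '455.28' compares equal to '455.28.00'.
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


if __name__ == "__main__":
    required = "455.28"  # minimum stated for nv-codec-headers in this issue
    for installed in ("450.119.04", "460.32.03"):
        print(installed, meets_minimum(installed, required))
```

With the versions from this issue, the pinned 450.119.04 fails the check and the desired 460.32.03 passes it.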