NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Failed to install on Azure VM. #188

Open xkszltl opened 2 years ago

xkszltl commented 2 years ago

NVIDIA Driver Version

515.43.04

GPU

T4, on Azure. AFAIK their hypervisor doesn't support GSP (the GPU System Processor firmware).

Describe the bug

We're having trouble installing the 515 driver via DKMS. 510 installed fine.

2022-05-14T00:00:57.8274090Z Setting up nvidia-settings (515.43.04-1) ...
2022-05-14T00:00:57.8292897Z Setting up nvidia-smi (515.43.04-1) ...
2022-05-14T00:00:57.8310522Z Setting up libgles-nvidia2:amd64 (515.43.04-1) ...
2022-05-14T00:00:57.8333871Z Setting up nvidia-driver-bin (515.43.04-1) ...
2022-05-14T00:00:57.8363352Z Setting up libnvcuvid1:amd64 (515.43.04-1) ...
2022-05-14T00:00:57.8381486Z Setting up nvidia-persistenced (515.43.04-1) ...
2022-05-14T00:00:57.9140961Z Warning: The home dir /var/run/nvpd/ you specified can't be accessed: No such file or directory
2022-05-14T00:00:57.9145946Z Adding system user `nvpd' (UID 108) ...
2022-05-14T00:00:57.9148403Z Adding new group `nvpd' (GID 114) ...
2022-05-14T00:00:57.9404618Z Adding new user `nvpd' (UID 108) with group `nvpd' ...
2022-05-14T00:00:58.0063595Z Not creating home directory `/var/run/nvpd/'.
2022-05-14T00:00:58.2563485Z Created symlink /etc/systemd/system/multi-user.target.wants/nvidia-persistenced.service → /lib/systemd/system/nvidia-persistenced.service.
2022-05-14T00:00:58.5343308Z Job for nvidia-persistenced.service failed because the control process exited with error code.
2022-05-14T00:00:58.5345041Z See "systemctl status nvidia-persistenced.service" and "journalctl -xe" for details.
2022-05-14T00:00:58.5364124Z Setting up libnvidia-opticalflow1:amd64 (515.43.04-1) ...
2022-05-14T00:00:58.5384589Z Setting up nvidia-egl-icd:amd64 (515.43.04-1) ...
2022-05-14T00:00:58.5427294Z Setting up libnvidia-encode1:amd64 (515.43.04-1) ...
2022-05-14T00:00:58.5446958Z Setting up nvidia-driver-libs:amd64 (515.43.04-1) ...
2022-05-14T00:00:58.6457793Z Processing triggers for nvidia-alternative (515.43.04-1) ...
2022-05-14T00:00:58.6815900Z update-alternatives: updating alternative /usr/lib/nvidia/current because link group nvidia has changed slave links
2022-05-14T00:00:58.6909923Z Setting up nvidia-kernel-dkms (515.43.04-1) ...
2022-05-14T00:00:58.7524183Z Loading new nvidia-current-515.43.04 DKMS files...
2022-05-14T00:00:58.8821742Z Building for 5.10.0-14-cloud-amd64
2022-05-14T00:00:58.9464342Z Building initial module for 5.10.0-14-cloud-amd64
2022-05-14T00:01:07.5776525Z Error! Bad return status for module build on kernel: 5.10.0-14-cloud-amd64 (x86_64)
2022-05-14T00:01:07.5777316Z Consult /var/lib/dkms/nvidia-current/515.43.04/build/make.log for more information.
2022-05-14T00:01:07.5823079Z dpkg: error processing package nvidia-kernel-dkms (--configure):
2022-05-14T00:01:07.5823791Z  installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
2022-05-14T00:01:07.5833911Z dpkg: dependency problems prevent configuration of nvidia-driver:
2022-05-14T00:01:07.5836121Z  nvidia-driver depends on nvidia-kernel-dkms (= 515.43.04-1) | nvidia-kernel-515.43.04 | nvidia-kernel-open-dkms (= 515.43.04-1); however:
2022-05-14T00:01:07.5836902Z   Package nvidia-kernel-dkms is not configured yet.
2022-05-14T00:01:07.5837418Z   Package nvidia-kernel-515.43.04 is not installed.
2022-05-14T00:01:07.5838186Z   Package nvidia-kernel-dkms which provides nvidia-kernel-515.43.04 is not configured yet.
2022-05-14T00:01:07.5838774Z   Package nvidia-kernel-open-dkms is not installed.
2022-05-14T00:01:07.5841101Z 
2022-05-14T00:01:07.5842830Z dpkg: error processing package nvidia-driver (--configure):
2022-05-14T00:01:07.5843376Z  dependency problems - leaving unconfigured
2022-05-14T00:01:07.5844028Z dpkg: dependency problems prevent configuration of cuda-drivers-515:
2022-05-14T00:01:07.5844627Z  cuda-drivers-515 depends on nvidia-driver (>= 515.43.04); however:
2022-05-14T00:01:07.5845181Z   Package nvidia-driver is not configured yet.
2022-05-14T00:01:07.5845355Z 
2022-05-14T00:01:07.5845818Z dpkg: error processing package cuda-drivers-515 (--configure):
2022-05-14T00:01:07.5846563Z  dependency problems - leaving unconfigured
2022-05-14T00:01:07.5848749Z dpkg: dependency problems prevent configuration of cuda-drivers:
2022-05-14T00:01:07.5849400Z  cuda-drivers depends on cuda-drivers-515 (= 515.43.04-1); however:
2022-05-14T00:01:07.5849942Z   Package cuda-drivers-515 is not configured yet.
2022-05-14T00:01:07.5850126Z 
2022-05-14T00:01:07.5850560Z dpkg: error processing package cuda-drivers (--configure):
2022-05-14T00:01:07.5851083Z  dependency problems - leaving unconfigured
2022-05-14T00:01:07.5851691Z Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.2+dfsg-1) ...
2022-05-14T00:01:07.6020275Z Processing triggers for libc-bin (2.31-13+deb11u3) ...
2022-05-14T00:01:07.6099733Z Processing triggers for initramfs-tools (0.140) ...
2022-05-14T00:01:07.6301617Z update-initramfs: Generating /boot/initrd.img-5.10.0-14-cloud-amd64
2022-05-14T00:01:10.8598813Z Processing triggers for update-glx (1.2.1~deb11u1) ...
2022-05-14T00:01:10.8662037Z Processing triggers for glx-alternative-nvidia (1.2.1~deb11u1) ...
2022-05-14T00:01:10.9032521Z update-alternatives: using /usr/lib/nvidia to provide /usr/lib/glx (glx) in auto mode
2022-05-14T00:01:10.9214112Z Processing triggers for glx-alternative-mesa (1.2.1~deb11u1) ...
2022-05-14T00:01:10.9388268Z Processing triggers for libc-bin (2.31-13+deb11u3) ...
2022-05-14T00:01:10.9458384Z Processing triggers for initramfs-tools (0.140) ...
2022-05-14T00:01:10.9660793Z update-initramfs: Generating /boot/initrd.img-5.10.0-14-cloud-amd64
2022-05-14T00:01:14.1178733Z Errors were encountered while processing:
2022-05-14T00:01:14.1179362Z  nvidia-kernel-dkms
2022-05-14T00:01:14.1179765Z  nvidia-driver
2022-05-14T00:01:14.1180124Z  cuda-drivers-515
2022-05-14T00:01:14.1180483Z  cuda-drivers
2022-05-14T00:01:14.2114500Z W: Sources disagree on hashes for supposely identical version '515.43.04-1' of 'cuda-drivers:amd64'.
2022-05-14T00:01:14.2115330Z E: Sub-process /usr/bin/dpkg returned an error code (1)
2022-05-14T00:01:14.2562150Z Reading package lists...
2022-05-14T00:01:14.4298445Z Building dependency tree...
2022-05-14T00:01:14.4306254Z Reading state information...
2022-05-14T00:01:14.7719743Z cuda-drivers is already the newest version (515.43.04-1).
2022-05-14T00:01:14.7720276Z 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2022-05-14T00:01:14.7720667Z 4 not fully installed or removed.
2022-05-14T00:01:14.7721079Z After this operation, 0 B of additional disk space will be used.
2022-05-14T00:01:14.7789281Z Setting up nvidia-kernel-dkms (515.43.04-1) ...
2022-05-14T00:01:14.8412407Z Removing old nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:14.9367052Z 
2022-05-14T00:01:14.9368066Z ------------------------------
2022-05-14T00:01:14.9368426Z Deleting module version: 515.43.04
2022-05-14T00:01:14.9368777Z completely from the DKMS tree.
2022-05-14T00:01:14.9369203Z ------------------------------
2022-05-14T00:01:14.9958673Z Done.
2022-05-14T00:01:15.0001115Z Loading new nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:15.1126160Z Building for 5.10.0-14-cloud-amd64
2022-05-14T00:01:15.1768921Z Building initial module for 5.10.0-14-cloud-amd64
2022-05-14T00:01:23.7830700Z Error! Bad return status for module build on kernel: 5.10.0-14-cloud-amd64 (x86_64)
2022-05-14T00:01:23.7831564Z Consult /var/lib/dkms/nvidia-current/515.43.04/build/make.log for more information.
2022-05-14T00:01:23.7875018Z dpkg: error processing package nvidia-kernel-dkms (--configure):
2022-05-14T00:01:23.7875950Z  installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
2022-05-14T00:01:23.7876686Z dpkg: dependency problems prevent configuration of nvidia-driver:
2022-05-14T00:01:23.7877468Z  nvidia-driver depends on nvidia-kernel-dkms (= 515.43.04-1) | nvidia-kernel-515.43.04 | nvidia-kernel-open-dkms (= 515.43.04-1); however:
2022-05-14T00:01:23.7878146Z   Package nvidia-kernel-dkms is not configured yet.
2022-05-14T00:01:23.7878862Z   Package nvidia-kernel-515.43.04 is not installed.
2022-05-14T00:01:23.7879480Z   Package nvidia-kernel-dkms which provides nvidia-kernel-515.43.04 is not configured yet.
2022-05-14T00:01:23.7880044Z   Package nvidia-kernel-open-dkms is not installed.
2022-05-14T00:01:23.7880250Z 
2022-05-14T00:01:23.7880687Z dpkg: error processing package nvidia-driver (--configure):
2022-05-14T00:01:23.7881193Z  dependency problems - leaving unconfigured
2022-05-14T00:01:23.7881729Z dpkg: dependency problems prevent configuration of cuda-drivers-515:
2022-05-14T00:01:23.7882321Z  cuda-drivers-515 depends on nvidia-driver (>= 515.43.04); however:
2022-05-14T00:01:23.7882833Z   Package nvidia-driver is not configured yet.
2022-05-14T00:01:23.7883145Z 
2022-05-14T00:01:23.7883597Z dpkg: error processing package cuda-drivers-515 (--configure):
2022-05-14T00:01:23.7884117Z  dependency problems - leaving unconfigured
2022-05-14T00:01:23.7884636Z dpkg: dependency problems prevent configuration of cuda-drivers:
2022-05-14T00:01:23.7885224Z  cuda-drivers depends on cuda-drivers-515 (= 515.43.04-1); however:
2022-05-14T00:01:23.7885818Z   Package cuda-drivers-515 is not configured yet.
2022-05-14T00:01:23.7887248Z 
2022-05-14T00:01:23.7887785Z dpkg: error processing package cuda-drivers (--configure):
2022-05-14T00:01:23.7888297Z  dependency problems - leaving unconfigured
2022-05-14T00:01:23.7933488Z Errors were encountered while processing:
2022-05-14T00:01:23.7933950Z  nvidia-kernel-dkms
2022-05-14T00:01:23.7934350Z  nvidia-driver
2022-05-14T00:01:23.7934712Z  cuda-drivers-515
2022-05-14T00:01:23.7935068Z  cuda-drivers
2022-05-14T00:01:23.8892060Z E: Sub-process /usr/bin/dpkg returned an error code (1)
2022-05-14T00:01:23.9343910Z Reading package lists...
2022-05-14T00:01:24.1036640Z Building dependency tree...
2022-05-14T00:01:24.1043648Z Reading state information...
2022-05-14T00:01:24.4444783Z cuda-drivers is already the newest version (515.43.04-1).
2022-05-14T00:01:24.4445307Z 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2022-05-14T00:01:24.4445773Z 4 not fully installed or removed.
2022-05-14T00:01:24.4446143Z After this operation, 0 B of additional disk space will be used.
2022-05-14T00:01:24.4518276Z Setting up nvidia-kernel-dkms (515.43.04-1) ...
2022-05-14T00:01:24.5124564Z Removing old nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:24.6005203Z 
2022-05-14T00:01:24.6006041Z ------------------------------
2022-05-14T00:01:24.6006389Z Deleting module version: 515.43.04
2022-05-14T00:01:24.6006709Z completely from the DKMS tree.
2022-05-14T00:01:24.6007135Z ------------------------------
2022-05-14T00:01:24.6516253Z Done.
2022-05-14T00:01:24.6562804Z Loading new nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:24.7675665Z Building for 5.10.0-14-cloud-amd64
2022-05-14T00:01:24.8316559Z Building initial module for 5.10.0-14-cloud-amd64
2022-05-14T00:01:33.4233280Z Error! Bad return status for module build on kernel: 5.10.0-14-cloud-amd64 (x86_64)
2022-05-14T00:01:33.4234055Z Consult /var/lib/dkms/nvidia-current/515.43.04/build/make.log for more information.
2022-05-14T00:01:33.4282567Z dpkg: error processing package nvidia-kernel-dkms (--configure):
2022-05-14T00:01:33.4283518Z  installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
2022-05-14T00:01:33.4284164Z dpkg: dependency problems prevent configuration of nvidia-driver:
2022-05-14T00:01:33.4284940Z  nvidia-driver depends on nvidia-kernel-dkms (= 515.43.04-1) | nvidia-kernel-515.43.04 | nvidia-kernel-open-dkms (= 515.43.04-1); however:
2022-05-14T00:01:33.4285614Z   Package nvidia-kernel-dkms is not configured yet.
2022-05-14T00:01:33.4286124Z   Package nvidia-kernel-515.43.04 is not installed.
2022-05-14T00:01:33.4286772Z   Package nvidia-kernel-dkms which provides nvidia-kernel-515.43.04 is not configured yet.
2022-05-14T00:01:33.4287345Z   Package nvidia-kernel-open-dkms is not installed.
2022-05-14T00:01:33.4287534Z 
2022-05-14T00:01:33.4287973Z dpkg: error processing package nvidia-driver (--configure):
2022-05-14T00:01:33.4288735Z  dependency problems - leaving unconfigured
2022-05-14T00:01:33.4289284Z dpkg: dependency problems prevent configuration of cuda-drivers-515:
2022-05-14T00:01:33.4289878Z  cuda-drivers-515 depends on nvidia-driver (>= 515.43.04); however:
2022-05-14T00:01:33.4290395Z   Package nvidia-driver is not configured yet.
2022-05-14T00:01:33.4290567Z 
2022-05-14T00:01:33.4291014Z dpkg: error processing package cuda-drivers-515 (--configure):
2022-05-14T00:01:33.4291528Z  dependency problems - leaving unconfigured
2022-05-14T00:01:33.4292081Z dpkg: dependency problems prevent configuration of cuda-drivers:
2022-05-14T00:01:33.4292837Z  cuda-drivers depends on cuda-drivers-515 (= 515.43.04-1); however:
2022-05-14T00:01:33.4293365Z   Package cuda-drivers-515 is not configured yet.
2022-05-14T00:01:33.4293537Z 
2022-05-14T00:01:33.4293973Z dpkg: error processing package cuda-drivers (--configure):
2022-05-14T00:01:33.4294486Z  dependency problems - leaving unconfigured
2022-05-14T00:01:33.4336203Z Errors were encountered while processing:
2022-05-14T00:01:33.4336680Z  nvidia-kernel-dkms
2022-05-14T00:01:33.4337063Z  nvidia-driver
2022-05-14T00:01:33.4337450Z  cuda-drivers-515
2022-05-14T00:01:33.4337800Z  cuda-drivers
2022-05-14T00:01:33.5312144Z E: Sub-process /usr/bin/dpkg returned an error code (1)
2022-05-14T00:01:33.5770870Z Reading package lists...
2022-05-14T00:01:33.7529259Z Building dependency tree...
2022-05-14T00:01:33.7537158Z Reading state information...
2022-05-14T00:01:34.1022159Z cuda-drivers is already the newest version (515.43.04-1).
2022-05-14T00:01:34.1022702Z 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2022-05-14T00:01:34.1023099Z 4 not fully installed or removed.
2022-05-14T00:01:34.1023470Z After this operation, 0 B of additional disk space will be used.
2022-05-14T00:01:34.1092076Z Setting up nvidia-kernel-dkms (515.43.04-1) ...
2022-05-14T00:01:34.1714761Z Removing old nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:34.2645175Z 
2022-05-14T00:01:34.2646743Z ------------------------------
2022-05-14T00:01:34.2647169Z Deleting module version: 515.43.04
2022-05-14T00:01:34.2647519Z completely from the DKMS tree.
2022-05-14T00:01:34.2647953Z ------------------------------
2022-05-14T00:01:34.3221643Z Done.
2022-05-14T00:01:34.3265089Z Loading new nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:34.4393215Z Building for 5.10.0-14-cloud-amd64
2022-05-14T00:01:34.5036280Z Building initial module for 5.10.0-14-cloud-amd64
2022-05-14T00:01:43.1616013Z Error! Bad return status for module build on kernel: 5.10.0-14-cloud-amd64 (x86_64)
2022-05-14T00:01:43.1616772Z Consult /var/lib/dkms/nvidia-current/515.43.04/build/make.log for more information.
2022-05-14T00:01:43.1667641Z dpkg: error processing package nvidia-kernel-dkms (--configure):
2022-05-14T00:01:43.1668379Z  installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
2022-05-14T00:01:43.1669055Z dpkg: dependency problems prevent configuration of nvidia-driver:
2022-05-14T00:01:43.1669845Z  nvidia-driver depends on nvidia-kernel-dkms (= 515.43.04-1) | nvidia-kernel-515.43.04 | nvidia-kernel-open-dkms (= 515.43.04-1); however:
2022-05-14T00:01:43.1670513Z   Package nvidia-kernel-dkms is not configured yet.
2022-05-14T00:01:43.1671059Z   Package nvidia-kernel-515.43.04 is not installed.
2022-05-14T00:01:43.1671670Z   Package nvidia-kernel-dkms which provides nvidia-kernel-515.43.04 is not configured yet.
2022-05-14T00:01:43.1672251Z   Package nvidia-kernel-open-dkms is not installed.
2022-05-14T00:01:43.1672435Z 
2022-05-14T00:01:43.1672875Z dpkg: error processing package nvidia-driver (--configure):
2022-05-14T00:01:43.1673386Z  dependency problems - leaving unconfigured
2022-05-14T00:01:43.1673926Z dpkg: dependency problems prevent configuration of cuda-drivers-515:
2022-05-14T00:01:43.1674514Z  cuda-drivers-515 depends on nvidia-driver (>= 515.43.04); however:
2022-05-14T00:01:43.1675071Z   Package nvidia-driver is not configured yet.
2022-05-14T00:01:43.1675539Z 
2022-05-14T00:01:43.1676011Z dpkg: error processing package cuda-drivers-515 (--configure):
2022-05-14T00:01:43.1676519Z  dependency problems - leaving unconfigured
2022-05-14T00:01:43.1677052Z dpkg: dependency problems prevent configuration of cuda-drivers:
2022-05-14T00:01:43.1677632Z  cuda-drivers depends on cuda-drivers-515 (= 515.43.04-1); however:
2022-05-14T00:01:43.1678172Z   Package cuda-drivers-515 is not configured yet.
2022-05-14T00:01:43.1678344Z 
2022-05-14T00:01:43.1678775Z dpkg: error processing package cuda-drivers (--configure):
2022-05-14T00:01:43.1679269Z  dependency problems - leaving unconfigured
2022-05-14T00:01:43.1727278Z Errors were encountered while processing:
2022-05-14T00:01:43.1727774Z  nvidia-kernel-dkms
2022-05-14T00:01:43.1728146Z  nvidia-driver
2022-05-14T00:01:43.1728513Z  cuda-drivers-515
2022-05-14T00:01:43.1728864Z  cuda-drivers
2022-05-14T00:01:43.2713624Z E: Sub-process /usr/bin/dpkg returned an error code (1)
2022-05-14T00:01:43.3192305Z Reading package lists...
2022-05-14T00:01:43.4960752Z Building dependency tree...
2022-05-14T00:01:43.4968827Z Reading state information...
2022-05-14T00:01:43.8462251Z cuda-drivers is already the newest version (515.43.04-1).
2022-05-14T00:01:43.8462734Z 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2022-05-14T00:01:43.8463098Z 4 not fully installed or removed.
2022-05-14T00:01:43.8463479Z After this operation, 0 B of additional disk space will be used.
2022-05-14T00:01:43.8539244Z Setting up nvidia-kernel-dkms (515.43.04-1) ...
2022-05-14T00:01:43.9163546Z Removing old nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:44.0124644Z 
2022-05-14T00:01:44.0125286Z ------------------------------
2022-05-14T00:01:44.0125620Z Deleting module version: 515.43.04
2022-05-14T00:01:44.0125941Z completely from the DKMS tree.
2022-05-14T00:01:44.0126370Z ------------------------------
2022-05-14T00:01:44.0695568Z Done.
2022-05-14T00:01:44.0737205Z Loading new nvidia-current-515.43.04 DKMS files...
2022-05-14T00:01:44.1872168Z Building for 5.10.0-14-cloud-amd64
2022-05-14T00:01:44.2515311Z Building initial module for 5.10.0-14-cloud-amd64
2022-05-14T00:01:52.9023084Z Error! Bad return status for module build on kernel: 5.10.0-14-cloud-amd64 (x86_64)
2022-05-14T00:01:52.9023907Z Consult /var/lib/dkms/nvidia-current/515.43.04/build/make.log for more information.
2022-05-14T00:01:52.9072488Z dpkg: error processing package nvidia-kernel-dkms (--configure):
2022-05-14T00:01:52.9073218Z  installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 10
2022-05-14T00:01:52.9074196Z dpkg: dependency problems prevent configuration of nvidia-driver:
2022-05-14T00:01:52.9075040Z  nvidia-driver depends on nvidia-kernel-dkms (= 515.43.04-1) | nvidia-kernel-515.43.04 | nvidia-kernel-open-dkms (= 515.43.04-1); however:
2022-05-14T00:01:52.9075845Z   Package nvidia-kernel-dkms is not configured yet.
2022-05-14T00:01:52.9076363Z   Package nvidia-kernel-515.43.04 is not installed.
2022-05-14T00:01:52.9077014Z   Package nvidia-kernel-dkms which provides nvidia-kernel-515.43.04 is not configured yet.
2022-05-14T00:01:52.9077586Z   Package nvidia-kernel-open-dkms is not installed.
2022-05-14T00:01:52.9077763Z 
2022-05-14T00:01:52.9078208Z dpkg: error processing package nvidia-driver (--configure):
2022-05-14T00:01:52.9078720Z  dependency problems - leaving unconfigured
2022-05-14T00:01:52.9079260Z dpkg: dependency problems prevent configuration of cuda-drivers-515:
2022-05-14T00:01:52.9079875Z  cuda-drivers-515 depends on nvidia-driver (>= 515.43.04); however:
2022-05-14T00:01:52.9080385Z   Package nvidia-driver is not configured yet.
2022-05-14T00:01:52.9080554Z 
2022-05-14T00:01:52.9081010Z dpkg: error processing package cuda-drivers-515 (--configure):
2022-05-14T00:01:52.9081507Z  dependency problems - leaving unconfigured
2022-05-14T00:01:52.9082027Z dpkg: dependency problems prevent configuration of cuda-drivers:
2022-05-14T00:01:52.9082602Z  cuda-drivers depends on cuda-drivers-515 (= 515.43.04-1); however:
2022-05-14T00:01:52.9083369Z   Package cuda-drivers-515 is not configured yet.
2022-05-14T00:01:52.9083554Z 
2022-05-14T00:01:52.9083978Z dpkg: error processing package cuda-drivers (--configure):
2022-05-14T00:01:52.9084479Z  dependency problems - leaving unconfigured
2022-05-14T00:01:52.9132535Z Errors were encountered while processing:
2022-05-14T00:01:52.9133011Z  nvidia-kernel-dkms
2022-05-14T00:01:52.9133369Z  nvidia-driver
2022-05-14T00:01:52.9133736Z  cuda-drivers-515
2022-05-14T00:01:52.9134085Z  cuda-drivers
2022-05-14T00:01:53.0105790Z E: Sub-process /usr/bin/dpkg returned an error code (1)
2022-05-14T00:01:53.0128828Z [ERROR] Failed to install packages.
2022-05-14T00:01:53.0181684Z ##[error]Bash exited with code '1'.
2022-05-14T00:01:53.0198482Z ##[section]Finishing: Install GPU driver
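The dpkg output above only reports the DKMS exit status; the actual compiler error is in the make.log it points to. Below is a minimal POSIX sh sketch for pulling the first error lines out of that file. The function name `show_dkms_build_errors` is hypothetical, the default path is the one printed in the log, and the grep pattern (gcc-style `error:` lines plus make's `Error N` summaries) is an assumption about what the relevant lines look like.

```sh
#!/bin/sh
# Sketch: show the first compiler/make errors from a DKMS build log.
# The default path is the one printed in the log above; pass another
# path for other driver versions or DKMS module names.
set -eu

show_dkms_build_errors() {
  log="${1:-/var/lib/dkms/nvidia-current/515.43.04/build/make.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # -n prints line numbers; -m 20 stops after 20 matches to keep output short.
  # The pattern matches gcc-style "error:" lines and make's "Error N" lines.
  grep -n -m 20 -E 'error:|Error [0-9]+' "$log"
}
```

Calling `show_dkms_build_errors` with no argument reads the 515.43.04 log from this run; `dkms status` is also worth checking to see which module versions DKMS considers built.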

Because Azure doesn't support GSP, with 510 we had to pass NVreg_EnableGpuFirmware=0 by reloading the modules after installation:

sudo modprobe -r nvidia{,_{modeset,uvm}} || true
sudo modprobe nvidia NVreg_EnableGpuFirmware=0
sudo modprobe -a nvidia_modeset nvidia_uvm

From what I've heard, the new release (or at least its open-source part) requires GSP, so I wonder whether that's causing this issue.
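If the NVreg_EnableGpuFirmware=0 workaround is still needed on such VMs, it can be made persistent with a modprobe.d `options` line instead of reloading the modules by hand. A minimal sketch follows; the function name `write_gsp_override` and the file name `nvidia-disable-gsp.conf` are hypothetical. Which module name the line should target depends on the packaging: the DKMS package in this log is called `nvidia-current`, so on such images the option may need to name that module rather than `nvidia`, which is why it is a parameter here.

```sh
#!/bin/sh
# Sketch: persist NVreg_EnableGpuFirmware=0 via modprobe.d so the option is
# applied whenever the module loads. File and function names are hypothetical;
# modprobe reads any *.conf file under /etc/modprobe.d.
set -eu

write_gsp_override() {
  conf="${1:-/etc/modprobe.d/nvidia-disable-gsp.conf}"
  # Module to target; may need to be nvidia-current on Debian-style packaging.
  module="${2:-nvidia}"
  # "options <module> <param>=<value>" is standard modprobe.d syntax.
  printf 'options %s NVreg_EnableGpuFirmware=0\n' "$module" > "$conf"
}
```

Run it as root with no arguments to write the default file, then reload the modules or reboot. Whether the initramfs also needs regenerating (`update-initramfs -u` on Debian) depends on whether the image loads the driver from it; treat that as something to check rather than a required step.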

To Reproduce

PAR2020 commented 2 years ago

@xkszltl thanks for reporting this issue. We are confirming test status on Azure and will get back to you shortly.

PAR2020 commented 2 years ago

@xkszltl, can you please try again with the newly released driver and report back? I believe Azure has completed its rollout of the new 515 drivers. Thanks.

xkszltl commented 2 years ago

Thanks! I'll find some time next week to give it a try.

PAR2020 commented 2 years ago

@xkszltl checking in on this issue. Any luck?

xkszltl commented 2 years ago

Sorry for the long wait; yes, it works.

Over the past few months we've had several blocking issues, which is why it took so long to confirm.

It would be great if NVIDIA could improve the driver installation experience in general.