Hi @namupatel Can you please indicate which NVIDIA GPU model/SKU you are using?
This week, we released CUDA 11.5.0 with NVIDIA driver 495.29.05 https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/precompiled/
If you were on the latest stream, you would have been upgraded to 495 by now.
However, the last driver branch that supports [many] Kepler GPUs is 470. If your GPU is one of them, then I suggest the following (see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#removing-cuda-tk-and-driver):
sudo dnf remove nvidia-driver
sudo dnf module reset nvidia-driver
sudo dnf module install nvidia-driver:470
That will keep you on the precompiled 470 driver branch, which should be supported with updates for a very long time.
More information about this:
NVIDIA Driver support for Kepler is removed beginning with R495. CUDA Toolkit development support for Kepler continues through CUDA 11.x.
R470 Long Term Support Branch EOL: July 2024
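The branch choice above reduces to a small rule: a Kepler GPU stays pinned to the 470 stream, and anything else can follow "latest". Below is a minimal Python sketch of that rule, assuming only the facts stated above (R495 drops Kepler, R470 is the last branch for many Kepler GPUs). The helper names `pick_stream` and `switch_commands` are hypothetical; the commands it emits are the same three dnf commands quoted above.

```python
# Minimal sketch, assuming only the facts above: R495 drops Kepler support and
# R470 is the last branch for (many) Kepler GPUs. Helper names are hypothetical.
LAST_KEPLER_BRANCH = "470"


def pick_stream(needs_kepler_support: bool) -> str:
    """Return the nvidia-driver module stream to pin."""
    # At the time of this thread, "latest" resolves to 495, which no longer
    # supports Kepler, so Kepler owners pin the 470 stream instead.
    return LAST_KEPLER_BRANCH if needs_kepler_support else "latest"


def switch_commands(stream: str) -> list:
    """The remove / reset / install sequence shown above, for a given stream."""
    return [
        "sudo dnf remove nvidia-driver",
        "sudo dnf module reset nvidia-driver",
        f"sudo dnf module install nvidia-driver:{stream}",
    ]


if __name__ == "__main__":
    # Example: a Kepler GPU that must stay on the 470 branch.
    for cmd in switch_commands(pick_stream(needs_kepler_support=True)):
        print(cmd)
```

Resetting the module before installing matters because dnf will not switch an already-enabled stream in place.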
Anyway, if that's not the case and you are still on 470.57.02 and RHEL 8.4 kernel 4.18.0-305.19.1, then I will need to take another look.
I tested the installation on my GTX 650 (a Kepler GPU) using dnf module install nvidia-driver:470 with kernel 4.18.0-305.19.1, and after rebooting, the GNOME desktop works just fine on my system.
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
$ lsmod | grep -e nouveau -e nvidia
nvidia_drm 57344 6
nvidia_modeset 1155072 13 nvidia_drm
nvidia_uvm 1069056 0
nvidia 34709504 682 nvidia_uvm,nvidia_modeset
drm_kms_helper 233472 2 nvidia_drm,i915
drm 569344 12 drm_kms_helper,nvidia,nvidia_drm,i915
$ rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-470.57.02-4.18.0-305.19.1-470.57.02-3.el8_4.x86_64
nvidia-driver-470.57.02-1.el8.x86_64
nvidia-driver-cuda-470.57.02-1.el8.x86_64
nvidia-driver-cuda-libs-470.57.02-1.el8.x86_64
nvidia-driver-devel-470.57.02-1.el8.x86_64
nvidia-driver-libs-470.57.02-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-470.57.02-1.el8.x86_64
nvidia-driver-NVML-470.57.02-1.el8.x86_64
nvidia-kmod-common-470.57.02-1.el8.noarch
nvidia-libXNVCtrl-470.57.02-1.el8.x86_64
nvidia-libXNVCtrl-devel-470.57.02-1.el8.x86_64
nvidia-modprobe-470.57.02-1.el8.x86_64
nvidia-persistenced-470.57.02-1.el8.x86_64
nvidia-settings-470.57.02-1.el8.x86_64
nvidia-xconfig-470.57.02-1.el8.x86_64
$ rpm -qa | grep kernel | grep $(uname -r) | sort
kernel-4.18.0-305.19.1.el8_4.x86_64
kernel-core-4.18.0-305.19.1.el8_4.x86_64
kernel-modules-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-libs-4.18.0-305.19.1.el8_4.x86_64
$ sudo dnf nvidia-plugin
installed kernel: kernel-4.18.0-305.19.1.el8_4.x86_64
installed kmod(s): kmod-nvidia-470.57.02-4.18.0-305.19.1-3:470.57.02-3.el8_4.x86_64
$ sudo dnf module list nvidia-driver
Last metadata expiration check: 0:10:02 ago on Fri 22 Oct 2021 01:51:53 PM PDT.
cuda-rhel8-x86_64
Name Stream Profiles Summary
nvidia-driver latest default [d], fm, ks, src Nvidia driver for latest branch
nvidia-driver latest-dkms [d] default [d], fm, ks Nvidia driver for latest-dkms branch
nvidia-driver 418 default [d], fm, ks, src Nvidia driver for 418 branch
nvidia-driver 418-dkms default [d], fm, ks Nvidia driver for 418-dkms branch
nvidia-driver 440 default [d], fm, ks, src Nvidia driver for 440 branch
nvidia-driver 440-dkms default [d], fm, ks Nvidia driver for 440-dkms branch
nvidia-driver 450 default [d], fm, ks, src Nvidia driver for 450 branch
nvidia-driver 450-dkms default [d], fm, ks Nvidia driver for 450-dkms branch
nvidia-driver 455 default [d], fm, ks, src Nvidia driver for 455 branch
nvidia-driver 455-dkms default [d], fm, ks Nvidia driver for 455-dkms branch
nvidia-driver 460 default [d], fm, ks, src Nvidia driver for 460 branch
nvidia-driver 460-dkms default [d], fm, ks Nvidia driver for 460-dkms branch
nvidia-driver 465 default [d], fm, ks, src Nvidia driver for 465 branch
nvidia-driver 465-dkms default [d], fm, ks Nvidia driver for 465-dkms branch
nvidia-driver 470 [e] default [d] [i], fm, ks, src Nvidia driver for 470 branch
nvidia-driver 470-dkms default [d], fm, ks Nvidia driver for 470-dkms branch
nvidia-driver 495 default [d], fm, ks, src Nvidia driver for 495 branch
nvidia-driver 495-dkms default [d], fm, ks Nvidia driver for 495-dkms branch
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
$ glxgears
300 frames in 5.0 seconds = 59.851 FPS
301 frames in 5.0 seconds = 60.002 FPS
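The session above checks three things by eye: the nvidia kernel modules are loaded and nouveau is not, the precompiled kmod was built for the running kernel, and the 470 stream is the enabled one. Below is a minimal Python sketch that performs the same checks on captured command output. All function names are hypothetical, and the kmod naming scheme is inferred from the two package names shown above, not taken from a specification.

```python
import platform
import re
from typing import Optional

# Minimal sketch: automate the checks from the session above by parsing
# captured command output. All function names here are hypothetical.


def loaded_modules(lsmod_output: str) -> set:
    """Module names from `lsmod` output (first column of each line)."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip blank lines and the header
            names.add(fields[0])
    return names


def nvidia_stack_loaded(lsmod_output: str) -> bool:
    """True if nvidia, nvidia_modeset and nvidia_drm are loaded and nouveau is not."""
    mods = loaded_modules(lsmod_output)
    return {"nvidia", "nvidia_modeset", "nvidia_drm"} <= mods and "nouveau" not in mods


# Naming scheme inferred from the packages above (an assumption, not a spec):
# kmod-nvidia-<driver>-<kernel version-release>-[epoch:]<driver>-<release>.<arch>
KMOD_RE = re.compile(r"^kmod-nvidia-(\d+\.\d+\.\d+)-(\d+\.\d+\.\d+-[\d.]+)-")


def kmod_kernel(kmod_name: str) -> str:
    """Kernel version-release that a precompiled kmod package was built for."""
    m = KMOD_RE.match(kmod_name)
    if m is None:
        raise ValueError(f"unrecognized kmod package name: {kmod_name}")
    return m.group(2)


def kmod_matches_kernel(kmod_name: str, uname_r: str) -> bool:
    """`uname -r` appends dist and arch, e.g. 4.18.0-305.19.1.el8_4.x86_64."""
    return uname_r.startswith(kmod_kernel(kmod_name) + ".")


def enabled_stream(module_list: str) -> Optional[str]:
    """Stream marked [e] in `dnf module list nvidia-driver` (one stream per line)."""
    for line in module_list.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "nvidia-driver":
            continue
        # Stream flags follow the stream name; profile flags come after a profile.
        stream_flags = ""
        for tok in fields[2:]:
            if not tok.startswith("["):
                break  # reached the first profile name
            stream_flags += tok
        if "[e]" in stream_flags:
            return fields[1]
    return None


if __name__ == "__main__":
    kmod = "kmod-nvidia-470.57.02-4.18.0-305.19.1-470.57.02-3.el8_4.x86_64"
    # platform.release() returns the same string as `uname -r` on Linux.
    print(kmod_matches_kernel(kmod, platform.release()))
```

The parser treats only the bracketed flags directly after the stream name as stream flags, so a profile marked [i] is not mistaken for the enabled stream.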
Hi @kmittman,
Thanks for following up. I'm using a Tesla K40c (NVIDIA Corporation GK180GL). I've switched over to driver version 470 on Red Hat 8.4 kernel 4.18.0-305.19.1.el8_4 and reinstalled CUDA 11.1. CUDA code is running successfully. I'm not in front of the machine today, but will check tomorrow whether graphics are up now.
Looking at the checks you ran, I see that the VGA GPU listed in your case is the GTX 650. My machine has two GPUs, which might be causing the problem (tomorrow I will verify whether graphics are up and run the glxgears test):
$ lspci | grep NVIDIA
83:00.0 3D controller: NVIDIA Corporation GK180GL [Tesla K40c] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1)
84:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
$ lsmod | grep -e nouveau -e nvidia
nvidia_drm 57344 6
nvidia_modeset 1155072 4 nvidia_drm
nvidia_uvm 1069056 0
nvidia 34709504 197 nvidia_uvm,nvidia_modeset
drm_kms_helper 233472 1 nvidia_drm
drm 569344 10 drm_kms_helper,nvidia,nvidia_drm
$ rpm -qa | grep nvidia | sort
dnf-plugin-nvidia-2.0-1.el8.noarch
kmod-nvidia-470.57.02-4.18.0-305.19.1-470.57.02-3.el8_4.x86_64
nvidia-driver-470.57.02-1.el8.x86_64
nvidia-driver-cuda-470.57.02-1.el8.x86_64
nvidia-driver-cuda-libs-470.57.02-1.el8.x86_64
nvidia-driver-devel-470.57.02-1.el8.x86_64
nvidia-driver-libs-470.57.02-1.el8.x86_64
nvidia-driver-NvFBCOpenGL-470.57.02-1.el8.x86_64
nvidia-driver-NVML-470.57.02-1.el8.x86_64
nvidia-kmod-common-470.57.02-1.el8.noarch
nvidia-libXNVCtrl-470.57.02-1.el8.x86_64
nvidia-libXNVCtrl-devel-470.57.02-1.el8.x86_64
nvidia-modprobe-470.57.02-1.el8.x86_64
nvidia-persistenced-470.57.02-1.el8.x86_64
nvidia-settings-470.57.02-1.el8.x86_64
nvidia-xconfig-470.57.02-1.el8.x86_64
$ rpm -qa | grep kernel | grep $(uname -r) | sort
kernel-4.18.0-305.19.1.el8_4.x86_64
kernel-core-4.18.0-305.19.1.el8_4.x86_64
kernel-devel-4.18.0-305.19.1.el8_4.x86_64
kernel-headers-4.18.0-305.19.1.el8_4.x86_64
kernel-modules-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-4.18.0-305.19.1.el8_4.x86_64
kernel-tools-libs-4.18.0-305.19.1.el8_4.x86_64
$ sudo dnf nvidia-plugin
installed kernel: kernel-4.18.0-305.19.1.el8_4.x86_64
installed kmod(s): kmod-nvidia-470.57.02-4.18.0-305.19.1-3:470.57.02-3.el8_4.x86_64
$ sudo dnf module list nvidia-driver
Updating Subscription Management repositories.
Last metadata expiration check: 3:20:53 ago on Mon 25 Oct 2021 06:24:22 AM EDT.
cuda-rhel8-x86_64
Name Stream Profiles Summary
nvidia-driver latest default [d], fm, ks, src Nvidia driver for latest branch
nvidia-driver latest-dkms [d] default [d], fm, ks Nvidia driver for latest-dkms branch
nvidia-driver 418 default [d], fm, ks, src Nvidia driver for 418 branch
nvidia-driver 418-dkms default [d], fm, ks Nvidia driver for 418-dkms branch
nvidia-driver 440 default [d], fm, ks, src Nvidia driver for 440 branch
nvidia-driver 440-dkms default [d], fm, ks Nvidia driver for 440-dkms branch
nvidia-driver 450 default [d], fm, ks, src Nvidia driver for 450 branch
nvidia-driver 450-dkms default [d], fm, ks Nvidia driver for 450-dkms branch
nvidia-driver 455 default [d], fm, ks, src Nvidia driver for 455 branch
nvidia-driver 455-dkms default [d], fm, ks Nvidia driver for 455-dkms branch
nvidia-driver 460 default [d], fm, ks, src Nvidia driver for 460 branch
nvidia-driver 460-dkms default [d], fm, ks Nvidia driver for 460-dkms branch
nvidia-driver 465 default [d], fm, ks, src Nvidia driver for 465 branch
nvidia-driver 465-dkms default [d], fm, ks Nvidia driver for 465-dkms branch
nvidia-driver 470 [e] default [d] [i], fm, ks, src Nvidia driver for 470 branch
nvidia-driver 470-dkms default [d], fm, ks Nvidia driver for 470-dkms branch
nvidia-driver 495 default [d], fm, ks, src Nvidia driver for 495 branch
nvidia-driver 495-dkms default [d], fm, ks Nvidia driver for 495-dkms branch
Hint: [d]efault, [e]nabled, [x]disabled, [i]nstalled
$ glxgears
57 frames in 5.1 seconds = 11.284 FPS
The terminal was blank, so I rebooted. After selecting the OS version to boot, I briefly saw a gray screen with three dots before the display went blank. Any other suggestions as to what might help? Thanks.
Hi @namupatel The three dots are the Plymouth bootsplash in fallback mode; normally it would display the distro's logo.
I consulted with our driver team, and the hypothesis is that the X display server is starting on the headless GPU (the Tesla SKU has no VGA/DVI/HDMI/DP output). The Plymouth splash appearing momentarily on the K4000 seems to support this.
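In the lspci output from the second machine, the Tesla K40c is enumerated as a "3D controller" and the Quadro K4000 as a "VGA compatible controller". Below is a minimal Python sketch that picks out the display-capable NVIDIA device from that output. It assumes lspci's default format (no -D domain prefix), and treating the "VGA compatible controller" class as "has display outputs" is a heuristic that fits this machine, not a guarantee. The function names are hypothetical.

```python
import re

# Sketch, assuming `lspci` output in its default format (no -D domain prefix):
# "<bus>:<device>.<function> <class>: <description>", all numbers in hex.
LSPCI_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ([^:]+): (.*)$", re.IGNORECASE)


def nvidia_devices(lspci_output: str) -> list:
    """(address, class, description) for every NVIDIA line in lspci output."""
    devices = []
    for line in lspci_output.splitlines():
        m = LSPCI_RE.match(line.strip())
        if m and "NVIDIA" in m.group(3):
            devices.append(m.groups())
    return devices


def display_candidates(lspci_output: str) -> list:
    """Addresses of NVIDIA GPUs enumerated as VGA controllers.

    Heuristic: in the output above, the headless Tesla K40c appears as a
    "3D controller", so it is excluded as a target for the X server.
    """
    return [addr for addr, cls, _ in nvidia_devices(lspci_output)
            if cls == "VGA compatible controller"]


SAMPLE = """\
83:00.0 3D controller: NVIDIA Corporation GK180GL [Tesla K40c] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1)
84:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)
"""

if __name__ == "__main__":
    print(display_candidates(SAMPLE))  # ['84:00.0']
```

The HDMI audio function (84:00.1) is matched as an NVIDIA device but filtered out by its class, which is why the result contains only the Quadro.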
One way to keep X off the headless GPU is to explicitly add the BusID of the Quadro GPU to the /etc/X11/xorg.conf file; see: https://stackoverflow.com/a/18382758
In your case, lspci reports the Quadro at bus 0x84 (hexadecimal), which is 132 in decimal, the form the BusID entry uses:
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BusID "PCI:132:0:0"
EndSection
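The only non-obvious step in the Device section above is the base conversion: lspci prints bus, device and function in hexadecimal, while the BusID entry is written in decimal, so 84:00.0 becomes PCI:132:0:0. Below is a minimal Python sketch of that conversion that also renders a Device section like the one above. It assumes PCI domain 0 (a "0000:" prefix from `lspci -D` is simply dropped). The function names and the default "Device1" identifier are illustrative, and the rest of xorg.conf is out of scope.

```python
def xorg_busid(lspci_addr: str) -> str:
    """Convert lspci "BB:DD.F" (hex) to an xorg.conf BusID "PCI:b:d:f" (decimal).

    Assumes PCI domain 0; a "0000:" domain prefix from `lspci -D` is dropped.
    """
    slot, func = lspci_addr.rsplit(".", 1)
    bus, dev = slot.split(":")[-2:]
    return f"PCI:{int(bus, 16)}:{int(dev, 16)}:{int(func, 16)}"


def device_section(lspci_addr: str, identifier: str = "Device1") -> str:
    """Render a Device section for the GPU at the given lspci address."""
    return "\n".join([
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver "nvidia"',
        '    VendorName "NVIDIA Corporation"',
        f'    BusID "{xorg_busid(lspci_addr)}"',
        "EndSection",
    ])


if __name__ == "__main__":
    print(xorg_busid("84:00.0"))  # PCI:132:0:0
    print(device_section("84:00.0"))
```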
Alternatively, you can use nvidia-xconfig with the --busid= and --device= parameters to generate the configuration.
If that does not work, then please attach an nvidia-bug-report.log file generated with nvidia-bug-report.sh.
@namupatel It's been a while, so I'm closing this. Feel free to re-open if you are still experiencing this issue.
Thanks for simplifying the driver installation process. Unfortunately, my display is blank after the installation. I can log in remotely and successfully run nvidia-smi and execute CUDA scripts. I'm a bit of a newbie, so guidance would be appreciated.
NVIDIA driver version: 470.57.02
RHEL kernel version: 8.4
modularity stream: latest
modularity profile: default