Closed zt64 closed 1 year ago
Seems to have been an issue with suspending to RAM. Following the steps on the Arch Wiki for NVIDIA fixed it.
I get the exact same vainfo output, and it happens without suspending or anything.
With direct backend enabled:
$ NVD_LOG=1 vainfo
Trying display: wayland
Trying display: x11
294.224352307 [3513-3513] ../src/vabackend.c: 138 init CUDA ERROR 'unknown error' (999)
294.224366017 [3513-3513] ../src/vabackend.c:2169 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
294.224367815 [3513-3513] ../src/vabackend.c:2178 __vaDriverInit_1_0 Now have 0 (0 max) instances
294.224369090 [3513-3513] ../src/vabackend.c:2204 __vaDriverInit_1_0 Selecting Direct backend
294.226877916 [3513-3513] ../src/direct/direct-export-buf.c: 85 direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
294.226881717 [3513-3513] ../src/direct/nv-driver.c: 217 init_nvdriver Initing nvdriver...
294.226884040 [3513-3513] ../src/direct/nv-driver.c: 222 init_nvdriver Got dev info: 100 1 2 6
294.231135547 [3513-3513] ../src/direct/nv-driver.c: 283 init_nvdriver NVIDIA kernel driver version: 530.41.03
294.231142346 [3513-3513] ../src/direct/direct-export-buf.c: 23 findGPUIndexFromFd CUDA ERROR 'initialization error' (3)
294.231144218 [3513-3513] ../src/vabackend.c:2233 __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)
libva error: /usr/lib/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit
With EGL backend enabled:
$ NVD_LOG=1 NVD_BACKEND=egl vainfo
Trying display: wayland
Trying display: x11
821.408317746 [3870-3870] ../src/vabackend.c: 138 init CUDA ERROR 'unknown error' (999)
821.408334850 [3870-3870] ../src/vabackend.c:2169 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
821.408336552 [3870-3870] ../src/vabackend.c:2178 __vaDriverInit_1_0 Now have 0 (0 max) instances
821.408337685 [3870-3870] ../src/vabackend.c:2201 __vaDriverInit_1_0 Selecting EGL backend
821.410746637 [3870-3870] ../src/export-buf.c: 132 findGPUIndexFromFd Defaulting to CUDA GPU ID 0. Use NVD_GPU to select a specific CUDA GPU
821.410750675 [3870-3870] ../src/export-buf.c: 149 findGPUIndexFromFd Looking for GPU index: 0
821.416284962 [3870-3870] ../src/export-buf.c: 161 findGPUIndexFromFd Found 3 EGL devices
821.416991459 [3870-3870] ../src/export-buf.c: 213 findGPUIndexFromFd No EGL_CUDA_DEVICE_NV support for EGLDevice 0
821.416998188 [3870-3870] ../src/export-buf.c: 213 findGPUIndexFromFd No EGL_CUDA_DEVICE_NV support for EGLDevice 1
821.417000805 [3870-3870] ../src/export-buf.c: 216 findGPUIndexFromFd No DRM device file for EGLDevice 2
821.417001905 [3870-3870] ../src/export-buf.c: 219 findGPUIndexFromFd No match found, falling back to default device
libva error: /usr/lib/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit
graphics card:
$ lspci | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
Installed nvidia packages (Arch Linux):
$ pacman -Qs nvidia
local/egl-wayland 2:1.1.11-3
EGLStream-based Wayland external platform
local/ffnvcodec-headers 12.0.16.0-1
FFmpeg version of headers required to interface with Nvidias codec APIs
local/lib32-libvdpau 1.5-1
Nvidia VDPAU library
local/lib32-nvidia-utils 530.41.03-1
NVIDIA drivers utilities (32-bit)
local/libvdpau 1.5-1
Nvidia VDPAU library
local/libxnvctrl 530.41.03-1
NVIDIA NV-CONTROL X extension
local/nvidia-open-dkms 530.41.03-1
NVIDIA open kernel modules
local/nvidia-settings 530.41.03-1
Tool for configuring the NVIDIA graphics driver
local/nvidia-tweaks 525-2
A collection of tweaks and improvements to the NVIDIA driver
local/nvidia-utils 530.41.03-1
NVIDIA drivers utilities
local/nvidia-vaapi-driver-git 0.0.9.r7.gc0a7f54-1
A VA-API implemention using NVIDIA's NVDEC
EDIT:
Part of journal log:
$ journalctl -b | grep nvidia-modeset
Apr 01 13:59:31 Arch-Linux kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 530.41.03 Release Build (archlinux-builder@Arch-Linux)
Apr 01 13:59:31 Arch-Linux (udev-worker)[366]: nvidia_modeset: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidia-modeset c $(grep nvidia-frontend /proc/devices | cut -d \ -f 1) 254'' failed with exit code 1.
I've been trying to solve this for hours, but I have absolutely no idea what to do.
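For anyone decoding the failed udev command in the journal: it looks up the character-device major number registered as `nvidia-frontend` in /proc/devices and passes it to mknod. mknod exits non-zero if that lookup comes back empty, and also if the node already exists, so the log line alone does not say which case applied here. A minimal sketch of the same lookup with an explicit error path follows; `char_major` is a hypothetical helper name, not part of the driver or its udev rules, and it matches the name exactly where the rule's grep matches substrings:

```shell
#!/usr/bin/env bash
# char_major NAME [FILE]
# Print the major number registered for NAME in a /proc/devices-style file.
# Returns non-zero when there is no such entry (or the file is unreadable).
# Hypothetical helper for illustration only.
char_major() {
    local name="$1" file="${2:-/proc/devices}"
    # Entries look like "195 nvidia-frontend": field 1 is the major, field 2 the name.
    awk -v n="$name" '$2 == n { print $1; found = 1; exit } END { exit !found }' "$file"
}

if major=$(char_major nvidia-frontend); then
    echo "nvidia-frontend major: $major"
else
    echo "no nvidia-frontend entry found; mknod would receive an empty major number" >&2
fi
```

If the lookup succeeds and the node is already listed by `ls -l /dev/nvidia*`, the udev failure may simply mean the node had been created before the rule ran.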
OK, the issue is almost certainly caused by that failed nvidia_modeset line. Can you check your NVIDIA devices in /dev? They should look like this:
> ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Apr 1 08:00 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr 1 08:00 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Apr 1 08:00 /dev/nvidia-modeset
crw-rw-rw- 1 root root 234, 0 Apr 1 08:18 /dev/nvidia-uvm
crw-rw-rw- 1 root root 234, 1 Apr 1 08:18 /dev/nvidia-uvm-tools
On my system, the output of that command looks like this:
$ ls -l /dev/nvidia*
crw-rw-rw- 195,254 root 1 Apr 13:59 /dev/nvidia-modeset
crw-rw-rw- 236,0 root 1 Apr 13:59 /dev/nvidia-uvm
crw-rw-rw- 236,1 root 1 Apr 13:59 /dev/nvidia-uvm-tools
crw-rw-rw- 195,0 root 1 Apr 13:59 /dev/nvidia0
crw-rw-rw- 195,255 root 1 Apr 13:59 /dev/nvidiactl
/dev/nvidia-caps:
cr-------- 239,1 root 1 Apr 14:29 nvidia-cap1
cr--r--r-- 239,2 root 1 Apr 14:29 nvidia-cap2
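The two listings use different `ls` formats, which makes them awkward to compare by eye. A small sketch that only checks whether each expected node exists as a character device; `check_nodes` is a hypothetical helper, and the node list is simply the one from the listing earlier in this thread, not an official or exhaustive list:

```shell
#!/usr/bin/env bash
# check_nodes PATH...
# Print one line per path saying whether it exists as a character device.
# Succeeds only if every path passed the check. Hypothetical helper.
check_nodes() {
    local missing=0 node
    for node in "$@"; do
        if [ -c "$node" ]; then
            printf 'ok       %s\n' "$node"
        else
            printf 'MISSING  %s\n' "$node"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

check_nodes /dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset \
            /dev/nvidia-uvm /dev/nvidia-uvm-tools \
    || echo "at least one expected NVIDIA device node is missing"
```

This says nothing about permissions or major/minor numbers; `ls -l` is still needed for those.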
These are my current kernel parameters. "nvidia_drm.modeset=1" is definitely set, so I don't know why mknod fails:
$ cat /proc/cmdline
File: /proc/cmdline
rw root=PARTLABEL=RebornOS nvme_load=yes add_efi_memmap nvidia_drm.modeset=1 NVreg_EnableResizableBar=1 NVreg_EnableGpuFirmware=1 acpi_osi=! "acpi_osi=Windows 2015" quiet splash loglevel=3 rd.udev.log_priority=3 vt.global_cursor_default=0 initrd=intel-ucode.img initrd=initramfs-linux-clear.img
mkinitcpio.conf (modules):
$ sed -n 7p /etc/mkinitcpio.conf
MODULES=(r8125 nvidia nvidia_modeset nvidia_uvm nvidia_drm)
Just noticed you have the open source version of the driver installed, correct? If so, I'm not sure that works with CUDA.
Yes, I have the open version installed. I just tested something, though, and I think it's a kernel-related issue. vainfo throws the error when I boot the clear-linux kernel. The session I booted before that used the zen-linux kernel, and with that one I get a positive output. I just don't know why the heck the kernel should be a problem, since I'm a kernel and driver noob. 😅 I'll just use the zen kernel for now, until it may or may not work with the clear-linux kernel.
Even though the issue is closed, I'd like to add that setting "init_on_alloc=0" as a kernel option causes this exact error.
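To check whether a running system was booted with that option, the kernel command line can be scanned for it. A minimal sketch, assuming bash; `has_kernel_param` is a hypothetical helper name, and it only finds the option when it appears literally on the command line, not when a kernel sets the same default at build time:

```shell
#!/usr/bin/env bash
# has_kernel_param PARAM [CMDLINE]
# Succeed if PARAM appears as a whole word in CMDLINE.
# CMDLINE defaults to the contents of /proc/cmdline. Hypothetical helper.
has_kernel_param() {
    local param="$1"
    local cmdline="${2-$(cat /proc/cmdline 2>/dev/null)}"
    local -a words
    local word
    # Split on whitespace without glob expansion.
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

if has_kernel_param init_on_alloc=0; then
    echo "init_on_alloc=0 is set on the kernel command line"
else
    echo "init_on_alloc=0 is not on the kernel command line"
fi
```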
Hi,
I'm trying to get the driver working but I saw that vainfo refuses to work. I'm using a 3070 on Arch and I have my /etc/environment file set up like this: