elFarto / nvidia-vaapi-driver

A VA-API implemention using NVIDIA's NVDEC

Error when using vainfo #194

Closed zt64 closed 1 year ago

zt64 commented 1 year ago

Hi,

I'm trying to get the driver working, but vainfo fails to initialise it. I'm using a 3070 on Arch, and my /etc/environment file is set up like this:

__GL_SYNC_TO_VBLANK=1
__GL_SYNC_DISPLAY_DEVICE=DP-0
__GLX_VENDOR_LIBRARY_NAME=nvidia
VDPAU_NVIDIA_SYNC_DISPLAY_DEVICE=DP-0
LIBVA_DRIVER_NAME=nvidia
NVD_BACKEND=direct

$ NVD_LOG=1 vainfo
Trying display: wayland
Trying display: x11
     35432.729795358 [189367-189367] ../src/vabackend.c: 138                     init CUDA ERROR 'unknown error' (999)

     35432.729821699 [189367-189367] ../src/vabackend.c:2169       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
     35432.729824434 [189367-189367] ../src/vabackend.c:2178       __vaDriverInit_1_0 Now have 0 (0 max) instances
     35432.729826700 [189367-189367] ../src/vabackend.c:2204       __vaDriverInit_1_0 Selecting Direct backend
     35432.733460879 [189367-189367] ../src/direct/direct-export-buf.c:  85      direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
     35432.733466576 [189367-189367] ../src/direct/nv-driver.c: 217            init_nvdriver Initing nvdriver...
     35432.733468679 [189367-189367] ../src/direct/nv-driver.c: 222            init_nvdriver Got dev info: 100 1 2 6
     35432.733522538 [189367-189367] ../src/direct/nv-driver.c: 283            init_nvdriver NVIDIA kernel driver version: 530.41.03
     35432.733530796 [189367-189367] ../src/direct/direct-export-buf.c:  23       findGPUIndexFromFd CUDA ERROR 'initialization error' (3)

     35432.733532955 [189367-189367] ../src/vabackend.c:2233       __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)

libva error: /usr/lib/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit

zt64 commented 1 year ago

It seems to have been an issue with suspending to RAM. Following the steps on the Arch Wiki for NVIDIA fixed it.

ReFleXzZ commented 1 year ago

I get the exact same output from vainfo, and this happens without suspending or anything.

With the direct backend enabled:

$ NVD_LOG=1 vainfo

Trying display: wayland
Trying display: x11
       294.224352307 [3513-3513] ../src/vabackend.c: 138                     init CUDA ERROR 'unknown error' (999)

       294.224366017 [3513-3513] ../src/vabackend.c:2169       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
       294.224367815 [3513-3513] ../src/vabackend.c:2178       __vaDriverInit_1_0 Now have 0 (0 max) instances
       294.224369090 [3513-3513] ../src/vabackend.c:2204       __vaDriverInit_1_0 Selecting Direct backend
       294.226877916 [3513-3513] ../src/direct/direct-export-buf.c:  85      direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
       294.226881717 [3513-3513] ../src/direct/nv-driver.c: 217            init_nvdriver Initing nvdriver...
       294.226884040 [3513-3513] ../src/direct/nv-driver.c: 222            init_nvdriver Got dev info: 100 1 2 6
       294.231135547 [3513-3513] ../src/direct/nv-driver.c: 283            init_nvdriver NVIDIA kernel driver version: 530.41.03
       294.231142346 [3513-3513] ../src/direct/direct-export-buf.c:  23       findGPUIndexFromFd CUDA ERROR 'initialization error' (3)

       294.231144218 [3513-3513] ../src/vabackend.c:2233       __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)

libva error: /usr/lib/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit

Here it is with the EGL backend enabled:

$ NVD_LOG=1 NVD_BACKEND=egl vainfo

Trying display: wayland
Trying display: x11
       821.408317746 [3870-3870] ../src/vabackend.c: 138                     init CUDA ERROR 'unknown error' (999)

       821.408334850 [3870-3870] ../src/vabackend.c:2169       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
       821.408336552 [3870-3870] ../src/vabackend.c:2178       __vaDriverInit_1_0 Now have 0 (0 max) instances
       821.408337685 [3870-3870] ../src/vabackend.c:2201       __vaDriverInit_1_0 Selecting EGL backend
       821.410746637 [3870-3870] ../src/export-buf.c: 132       findGPUIndexFromFd Defaulting to CUDA GPU ID 0. Use NVD_GPU to select a specific CUDA GPU
       821.410750675 [3870-3870] ../src/export-buf.c: 149       findGPUIndexFromFd Looking for GPU index: 0
       821.416284962 [3870-3870] ../src/export-buf.c: 161       findGPUIndexFromFd Found 3 EGL devices
       821.416991459 [3870-3870] ../src/export-buf.c: 213       findGPUIndexFromFd No EGL_CUDA_DEVICE_NV support for EGLDevice 0
       821.416998188 [3870-3870] ../src/export-buf.c: 213       findGPUIndexFromFd No EGL_CUDA_DEVICE_NV support for EGLDevice 1
       821.417000805 [3870-3870] ../src/export-buf.c: 216       findGPUIndexFromFd No DRM device file for EGLDevice 2
       821.417001905 [3870-3870] ../src/export-buf.c: 219       findGPUIndexFromFd No match found, falling back to default device
libva error: /usr/lib/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit
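
To pull just the CUDA failures out of NVD_LOG=1 output like the logs above, a small parser can help. This is a minimal sketch in Python, not part of the driver: the line layout is inferred only from the logs in this thread, and `LOG_LINE`, `CUDA_ERROR`, `cuda_errors` and `SAMPLE` are names made up for illustration.

```python
import re

# Layout assumed from the logs in this thread (not from the driver source):
#   <timestamp> [<id>-<id>] <source file>:<line> <function> <message>
LOG_LINE = re.compile(
    r"^\s*\d+\.\d+\s+\[\d+-\d+\]\s+(?P<file>\S+):\s*(?P<line>\d+)\s+"
    r"(?P<func>\S+)\s+(?P<msg>.*)$"
)
CUDA_ERROR = re.compile(r"CUDA ERROR '(?P<text>[^']*)' \((?P<code>\d+)\)")


def cuda_errors(log_text):
    """Return (function, error text, error code) for each CUDA error line."""
    found = []
    for raw in log_text.splitlines():
        line = LOG_LINE.match(raw)
        if line is None:
            continue  # blank lines, "Trying display: ...", libva errors
        err = CUDA_ERROR.search(line.group("msg"))
        if err:
            found.append(
                (line.group("func"), err.group("text"), int(err.group("code")))
            )
    return found


# Three lines copied from the direct-backend log above.
SAMPLE = """\
       294.224352307 [3513-3513] ../src/vabackend.c: 138                     init CUDA ERROR 'unknown error' (999)
       294.224369090 [3513-3513] ../src/vabackend.c:2204       __vaDriverInit_1_0 Selecting Direct backend
       294.231142346 [3513-3513] ../src/direct/direct-export-buf.c:  23       findGPUIndexFromFd CUDA ERROR 'initialization error' (3)
"""

for func, text, code in cuda_errors(SAMPLE):
    print(f"{func}: {text} ({code})")
```

In both logs above, the first failure is error 999 in `init`, logged before any backend is selected, which suggests CUDA itself is failing to start rather than one particular backend.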

graphics card:

$ lspci | grep VGA

01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)

Installed nvidia packages (Arch Linux):

$ pacman -Qs nvidia

local/egl-wayland 2:1.1.11-3
    EGLStream-based Wayland external platform
local/ffnvcodec-headers 12.0.16.0-1
    FFmpeg version of headers required to interface with Nvidias codec APIs
local/lib32-libvdpau 1.5-1
    Nvidia VDPAU library
local/lib32-nvidia-utils 530.41.03-1
    NVIDIA drivers utilities (32-bit)
local/libvdpau 1.5-1
    Nvidia VDPAU library
local/libxnvctrl 530.41.03-1
    NVIDIA NV-CONTROL X extension
local/nvidia-open-dkms 530.41.03-1
    NVIDIA open kernel modules
local/nvidia-settings 530.41.03-1
    Tool for configuring the NVIDIA graphics driver
local/nvidia-tweaks 525-2
    A collection of tweaks and improvements to the NVIDIA driver
local/nvidia-utils 530.41.03-1
    NVIDIA drivers utilities
local/nvidia-vaapi-driver-git 0.0.9.r7.gc0a7f54-1
    A VA-API implemention using NVIDIA's NVDEC

EDIT:

Part of journal log:

$ journalctl -b | grep nvidia-modeset

Apr 01 13:59:31 Arch-Linux kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  530.41.03  Release Build  (archlinux-builder@Arch-Linux)  
Apr 01 13:59:31 Arch-Linux (udev-worker)[366]: nvidia_modeset: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidia-modeset c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) 254'' failed with exit code 1.

I've been trying to solve this issue for hours, but I'm absolutely clueless about what to do.
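
The failed udev command in the journal excerpt looks up the major number of `nvidia-frontend` in /proc/devices and passes it to mknod. The lookup step can be sketched in Python as below. This is only an illustration: `char_major` and `SAMPLE` are hypothetical, the sample text is not taken from the affected machine, and the thread does not establish whether a missing entry is what made the command fail here (mknod also fails when the node already exists).

```python
def char_major(proc_devices_text, name):
    """Return the major number listed for `name` under 'Character devices:'.

    Returns None when there is no such entry. Unlike the grep in the udev
    rule, this matches the whole device name rather than a substring.
    """
    in_char_section = False
    for raw in proc_devices_text.splitlines():
        line = raw.strip()
        if line == "Character devices:":
            in_char_section = True
            continue
        if line == "Block devices:":
            in_char_section = False
            continue
        if not in_char_section or not line:
            continue
        major, _, dev_name = line.partition(" ")
        if dev_name.strip() == name:
            return int(major)
    return None


# Illustrative /proc/devices content (an assumption, not real output).
SAMPLE = """\
Character devices:
  1 mem
195 nvidia-frontend
236 nvidia-uvm

Block devices:
259 blkext
"""

print(char_major(SAMPLE, "nvidia-frontend"))
```

To check a live system, pass the contents of /proc/devices instead of `SAMPLE`. A result of None would mean the shell substitution in the udev rule expands to nothing.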

elFarto commented 1 year ago

OK, the issue is almost certainly caused by that failed nvidia_modeset line. Can you check your NVIDIA devices in /dev? They should look like this:

> ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Apr  1 08:00 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr  1 08:00 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Apr  1 08:00 /dev/nvidia-modeset
crw-rw-rw- 1 root root 234,   0 Apr  1 08:18 /dev/nvidia-uvm
crw-rw-rw- 1 root root 234,   1 Apr  1 08:18 /dev/nvidia-uvm-tools
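
The same check can be scripted. Below is a minimal Python sketch that assumes the five node names and the world read/write mode shown in the listing above are what a working setup needs; `EXPECTED_NODES` and `check_nvidia_nodes` are made-up names, not anything the driver provides.

```python
import os
import stat

# Node names taken from the listing above.
EXPECTED_NODES = (
    "nvidia0",
    "nvidiactl",
    "nvidia-modeset",
    "nvidia-uvm",
    "nvidia-uvm-tools",
)


def check_nvidia_nodes(dev_dir="/dev", names=EXPECTED_NODES):
    """Return {node name: status string} for each expected device node."""
    report = {}
    for name in names:
        path = os.path.join(dev_dir, name)
        try:
            st = os.stat(path)
        except OSError:
            report[name] = "missing"
            continue
        mode = stat.S_IMODE(st.st_mode)
        if not stat.S_ISCHR(st.st_mode):
            report[name] = "not a character device"
        elif mode & 0o666 != 0o666:
            report[name] = "mode %o is not world read/write" % mode
        else:
            report[name] = "ok major=%d minor=%d" % (
                os.major(st.st_rdev),
                os.minor(st.st_rdev),
            )
    return report


for name, status in check_nvidia_nodes().items():
    print(f"/dev/{name}: {status}")
```

The major number of the nvidia-uvm nodes can differ between machines (234 in the listing above), so the sketch only reports the numbers and does not compare them.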

ReFleXzZ commented 1 year ago

The output of your command looks like this:

$ ls -l /dev/nvidia*
crw-rw-rw- 195,254 root  1 Apr 13:59  /dev/nvidia-modeset
crw-rw-rw-   236,0 root  1 Apr 13:59  /dev/nvidia-uvm
crw-rw-rw-   236,1 root  1 Apr 13:59  /dev/nvidia-uvm-tools
crw-rw-rw-   195,0 root  1 Apr 13:59  /dev/nvidia0
crw-rw-rw- 195,255 root  1 Apr 13:59  /dev/nvidiactl

/dev/nvidia-caps:
cr-------- 239,1 root  1 Apr 14:29  nvidia-cap1
cr--r--r-- 239,2 root  1 Apr 14:29  nvidia-cap2

ReFleXzZ commented 1 year ago

These are my kernel parameters at the moment. "nvidia_drm.modeset=1" is definitely set, so I don't know why mknod fails:

$ cat /proc/cmdline

File: /proc/cmdline
rw root=PARTLABEL=RebornOS nvme_load=yes add_efi_memmap nvidia_drm.modeset=1 NVreg_EnableResizableBar=1 NVreg_EnableGpuFirmware=1 acpi_osi=! "acpi_osi=Windows 2015" quiet splash loglevel=3 rd.udev.log_priority=3 vt.global_cursor_default=0 initrd=intel-ucode.img initrd=initramfs-linux-clear.img

mkinitcpio.conf (modules):

$ sed -n 7p /etc/mkinitcpio.conf

MODULES=(r8125 nvidia nvidia_modeset nvidia_uvm nvidia_drm)

elFarto commented 1 year ago

Just noticed you have the open source version of the driver installed, correct? If so, I'm not sure that works with CUDA.

ReFleXzZ commented 1 year ago

Yes, I have the open version installed. I just tested something, though, and I think it's a kernel-related issue. With the clear-linux kernel, vainfo throws the error. The session I booted before that used the zen-linux kernel, and with that one I get a positive output. I just don't know why the kernel would be the problem, since I'm a kernel and driver noob. 😅 I'll use the zen kernel for now, until it works with the clear-linux kernel, if it ever does.

shelterx commented 8 months ago

Even though the issue is closed, I'd like to add that setting "init_on_alloc=0" as a kernel option causes this exact error.
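
For anyone landing here with the same error, the two kernel parameters discussed in this thread can be checked with a short script. This is a rough Python sketch: `parse_cmdline`, `findings` and `THREAD_CMDLINE` are hypothetical names, and the two checks only reflect what was reported in this thread, not a complete list of causes.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    # shlex keeps quoted tokens such as "acpi_osi=Windows 2015" together.
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None  # later duplicates overwrite
    return params


def findings(cmdline):
    """Return notes about the two parameters mentioned in this thread."""
    params = parse_cmdline(cmdline)
    notes = []
    if params.get("nvidia_drm.modeset") != "1":
        notes.append("nvidia_drm.modeset=1 is not set")
    if params.get("init_on_alloc") == "0":
        notes.append("init_on_alloc=0 is set (reported above to cause this error)")
    return notes


# The command line posted earlier in this thread.
THREAD_CMDLINE = (
    "rw root=PARTLABEL=RebornOS nvme_load=yes add_efi_memmap "
    "nvidia_drm.modeset=1 NVreg_EnableResizableBar=1 NVreg_EnableGpuFirmware=1 "
    'acpi_osi=! "acpi_osi=Windows 2015" quiet splash loglevel=3 '
    "rd.udev.log_priority=3 vt.global_cursor_default=0 "
    "initrd=intel-ucode.img initrd=initramfs-linux-clear.img"
)

print(findings(THREAD_CMDLINE))
print(findings(THREAD_CMDLINE + " init_on_alloc=0"))
```

On a live system, pass the contents of /proc/cmdline to `findings` instead of the sample string.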