NixOS / nixpkgs

Nix Packages collection & NixOS

[NVIDIA] Missing hardware acceleration libraries (Wayland)? #224332

Open GrabbenD opened 1 year ago

GrabbenD commented 1 year ago

It seems like some system paths are missing when using NVIDIA on Wayland if you follow the NixOS NVIDIA guide? I've tried to add nvidia-vaapi-driver and egl-wayland without luck:

{ config, pkgs, lib, ... }: {
  services.xserver.videoDrivers = [ "nvidia" ]; # Wayland
  hardware.nvidia = {
    package = config.boot.kernelPackages.nvidiaPackages.beta;

    # Open drivers (NVreg_OpenRmEnableUnsupportedGpus=1)
    open = true;

    # nvidia-drm.modeset=1
    modesetting.enable = true;

    # Allow headless mode
    nvidiaPersistenced = true;

    # NVreg_PreserveVideoMemoryAllocations=1
    powerManagement.enable = true;
  };

  # Hardware acceleration
  hardware.opengl = {
    enable = true;

    # Vulkan
    driSupport = true;

    # VA-API
    extraPackages = with pkgs; [
      vaapiVdpau
      libvdpau-va-gl

      # Test
      nvidia-vaapi-driver
    ];
  };

  # Test
  environment.systemPackages = with pkgs; [
    nvidia-vaapi-driver
    egl-wayland
  ];
}

Here are two examples:

$ brave --enable-features=UseOzonePlatform --ozone-platform=wayland
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to open nvidia-drm: /run/opengl-driver/lib/dri/nvidia-drm_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
failed to load driver: nvidia-drm
MESA-LOADER: failed to open zink: /run/opengl-driver/lib/dri/zink_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
failed to load driver: zink
MESA-LOADER: failed to open kms_swrast: /run/opengl-driver/lib/dri/kms_swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
failed to load driver: kms_swrast
MESA-LOADER: failed to open swrast: /run/opengl-driver/lib/dri/swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
failed to load swrast driver

nvidia-drm_dri.so is nowhere in the system even though the module is loaded:

$ lsmod | grep nvidia
nvidia_uvm           1630208  0
nvidia_drm             94208  18
nvidia_modeset       1761280  5 nvidia_drm
video                  73728  2 asus_wmi,nvidia_modeset
nvidia               7393280  310 nvidia_uvm,nvidia_modeset
drm_kms_helper        233472  1 nvidia_drm
drm                   675840  19 drm_kms_helper,nvidia,nvidia_drm
i2c_core              131072  5 drm_kms_helper,nvidia,i2c_piix4,i2c_dev,drm
backlight              28672  4 video,asus_wmi,drm,nvidia_modeset

Multiple OpenGL libraries are owned by root:root?

$ l /run/opengl-driver/lib/dri/nvidia-drm_dri.so
ls: cannot access '/run/opengl-driver/lib/dri/nvidia-drm_dri.so': No such file or directory

$ l /run/opengl-driver/lib/dri/zink_dri.so
lrwxrwxrwx 1 root root 83 Jan  1  1970 /run/opengl-driver/lib/dri/zink_dri.so -> /nix/store/lfglc4z10mv986njfb56xcgadcp2r997-mesa-22.3.7-drivers/lib/dri/zink_dri.so

$ l /run/opengl-driver/lib/dri/kms_swrast_dri.so
lrwxrwxrwx 1 root root 89 Jan  1  1970 /run/opengl-driver/lib/dri/kms_swrast_dri.so -> /nix/store/lfglc4z10mv986njfb56xcgadcp2r997-mesa-22.3.7-drivers/lib/dri/kms_swrast_dri.so

$ l /run/opengl-driver/lib/dri/swrast_dri.so
lrwxrwxrwx 1 root root 85 Jan  1  1970 /run/opengl-driver/lib/dri/swrast_dri.so -> /nix/store/lfglc4z10mv986njfb56xcgadcp2r997-mesa-22.3.7-drivers/lib/dri/swrast_dri.so

From my understanding, this causes hardware acceleration to be disabled in Chromium-based browsers like Brave:

Graphics Feature Status
- Canvas: Hardware accelerated
- Canvas out-of-process rasterization: Disabled
- Direct Rendering Display Compositor: Disabled
- Compositing: Software only. Hardware acceleration disabled
- Multiple Raster Threads: Enabled
- OpenGL: Enabled
- Rasterization: Hardware accelerated
- Raw Draw: Disabled
- Video Decode: Hardware accelerated
- Video Encode: Software only. Hardware acceleration disabled
- Vulkan: Disabled
- WebGL: Hardware accelerated but at reduced performance
- WebGL2: Hardware accelerated but at reduced performance
- WebGPU: Disabled

I wonder if this is the reason for VirtIO failing to bring up Spice Display with OpenGL 3D Acceleration?

libvirt.libvirtError: internal error: process exited while connecting to monitor: 2023-04-02T07:34:12.090390Z qemu-system-x86_64: egl: eglInitialize failed
2023-04-02T07:34:12.090441Z qemu-system-x86_64: Failed to initialize EGL render node for SPICE GL

Is something missing in the packaging step for NVIDIA drivers?

Kiskae commented 1 year ago

MESA-LOADER: failed to open nvidia-drm: /run/opengl-driver/lib/dri/nvidia-drm_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)

This is part of nvidia-vaapi-driver; the NVIDIA driver does not have an official libva driver.

Specifically, it appears to find the library but then fails with a Permission denied error. Whether that happens while loading the library or during its initialization, I cannot tell from these logs.

GrabbenD commented 1 year ago

MESA-LOADER: failed to open nvidia-drm: /run/opengl-driver/lib/dri/nvidia-drm_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)

This is part of nvidia-vaapi-driver; the NVIDIA driver does not have an official libva driver.

Is there any workaround like using a different driver other than GBM_BACKEND = "nvidia-drm"?

Specifically, it appears to find the library but then fails with a Permission denied error. Whether that happens while loading the library or during its initialization, I cannot tell from these logs.

Thanks for pointing it out, I updated OP.

Shouldn't these be owned by video group instead of root:root?

$ l /run/opengl-driver/lib/dri/nvidia-drm_dri.so
ls: cannot access '/run/opengl-driver/lib/dri/nvidia-drm_dri.so': No such file or directory

$ l /run/opengl-driver/lib/dri/zink_dri.so
lrwxrwxrwx 1 root root 83 Jan  1  1970 /run/opengl-driver/lib/dri/zink_dri.so -> /nix/store/lfglc4z10mv986njfb56xcgadcp2r997-mesa-22.3.7-drivers/lib/dri/zink_dri.so

$ l /run/opengl-driver/lib/dri/kms_swrast_dri.so
lrwxrwxrwx 1 root root 89 Jan  1  1970 /run/opengl-driver/lib/dri/kms_swrast_dri.so -> /nix/store/lfglc4z10mv986njfb56xcgadcp2r997-mesa-22.3.7-drivers/lib/dri/kms_swrast_dri.so

$ l /run/opengl-driver/lib/dri/swrast_dri.so
lrwxrwxrwx 1 root root 85 Jan  1  1970 /run/opengl-driver/lib/dri/swrast_dri.so -> /nix/store/lfglc4z10mv986njfb56xcgadcp2r997-mesa-22.3.7-drivers/lib/dri/swrast_dri.so

Kiskae commented 1 year ago

Is there any workaround like using a different driver other than GBM_BACKEND = "nvidia-drm"?

As far as I can see, Chromium-based browsers are not supported by nvidia-vaapi-driver: https://github.com/elFarto/nvidia-vaapi-driver/issues/5

Shouldn't these be owned by video group instead of root:root?

They are library files that are stored in /nix/store and are world-readable, so the owner/group does not matter. The actual dri devices are in /dev/dri and those should already be configured correctly through the default udev rules.

PedroHLC commented 1 year ago

This used to work:

╰─λ nvidia-offload vainfo
Trying display: wayland
libva info: VA-API version 1.17.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /run/opengl-driver/lib/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
libva error: /run/opengl-driver/lib/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit

╰─λ file (readlink /run/opengl-driver/lib/dri/nvidia_drv_video.so)
/nix/store/4s8pxgva2xfg618gs1jilqm3ksisjc8f-nvidia-vaapi-driver-0.0.9/lib/dri/nvidia_drv_video.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

NOTES:

  1. I had glxgears running on this GPU in another window, so the GPU was on;
  2. I have LIBVA_DRIVER_NAME="nvidia" in the nvidia-offload as I always had.

I'll try to bisect it later.

Kiskae commented 1 year ago

With NVD_LOG=1 you get more information about the failure:

libva info: Trying to open /run/opengl-driver/lib/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
    564867.348165152 [3955711-3955711] ../src/vabackend.c:2165       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 40
    564867.348167860 [3955711-3955711] ../src/vabackend.c:2174       __vaDriverInit_1_0 Now have 0 (0 max) instances
    564867.348169201 [3955711-3955711] ../src/vabackend.c:2197       __vaDriverInit_1_0 Selecting EGL backend
    564867.350982381 [3955711-3955711] ../src/export-buf.c: 147       findGPUIndexFromFd Looking for DRM device index: 1
    564867.351886278 [3955711-3955711] ../src/export-buf.c: 161       findGPUIndexFromFd Found 4 EGL devices
    564867.351922610 [3955711-3955711] ../src/export-buf.c: 170       findGPUIndexFromFd Got EGL_CUDA_DEVICE_NV value '0' for EGLDevice 0
    564867.351924900 [3955711-3955711] ../src/export-buf.c: 176       findGPUIndexFromFd Found drmDeviceIndex: 1
    564867.351926169 [3955711-3955711] ../src/export-buf.c: 208       findGPUIndexFromFd Selecting EGLDevice 0
    564867.352503709 [3955711-3955711] ../src/export-buf.c: 277         egl_initExporter Driver supports 16-bit surfaces
libva info: va_openDriver() returns 0

PedroHLC commented 1 year ago

With NVD_LOG=1 you get more information about the failure

Gotcha:

╰─λ nvidia-offload vainfo --display drm --device /dev/dri/renderD128
Trying display: drm
      2439.282152225 [92926-92926] ../src/vabackend.c:2165       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 31
      2439.282164831 [92926-92926] ../src/vabackend.c:2174       __vaDriverInit_1_0 Now have 0 (0 max) instances
      2439.282167922 [92926-92926] ../src/vabackend.c:2200       __vaDriverInit_1_0 Selecting Direct backend
      2439.288609798 [92926-92926] ../src/direct/nv-driver.c: 217            init_nvdriver Initing nvdriver...
      2439.288622659 [92926-92926] ../src/direct/nv-driver.c: 222            init_nvdriver Got dev info: 100 1 2 6
      2439.292780746 [92926-92926] ../src/direct/nv-driver.c: 283            init_nvdriver NVIDIA kernel driver version: 530.41.03
vainfo: VA-API version: 1.17 (libva 2.17.1)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain12             : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointVLD
      2439.557022732 [92926-92926] ../src/vabackend.c:2075              nvTerminate Terminating 0x146f8e0
      2439.557787202 [92926-92926] ../src/vabackend.c:2089              nvTerminate Now have 0 (0 max) instances

And EGL too: __vaDriverInit_1_0 Selecting EGL backend

I wonder what changed so that I now have to specify the device explicitly. But I can confirm that this is not breaking MPV's vaapi-copy, and it leaves something different for @GrabbenD to test.

Also, @GrabbenD, you don't need all of [ vaapiVdpau libvdpau-va-gl nvidia-vaapi-driver ]; the last one alone is enough nowadays (as long as your GPU supports nvenc/nvdec). It is also enabled automatically if you have services.xserver.videoDrivers = [ "nvidia" ].

For MPV testing: NVD_BACKEND=direct LIBVA_DRIVER_NAME='nvidia' mpv --vo=gpu --gpu-api=opengl --gpu-context=wayland --hwdec=vaapi-copy ~/Videos/HeyYa-Astrophysics.mp4 (not sure how to test with EGL, waylandvk seems to fail)
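
As a rough sketch of the suggestion above (untested, and using only option names that already appear in this thread), the VA-API part of the configuration could be reduced to the NVDEC-based driver alone. Setting LIBVA_DRIVER_NAME and NVD_BACKEND session-wide is an assumption made for illustration; in the thread they are only set inside an nvidia-offload wrapper or on the command line.

```nix
{ pkgs, ... }: {
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia.modesetting.enable = true;

  hardware.opengl = {
    enable = true;
    # Only the NVDEC-backed VA-API driver; vaapiVdpau and libvdpau-va-gl are dropped.
    # According to the comment above, the nvidia module may already add this
    # package, in which case listing it here is redundant but harmless.
    extraPackages = [ pkgs.nvidia-vaapi-driver ];
  };

  # Assumption: exported for the whole session. On a hybrid laptop this would
  # also affect VA-API on the integrated GPU, so a per-command wrapper may be
  # the better choice there.
  environment.sessionVariables = {
    LIBVA_DRIVER_NAME = "nvidia";
    NVD_BACKEND = "direct";
  };
}
```

After a rebuild and a fresh login, running vainfo (optionally with NVD_LOG=1) is the quickest way to see whether the driver initialises.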

GrabbenD commented 1 year ago

@PedroHLC Nice findings! The bisect could be very helpful if nvidia_drv_video.so worked before.

I don't know if this is related to the MESA errors in Chromium above, but reverting to the NVIDIA 470 driver actually works for a lot of people who want hardware-accelerated OpenGL with the VirtIO display driver in QEMU for a Windows VM. However, I'm stuck with the 5xx drivers since I have a newer GPU (3080 Ti).

Kiskae commented 1 year ago

hardware accelerated OpenGL with VirtIO display driver in QEMU for a Windows VM.

While looking into creating a minimal NixOS QEMU setup to test this, I ran into the following section on the Arch Wiki:

For Windows guests, there is very little information on VirtIO-gpu OpenGL drivers but there is a report that Red Hat abandoned work on it

Verifying that nvidia_drv_video.so is working is actually quite easy since it only exposes the relatively small libva API. So if vainfo succeeds without errors then that driver is working as intended.

NULLx76 commented 1 year ago

I'm getting similar errors whenever I try to run Electron apps on Wayland (GNOME) with NVIDIA. Things like glxgears work fine, though.

I get errors like:

victor@eevee ~ % element-desktop 
/home/victor/.config/Element exists: yes
/home/victor/.config/Riot exists: no
No update_base_url is defined: auto update is disabled
Fetching translation json for locale: en_EN
Changing application language to en
Fetching translation json for locale: en
Resetting the UI components after locale change
Resetting the UI components after locale change
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to open nvidia-drm: /run/opengl-driver/lib/dri/nvidia-drm_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open zink: /run/opengl-driver/lib/dri/zink_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open kms_swrast: /run/opengl-driver/lib/dri/kms_swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open swrast: /run/opengl-driver/lib/dri/swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
Changing application language to en
Fetching translation json for locale: en
Resetting the UI components after locale change

And then the app just has a black screen.

However, in X11 everything is fine.

Kiskae commented 1 year ago

As far as I know the drivers in /run/opengl-driver/lib/dri/ are only used by libva for video acceleration. If those drivers can't be loaded it should fall back on software decoding without affecting the program itself and unless it is trying to play a video it shouldn't result in a black screen.

@SuperSandro2000 - you're listed as maintainer on libva, have you seen these errors before?

SuperSandro2000 commented 1 year ago

@SuperSandro2000 - you're listed as maintainer on libva, have you seen these errors before?

I only use it on intel integrated graphics.

NULLx76 commented 1 year ago

I ended up fixing my problem by setting environment.sessionVariables.NIXOS_OZONE_WL = "1";. It still prints the errors, so they seem to be unrelated to the black screen.
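
For reference, a minimal sketch of that setting as a standalone module. NIXOS_OZONE_WL appears to be read by the nixpkgs wrappers for Chromium and Electron applications, which then pass Ozone/Wayland flags to the program, rather than by the applications themselves; treat that description as an assumption and not as documented behaviour.

```nix
{ ... }: {
  # Session-wide hint for nixpkgs-wrapped Chromium/Electron apps to run on Wayland.
  environment.sessionVariables.NIXOS_OZONE_WL = "1";
}
```

Session variables are set at login, so the change only takes effect after logging out and back in following the rebuild.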

dan4ik605743 commented 1 year ago

MESA-LOADER: failed to open nvidia-drm: /nix/store/ja4ax8704yaap28a9v0xqbpx2ag8sckc-mesa-23.1.7/lib/gbm/nvidia-drm_gbm.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/gbm:/nix/store/ja4ax8704yaap28a9v0xqbpx2ag8sckc-mesa-23.1.7/lib/gbm, suffix _gbm)
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to open nvidia-drm: /run/opengl-driver/lib/dri/nvidia-drm_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open zink: /run/opengl-driver/lib/dri/zink_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open kms_swrast: /run/opengl-driver/lib/dri/kms_swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)
MESA-LOADER: failed to open swrast: /run/opengl-driver/lib/dri/swrast_dri.so: cannot open shared object file: Permission denied (search paths /run/opengl-driver/lib/dri, suffix _dri)

I get the same errors when starting the Notesnook application. With NIXOS_OZONE_WL=1 enabled it shows just a white screen; without it, a black screen. What should I do?

dan4ik605743 commented 1 year ago

It works in an X11 session, but not in Wayland.

dan4ik605743 commented 1 year ago

With --disable-gpu it works in Wayland.

GrabbenD commented 1 year ago

It works in an X11 session, but not in Wayland.

In short, NVIDIA is refusing to implement proper support for the new open source GBM protocol which the entire Wayland ecosystem is based on; they want developers to use their proprietary EGL protocol.

Sure you might get some things to work on NVIDIA with various workarounds but it's just not worth wasting time on it.

I switched to AMD and things are working a lot better if you want to use Wayland. AMD's 6xxx and 7xxx GPUs work the best.

Vote with your wallet or stick to X11 with NVIDIA for a couple more years.

Kiskae commented 1 year ago

MESA-LOADER: failed to retrieve device information

Pretty sure this happens while loading the NVIDIA GBM backend, but I wouldn't be able to say what the specific error is.

nrdxp commented 1 year ago

In short, NVIDIA is refusing to implement proper support for the new open source GBM protocol which the entire Wayland ecosystem is based on; they want developers to use their proprietary EGL protocol.

This was the case years ago, but NVIDIA effectively abandoned their EGLStreams backend a few years back and started supporting GBM, which has been the case for over a year now.