NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

nvidia vaapi drivers accessing wrong device #324553

Open Ciflire opened 4 months ago

Ciflire commented 4 months ago

Describe the bug

The NVIDIA VA-API driver appears to use the wrong device when the following conditions are met.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Install the NVIDIA VA-API driver.
  2. Run nix-shell -p libva-utils --run 'NVD_LOG=1 NVD_MAX_INSTANCES=10 vainfo'
  3. Force it to run on the desired device: nix-shell -p libva-utils --run 'NVD_LOG=1 NVD_MAX_INSTANCES=10 NVD_BACKEND=direct vainfo --display drm --device /dev/dri/renderD128'
  4. Observe that the two commands produce different output.
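
Before comparing the two commands, it can help to see which kernel driver is bound to each render node. The following is a minimal sketch, assuming the usual Linux sysfs layout where `/sys/class/drm/renderD*/device/driver` is a symlink to the bound driver. The function name `list_render_nodes` and its optional root argument are hypothetical and exist only for illustration.

```shell
# List each DRM render node together with the kernel driver bound to it.
# The optional first argument (hypothetical, mainly for testing) overrides
# the sysfs directory that is scanned; it defaults to /sys/class/drm.
list_render_nodes() {
    sysfs_root="${1:-/sys/class/drm}"
    for node in "$sysfs_root"/renderD*; do
        # Skip the unexpanded glob pattern when no render nodes exist.
        [ -e "$node" ] || continue
        driver_link="$node/device/driver"
        if [ -L "$driver_link" ]; then
            # The symlink target ends in the driver name, e.g. ".../drivers/amdgpu".
            driver="$(basename "$(readlink "$driver_link")")"
        else
            driver="unknown"
        fi
        printf '/dev/dri/%s %s\n' "$(basename "$node")" "$driver"
    done
}
```

Calling `list_render_nodes` with no arguments prints one line per render node. Judging by the logs in this issue, the reporter's machine would presumably pair `/dev/dri/renderD128` with `nvidia` and `/dev/dri/renderD129` with `amdgpu`, but that is an inference, not something shown directly.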

Expected behavior

The default command should produce the same output as the forced one:

Trying display: drm
libva info: VA-API version 1.21.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /run/opengl-driver/lib/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
       323.127074386 [21557-21557] ../src/vabackend.c:2188       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 31
       323.127078434 [21557-21557] ../src/vabackend.c:2197       __vaDriverInit_1_0 Now have 0 (10 max) instances
       323.127080889 [21557-21557] ../src/vabackend.c:2223       __vaDriverInit_1_0 Selecting Direct backend
       323.134908887 [21557-21557] ../src/direct/nv-driver.c: 267            init_nvdriver Initing nvdriver...
       323.134935129 [21557-21557] ../src/direct/nv-driver.c: 285            init_nvdriver NVIDIA kernel driver version: 555.58, major version: 555, minor version: 58
       323.134939999 [21557-21557] ../src/direct/nv-driver.c: 292            init_nvdriver Got dev info: 100 1 2 6
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.21 (libva 2.22.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            :    VAEntrypointVLD
      VAProfileMPEG2Main              :    VAEntrypointVLD
      VAProfileVC1Simple              :    VAEntrypointVLD
      VAProfileVC1Main                :    VAEntrypointVLD
      VAProfileVC1Advanced            :    VAEntrypointVLD
      VAProfileH264Main               :    VAEntrypointVLD
      VAProfileH264High               :    VAEntrypointVLD
      VAProfileH264ConstrainedBaseline:    VAEntrypointVLD
      VAProfileHEVCMain               :    VAEntrypointVLD
      VAProfileVP8Version0_3          :    VAEntrypointVLD
      VAProfileVP9Profile0            :    VAEntrypointVLD
      VAProfileAV1Profile0            :    VAEntrypointVLD
      VAProfileHEVCMain10             :    VAEntrypointVLD
      VAProfileHEVCMain12             :    VAEntrypointVLD
      VAProfileVP9Profile2            :    VAEntrypointVLD
      VAProfileHEVCMain444            :    VAEntrypointVLD
      VAProfileHEVCMain444_10         :    VAEntrypointVLD
      VAProfileHEVCMain444_12         :    VAEntrypointVLD
       323.408228975 [21557-21557] ../src/vabackend.c:2098              nvTerminate Terminating 0x10c0f8e0
       323.408987517 [21557-21557] ../src/vabackend.c:2112              nvTerminate Now have 0 (10 max) instances

but it actually produces:

Trying display: wayland
libva info: VA-API version 1.21.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /run/opengl-driver/lib/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
        26.988146755 [4790-4790] ../src/vabackend.c:2188       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 40
        26.988156862 [4790-4790] ../src/vabackend.c:2197       __vaDriverInit_1_0 Now have 0 (10 max) instances
        26.988163853 [4790-4790] ../src/vabackend.c:2223       __vaDriverInit_1_0 Selecting Direct backend
        26.996992901 [4790-4790] ../src/backend-common.c:  31            isNvidiaDrmFd Invalid driver for DRM device: amdgpu
        26.997000376 [4790-4790] ../src/vabackend.c:2248       __vaDriverInit_1_0 Exporter failed
libva error: /run/opengl-driver/lib/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit

Additional context

This error leaves web browsers unable to use WebGL/WebGPU. I couldn't find any way to force them to run on the dGPU, so I went back to offload mode, where the default environment variables make Firefox/LibreWolf work. References: https://github.com/elFarto/nvidia-vaapi-driver/issues/299 https://github.com/elFarto/nvidia-vaapi-driver/issues/213

Notify maintainers

@afh @NickCao @Kiskae

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.9.7, NixOS, 24.11 (Vicuna), 24.11.20240701.00d80d1`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.4`
 - channels(root): `"nixos, nixpkgs"`
 - channels(ciflire): `""`
 - nixpkgs: `/nix/store/j4jzjbr302cw5bl0n3pch5j9bh5qwmaj-source`


Kiskae commented 4 months ago

Try running nix-shell -p libva-utils --run 'NVD_LOG=1 NVD_MAX_INSTANCES=10 WAYLAND_DEBUG=1 vainfo'

I'm pretty sure the wrong DRM node is being provided by Wayland, and the NVIDIA driver can't really do anything about that if you force it to be loaded.

Essentially, the error is being forced by the override reported in this log line: libva info: User environment variable requested driver 'nvidia'

Ciflire commented 4 months ago

This is the output; it doesn't seem to fix the problem.

~ 
    nix-shell -p libva-utils --run 'NVD_LOG=1 NVD_MAX_INSTANCES=10 WAYLAND_DEBUG=1 vainfo'
Trying display: wayland
[3962443.671]  -> wl_display@1.get_registry(new id wl_registry@2)
[3962443.820] wl_registry@2.global(1, "wl_seat", 9)
[3962443.829] wl_registry@2.global(2, "wl_data_device_manager", 3)
[3962443.833] wl_registry@2.global(3, "wl_compositor", 6)
[3962443.836]  -> wl_registry@2.bind(3, "wl_compositor", 1, new id [unknown]@3)
[3962443.838] wl_registry@2.global(4, "wl_subcompositor", 1)
[3962443.840] wl_registry@2.global(5, "wl_shm", 1)
[3962443.841] wl_registry@2.global(6, "wp_viewporter", 1)
[3962443.843] wl_registry@2.global(7, "wp_tearing_control_manager_v1", 1)
[3962443.845] wl_registry@2.global(8, "wp_fractional_scale_manager_v1", 1)
[3962443.846] wl_registry@2.global(9, "zxdg_output_manager_v1", 3)
[3962443.848] wl_registry@2.global(10, "wp_cursor_shape_manager_v1", 1)
[3962443.850] wl_registry@2.global(11, "zwp_idle_inhibit_manager_v1", 1)
[3962443.851] wl_registry@2.global(12, "zwp_relative_pointer_manager_v1", 1)
[3962443.853] wl_registry@2.global(13, "zxdg_decoration_manager_v1", 1)
[3962443.855] wl_registry@2.global(14, "wp_alpha_modifier_v1", 1)
[3962443.858] wl_registry@2.global(15, "zwlr_gamma_control_manager_v1", 1)
[3962443.860] wl_registry@2.global(16, "ext_foreign_toplevel_list_v1", 1)
[3962443.862] wl_registry@2.global(17, "zwp_pointer_gestures_v1", 3)
[3962443.863] wl_registry@2.global(18, "zwlr_foreign_toplevel_manager_v1", 3)
[3962443.868] wl_registry@2.global(19, "zwp_keyboard_shortcuts_inhibit_manager_v1", 1)
[3962443.873] wl_registry@2.global(20, "zwp_text_input_manager_v3", 1)
[3962443.875] wl_registry@2.global(21, "zwp_pointer_constraints_v1", 1)
[3962443.879] wl_registry@2.global(22, "zwlr_output_power_manager_v1", 1)
[3962443.881] wl_registry@2.global(23, "xdg_activation_v1", 1)
[3962443.883] wl_registry@2.global(24, "ext_idle_notifier_v1", 1)
[3962443.885] wl_registry@2.global(25, "ext_session_lock_manager_v1", 1)
[3962443.888] wl_registry@2.global(26, "zwp_input_method_manager_v2", 1)
[3962443.890] wl_registry@2.global(27, "zwp_virtual_keyboard_manager_v1", 1)
[3962443.892] wl_registry@2.global(28, "zwlr_virtual_pointer_manager_v1", 2)
[3962443.894] wl_registry@2.global(29, "zwlr_output_manager_v1", 4)
[3962443.897] wl_registry@2.global(30, "org_kde_kwin_server_decoration_manager", 1)
[3962443.899] wl_registry@2.global(31, "hyprland_focus_grab_manager_v1", 1)
[3962443.903] wl_registry@2.global(32, "zwp_tablet_manager_v2", 1)
[3962443.906] wl_registry@2.global(33, "zwlr_layer_shell_v1", 5)
[3962443.908] wl_registry@2.global(34, "wp_presentation", 1)
[3962443.911] wl_registry@2.global(35, "xdg_wm_base", 6)
[3962443.914] wl_registry@2.global(36, "zwlr_data_control_manager_v1", 2)
[3962443.916] wl_registry@2.global(37, "zwp_primary_selection_device_manager_v1", 1)
[3962443.918] wl_registry@2.global(38, "xwayland_shell_v1", 1)
[3962443.921] wl_registry@2.global(39, "wl_drm", 2)
[3962443.922] wl_registry@2.global(40, "zwp_linux_dmabuf_v1", 5)
[3962443.924] wl_registry@2.global(41, "hyprland_toplevel_export_manager_v1", 2)
[3962443.926] wl_registry@2.global(42, "zwp_text_input_manager_v1", 1)
[3962443.928] wl_registry@2.global(43, "hyprland_global_shortcuts_manager_v1", 1)
[3962443.931] wl_registry@2.global(44, "zwlr_screencopy_manager_v1", 3)
[3962443.934] wl_registry@2.global(45, "wp_drm_lease_device_v1", 1)
[3962443.936] wl_registry@2.global(46, "wp_drm_lease_device_v1", 1)
[3962443.938] wl_registry@2.global(47, "wl_output", 4)
[3962443.942] wl_registry@2.global(48, "wl_output", 4)
[3962443.945]  -> wl_display@1.get_registry(new id wl_registry@4)
[3962443.949]  -> wl_display@1.sync(new id wl_callback@5)
[3962444.078] wl_display@1.delete_id(5)
[3962444.090] wl_registry@4.global(1, "wl_seat", 9)
[3962444.092] wl_registry@4.global(2, "wl_data_device_manager", 3)
[3962444.094] wl_registry@4.global(3, "wl_compositor", 6)
[3962444.096] wl_registry@4.global(4, "wl_subcompositor", 1)
[3962444.098] wl_registry@4.global(5, "wl_shm", 1)
[3962444.100] wl_registry@4.global(6, "wp_viewporter", 1)
[3962444.101] wl_registry@4.global(7, "wp_tearing_control_manager_v1", 1)
[3962444.103] wl_registry@4.global(8, "wp_fractional_scale_manager_v1", 1)
[3962444.105] wl_registry@4.global(9, "zxdg_output_manager_v1", 3)
[3962444.106] wl_registry@4.global(10, "wp_cursor_shape_manager_v1", 1)
[3962444.108] wl_registry@4.global(11, "zwp_idle_inhibit_manager_v1", 1)
[3962444.110] wl_registry@4.global(12, "zwp_relative_pointer_manager_v1", 1)
[3962444.112] wl_registry@4.global(13, "zxdg_decoration_manager_v1", 1)
[3962444.113] wl_registry@4.global(14, "wp_alpha_modifier_v1", 1)
[3962444.115] wl_registry@4.global(15, "zwlr_gamma_control_manager_v1", 1)
[3962444.117] wl_registry@4.global(16, "ext_foreign_toplevel_list_v1", 1)
[3962444.119] wl_registry@4.global(17, "zwp_pointer_gestures_v1", 3)
[3962444.121] wl_registry@4.global(18, "zwlr_foreign_toplevel_manager_v1", 3)
[3962444.126] wl_registry@4.global(19, "zwp_keyboard_shortcuts_inhibit_manager_v1", 1)
[3962444.128] wl_registry@4.global(20, "zwp_text_input_manager_v3", 1)
[3962444.131] wl_registry@4.global(21, "zwp_pointer_constraints_v1", 1)
[3962444.133] wl_registry@4.global(22, "zwlr_output_power_manager_v1", 1)
[3962444.136] wl_registry@4.global(23, "xdg_activation_v1", 1)
[3962444.138] wl_registry@4.global(24, "ext_idle_notifier_v1", 1)
[3962444.140] wl_registry@4.global(25, "ext_session_lock_manager_v1", 1)
[3962444.142] wl_registry@4.global(26, "zwp_input_method_manager_v2", 1)
[3962444.144] wl_registry@4.global(27, "zwp_virtual_keyboard_manager_v1", 1)
[3962444.146] wl_registry@4.global(28, "zwlr_virtual_pointer_manager_v1", 2)
[3962444.148] wl_registry@4.global(29, "zwlr_output_manager_v1", 4)
[3962444.150] wl_registry@4.global(30, "org_kde_kwin_server_decoration_manager", 1)
[3962444.152] wl_registry@4.global(31, "hyprland_focus_grab_manager_v1", 1)
[3962444.154] wl_registry@4.global(32, "zwp_tablet_manager_v2", 1)
[3962444.156] wl_registry@4.global(33, "zwlr_layer_shell_v1", 5)
[3962444.158] wl_registry@4.global(34, "wp_presentation", 1)
[3962444.160] wl_registry@4.global(35, "xdg_wm_base", 6)
[3962444.163] wl_registry@4.global(36, "zwlr_data_control_manager_v1", 2)
[3962444.165] wl_registry@4.global(37, "zwp_primary_selection_device_manager_v1", 1)
[3962444.168] wl_registry@4.global(38, "xwayland_shell_v1", 1)
[3962444.170] wl_registry@4.global(39, "wl_drm", 2)
[3962444.173]  -> wl_registry@4.bind(39, "wl_drm", 2, new id [unknown]@6)
[3962444.175] wl_registry@4.global(40, "zwp_linux_dmabuf_v1", 5)
[3962444.178] wl_registry@4.global(41, "hyprland_toplevel_export_manager_v1", 2)
[3962444.182] wl_registry@4.global(42, "zwp_text_input_manager_v1", 1)
[3962444.185] wl_registry@4.global(43, "hyprland_global_shortcuts_manager_v1", 1)
[3962444.187] wl_registry@4.global(44, "zwlr_screencopy_manager_v1", 3)
[3962444.189] wl_registry@4.global(45, "wp_drm_lease_device_v1", 1)
[3962444.191] wl_registry@4.global(46, "wp_drm_lease_device_v1", 1)
[3962444.193] wl_registry@4.global(47, "wl_output", 4)
[3962444.196] wl_registry@4.global(48, "wl_output", 4)
[3962444.199] wl_callback@5.done(393)
[3962444.202]  -> wl_display@1.sync(new id wl_callback@5)
[3962444.323] wl_display@1.delete_id(5)
[3962444.339] wl_drm@6.device("/dev/dri/renderD129")
[3962444.434] wl_drm@6.capabilities(1)
[3962444.437] wl_drm@6.format(1211384385)
[3962444.438] wl_drm@6.format(1211384408)
[3962444.440] wl_drm@6.format(942948929)
[3962444.442] wl_drm@6.format(942948952)
[3962444.443] wl_drm@6.format(808669761)
[3962444.445] wl_drm@6.format(808669784)
[3962444.448] wl_drm@6.format(808665665)
[3962444.451] wl_drm@6.format(808665688)
[3962444.455] wl_drm@6.format(875713089)
[3962444.457] wl_drm@6.format(875708993)
[3962444.460] wl_drm@6.format(875713112)
[3962444.463] wl_drm@6.format(875709016)
[3962444.466] wl_drm@6.format(892424769)
[3962444.469] wl_drm@6.format(892420673)
[3962444.472] wl_drm@6.format(842093121)
[3962444.475] wl_drm@6.format(842089025)
[3962444.477] wl_drm@6.format(909199186)
[3962444.480] wl_drm@6.format(538982482)
[3962444.483] wl_drm@6.format(540422482)
[3962444.485] wl_drm@6.format(943215175)
[3962444.489] wl_drm@6.format(842224199)
[3962444.492] wl_drm@6.format(961959257)
[3962444.493] wl_drm@6.format(825316697)
[3962444.496] wl_drm@6.format(842093913)
[3962444.497] wl_drm@6.format(909202777)
[3962444.499] wl_drm@6.format(875713881)
[3962444.501] wl_drm@6.format(961893977)
[3962444.503] wl_drm@6.format(825316953)
[3962444.506] wl_drm@6.format(842094169)
[3962444.508] wl_drm@6.format(909203033)
[3962444.510] wl_drm@6.format(875714137)
[3962444.511] wl_drm@6.format(842094158)
[3962444.513] wl_drm@6.format(825382478)
[3962444.516] wl_drm@6.format(808530000)
[3962444.518] wl_drm@6.format(842084432)
[3962444.521] wl_drm@6.format(909193296)
[3962444.522] wl_drm@6.format(808661072)
[3962444.524] wl_drm@6.format(909203022)
[3962444.529] wl_drm@6.format(1448433985)
[3962444.533] wl_drm@6.format(1448434008)
[3962444.535] wl_drm@6.format(808531033)
[3962444.537] wl_drm@6.format(842085465)
[3962444.540] wl_drm@6.format(909194329)
[3962444.543] wl_drm@6.format(1448695129)
[3962444.546] wl_drm@6.format(1431918169)
[3962444.548] wl_drm@6.format(1498831189)
[3962444.551] wl_drm@6.format(1498765654)
[3962444.553] wl_drm@6.format(808530521)
[3962444.555] wl_drm@6.format(842084953)
[3962444.558] wl_drm@6.format(909193817)
[3962444.560] wl_callback@5.done(393)
[3962444.563]  -> wl_display@1.sync(new id wl_callback@5)
[3962444.654] wl_display@1.delete_id(5)
[3962444.666] wl_callback@5.done(393)
libva info: VA-API version 1.21.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /run/opengl-driver/lib/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
       163.627914209 [6037-6037] ../src/vabackend.c:2188       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 40
       163.627924378 [6037-6037] ../src/vabackend.c:2197       __vaDriverInit_1_0 Now have 0 (10 max) instances
       163.627931331 [6037-6037] ../src/vabackend.c:2223       __vaDriverInit_1_0 Selecting Direct backend
       163.636557983 [6037-6037] ../src/backend-common.c:  31            isNvidiaDrmFd Invalid driver for DRM device: amdgpu
       163.636564275 [6037-6037] ../src/vabackend.c:2248       __vaDriverInit_1_0 Exporter failed
libva error: /run/opengl-driver/lib/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit

Kiskae commented 4 months ago

> [3962444.339] wl_drm@6.device("/dev/dri/renderD129")

Well, that is the issue: Wayland is providing the amdgpu DRM node and you forced the NVIDIA libva driver to be used. It makes sense that this doesn't work unless you specifically override Wayland and tell it to use the NVIDIA DRM node.

It also makes sense this happens in a PRIME setup, since wayland is probably rendering to the iGPU, so that is the drm node it uses.
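
To pull the advertised DRM node out of a long `WAYLAND_DEBUG=1` trace without scrolling through the registry dump, a small filter along these lines may help. This is only a sketch: the function name `wl_drm_device` is hypothetical, and the pattern assumes the `wl_drm@N.device("...")` line format seen in the log above (it also accepts `#` in place of `@`, on the assumption that some libwayland versions print the object id that way).

```shell
# Print the DRM node path(s) advertised via wl_drm in a WAYLAND_DEBUG
# trace read from standard input. Lines that do not match are dropped.
wl_drm_device() {
    sed -n 's/.*wl_drm[@#][0-9]*\.device("\([^"]*\)").*/\1/p'
}
```

The debug trace is normally written to stderr, so a plausible invocation would be `NVD_LOG=1 WAYLAND_DEBUG=1 vainfo 2>&1 | wl_drm_device`. The printed path can then be compared against the node the NVIDIA driver is expected to open.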

Ciflire commented 4 months ago

That is the issue but is it to be fixed by nixpkgs/nvidia vaapi/hyprland?

Kiskae commented 4 months ago

> That is the issue but is it to be fixed by nixpkgs/nvidia vaapi/hyprland?

I'm more curious what the intended behavior would be, because if you're displaying from your iGPU then why would decoding video to the memory of your dGPU be the intended behavior? You'd just end up having to copy the data through the system bus anyway.

I'd expect you need to remove the LIBVA_DRIVER_NAME override and just let libva pick the correct driver, which would be the one for your amdgpu in this case.
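
One way to try this without touching the system configuration is to run a single command with the variable removed from its environment. A minimal sketch follows; the helper name `without_libva_override` is hypothetical, and it assumes `LIBVA_DRIVER_NAME` is the only override in play.

```shell
# Run the given command with LIBVA_DRIVER_NAME removed from its environment,
# so libva selects a driver on its own. The subshell keeps the variable
# intact in the calling shell.
without_libva_override() {
    (
        unset LIBVA_DRIVER_NAME
        "$@"
    )
}
```

For example, `without_libva_override vainfo` inside the same `nix-shell -p libva-utils` environment should show which driver libva chooses by default. If the "User environment variable requested driver" line is still printed, the variable is presumably being set somewhere else.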

Ciflire commented 4 months ago

For me the real issue is the browser that cannot use webgpu in a sync setup, which iirc is supposed to be automatic

Kiskae commented 4 months ago

> For me the real issue is the browser that cannot use webgpu in a sync setup, which iirc is supposed to be automatic

At this point I'd need an example of a system where this is working as you're describing it so we can inspect how it is configured. You've shown that libva works if it is called with the nvidia drm node, so the reason it isn't being selected is a runtime issue.