NixOS / nixpkgs


libva segmentation faults on wayland #299818

Closed hughobrien closed 6 months ago

hughobrien commented 6 months ago

Describe the bug

Segfault with a recent libva. It affects mpv (with the gpu target) but can be reproduced with vainfo.

Steps To Reproduce

Steps to reproduce the behavior:

nix-shell -p libva-utils --run vainfo
Trying display: wayland
libva info: VA-API version 1.20.0
/tmp/nix-shell-51278-0/rc: line 3: 53303 Segmentation fault      (core dumped) vainfo

Screenshots

Mar 28 15:46:29 fw systemd-coredump[53305]: [🡕] Process 53303 (vainfo) of user 1001 dumped core.

                                            Module libXdmcp.so.6 without build-id.
                                            Module libXau.so.6 without build-id.
                                            Module libffi.so.8 without build-id.
                                            Module libxcb-dri3.so.0 without build-id.
                                            Module libX11-xcb.so.1 without build-id.
                                            Module libXfixes.so.3 without build-id.
                                            Module libXext.so.6 without build-id.
                                            Module libxcb.so.1 without build-id.
                                            Module libva-wayland.so.2 without build-id.
                                            Module libdrm.so.2 without build-id.
                                            Module libva-drm.so.2 without build-id.
                                            Module libva-x11.so.2 without build-id.
                                            Module libX11.so.6 without build-id.
                                            Module libva.so.2 without build-id.
                                            Module vainfo without build-id.
                                            Stack trace of thread 53303:
                                            #0  0x00007f8b47f24c7c VA_DRM_GetDriverNames (libva-wayland.so.2 + 0x2c7c)
                                            #1  0x00007f8b480a9fda vaInitialize (libva.so.2 + 0x3fda)
                                            #2  0x0000000000402488 main (vainfo + 0x2488)
                                            #3  0x00007f8b47d610ce __libc_start_call_main (libc.so.6 + 0x280ce)
                                            #4  0x00007f8b47d61189 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x28189)
                                            #5  0x00000000004028f5 _start (vainfo + 0x28f5)
                                            ELF object binary architecture: AMD x86-64
Mar 28 15:46:29 fw systemd[1]: systemd-coredump@5-53304-0.service: Deactivated successfully.

Additional context

Mar 28 15:46:29 fw kernel: vainfo[53303]: segfault at 0 ip 00007f8b47f24c7c sp 00007fffc76b66a0 error 4 in libva-wayland.so.2.2000.0[7f8b47f24000+1000] likely on CPU 8 (core 16, socket 0)
Mar 28 15:46:29 fw kernel: Code: 41 56 41 55 49 89 f5 41 54 49 89 d4 55 53 48 81 ec 98 01 00 00 64 48 8b 04 25 28 00 00 00 48 89 84 24 88 01 00 00 48 8b 47 68 <8b> 38 e8 ed f4 ff ff 48 85 c0 0f 84 04 01 00 00 48 8b 78 10 48 89
13th Gen Intel(R) Core(TM) i5-1340P

(Framework 13" Intel)

Notify maintainers

@SuperSandro2000

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

nix-shell -p nix-info --run "nix-info -m" 
 - system: `"x86_64-linux"`
 - host os: `Linux 6.8.1, NixOS, 24.05 (Uakari), 24.05.20240327.2726f12`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/store/y45vqv6pa8bhgag1dw86rvi6rk55xhxn-source`


SuperSandro2000 commented 6 months ago

Can't reproduce that. What do you have configured in hardware.opengl.extraPackages?

hughobrien commented 6 months ago

    opengl = {
      enable = true;
      extraPackages = with pkgs; [ libvdpau-va-gl vaapiVdpau intel-media-driver ];
    };

The issue appears related to the presence of a DisplayLink (not Port) device.

~ master*
❯ ls /dev/dri
by-path  card0  card1  renderD128

~ master*
❯ echo $WLR_DRM_DEVICES
/dev/dri/card0:/dev/dri/card1

When the device is not present, or not present at sway startup, vainfo works correctly. This worked correctly until a recent update (it had been perhaps 3 to 4 weeks since I last updated).

I will try to roll back a few packages to see if I can find the trigger, but would appreciate any suggestions.

hughobrien commented 6 months ago

Digging into this some more, I traced the error back and found that a null check had been removed from libva-utils and has since been re-added.

When I add it back, the check triggers and vainfo fails with an error instead of segfaulting:

[nix-shell:/data/nixpkgs/pkgs/development/libraries/libva/source/build]$ LD_LIBRARY_PATH=$PWD/va vainfo
Trying display: wayland
libva info: VA-API version 1.20.0
libva error: vaGetDriverNames() failed with invalid VADisplay
vaInitialize failed with error code 3 (invalid VADisplay),exit

I think the root cause of this is the evdi device showing up in /dev/dri and, for some reason, taking the card0 slot where it used to be card1. There seem to be more than a few complaints from multi-GPU system owners who find software assuming that card0 has rendering abilities.
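For anyone wanting to check which card in /dev/dri actually has rendering abilities, here is a minimal sketch. It assumes the usual Linux sysfs layout, where a card's render node (if it has one) appears next to it under /sys/class/drm/cardN/device/drm/. The function name list_drm_cards and the optional root argument are made up for illustration and are not part of any tool mentioned in this thread.

```shell
#!/usr/bin/env bash
# Sketch: print each DRM card and the render node that shares its device,
# or "none" if the card has no render node.
# Assumption: sysfs exposes render nodes at <root>/cardN/device/drm/renderD*.
# The optional first argument (hypothetical) overrides the sysfs root.
list_drm_cards() {
  local root="${1:-/sys/class/drm}"
  local card name render r
  for card in "$root"/card*; do
    [ -d "$card" ] || continue
    name="${card##*/}"
    # Skip connector entries such as card1-eDP-1.
    case "$name" in *-*) continue ;; esac
    render=""
    for r in "$card"/device/drm/renderD*; do
      # An unmatched glob stays literal, so test that the path exists.
      if [ -e "$r" ]; then
        render="${r##*/}"
        break
      fi
    done
    printf '%s %s\n' "$name" "${render:-none}"
  done
}
```

Calling list_drm_cards with no arguments reads the real sysfs tree. On a system like the one described here, a renderless evdi device would be expected to show up as a card with "none" next to it, which makes it easy to see when it has taken the card0 slot.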

hughobrien commented 6 months ago

Just closing this out for anyone who finds it in the future: I think the root cause was the (allowed) renumbering of the /dev/dri devices.

I had been using the patch that allows renderless evdi devices to use the main renderer of the host system; as you can see, it contains a hardcoded card0 reference. In addition, one needs to set WLR_DRM_DEVICES = "/dev/dri/card0:/dev/dri/card1" to 'encourage' sway to use both 'cards'.

I resolved the issue by updating the patch to use renderD128 by default, and by no longer setting WLR_DRM_DEVICES.

overlays = [
  (final: prev: {
    wlroots_0_17 = prev.wlroots_0_17.overrideAttrs (old: {
      patches = (old.patches or [ ]) ++ [
        (prev.fetchpatch {
          url = "https://gist.githubusercontent.com/hughobrien/7879f1faeab354dbc07a8af2f053e715/raw/9155431b6c30d1ab92898238e050c75764ecbe0e/DisplayLink.patch";
          hash = "sha256-VUAoYgEs4A2T3bVOf5dRHZq+a9FRimU12ndBMEESw/M=";
        })
      ];
    });
  })
];

Apologies for the noise, Sandro.

SuperSandro2000 commented 6 months ago
vaapiVdpau

What's the reason you are still using that? It is pretty old and unmaintained.

Glad you found a solution. I don't have any multi-GPU systems, and things usually work more smoothly on Intel-only setups.

hughobrien commented 6 months ago

Probably just carbon-copied from the wiki.