ValveSoftware / Proton

Compatibility tool for Steam Play based on Wine and additional components

Proton crashes on startup no matter what game I try to start #7610

Closed 0x000C0A71 closed 6 months ago

0x000C0A71 commented 7 months ago

Proton logs from trying to start different games:

System Information:

Vulkan Instance Version: 1.3.275

Instance Extensions: count = 23
-------------------------------
VK_EXT_acquire_drm_display             : extension revision 1
VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_direct_mode_display             : extension revision 1
VK_EXT_display_surface_counter         : extension revision 1
VK_EXT_surface_maintenance1            : extension revision 1
VK_EXT_swapchain_colorspace            : extension revision 4
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_display                         : extension revision 23
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2         : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_surface_protected_capabilities  : extension revision 1
VK_KHR_wayland_surface                 : extension revision 6
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 5
--------------------------
VK_LAYER_MESA_device_select       Linux device selection layer 1.3.211  version 1
VK_LAYER_VALVE_steam_fossilize_32 Steam Pipeline Caching Layer 1.3.207  version 1
VK_LAYER_VALVE_steam_fossilize_64 Steam Pipeline Caching Layer 1.3.207  version 1
VK_LAYER_VALVE_steam_overlay_32   Steam Overlay Layer          1.3.207  version 1
VK_LAYER_VALVE_steam_overlay_64   Steam Overlay Layer          1.3.207  version 1

Devices:
========
GPU0:
    apiVersion         = 1.3.267
    driverVersion      = 23.3.6
    vendorID           = 0x1002
    deviceID           = 0x731f
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName         = AMD Radeon RX 5700 XT (RADV NAVI10)
    driverID           = DRIVER_ID_MESA_RADV
    driverName         = radv
    driverInfo         = Mesa 23.3.6
    conformanceVersion = 1.2.7.1
    deviceUUID         = 00000000-0a00-0000-0000-000000000000
    driverUUID         = 414d442d-4d45-5341-2d44-525600000000


All of the logs have the same exception trigger:

trace:seh:dispatch_exception code=6ba flags=0 addr=00006FFFFFC1CE87 ip=6fffffc1ce87
warn:seh:dispatch_exception RPC_S_SERVER_UNAVAILABLE exception (code=6ba) raised



I wanted to try to figure out which DLL raises that exception by adding `PROTON_DUMP_DEBUG_COMMANDS=1` to the launch options, but no `/tmp/proton_(username)` directory was created, so I did not manage to attach a debugger to Proton.

ivyl commented 7 months ago

PROTON_DUMP_DEBUG_COMMANDS=1 was a relic from pre-Steam-Runtime days. It's no longer mentioned in the README and was removed. See https://github.com/ValveSoftware/Proton/blob/proton_9.0/docs/DEBUGGING.md

The thing you've quoted about RPC_S_SERVER_UNAVAILABLE is harmless.

The main culprit, and the common theme among the few logs I've looked at, is that explorer.exe crashes in a Wine syscall (not a real one, just the thing that crosses the Windows-Linux boundary) right as you load winex11.drv.

What customization have you done to your environment? I see a lot of messages about LD_PRELOAD in the log.

Can you attach a log created with PROTON_LOG=+relay,+x11drv,+vulkan,+winediag?

kakra commented 7 months ago

I've seen on Gentoo a few times that no game would start any longer until I rebooted my system. A simple desktop restart or Steam client restart didn't help.

0x000C0A71 commented 7 months ago

> I've seen on Gentoo a few times that no game would start any longer until I rebooted my system. A simple desktop restart or Steam client restart didn't help.

Sadly, a reboot did not fix it.

> What customization have you done to your environment? I see a lot of messages about LD_PRELOAD in the log.

AFAIK none, really. It is possible that Gentoo does some by default. I installed Steam via emerge (Gentoo's package manager) and had to manually flag some libraries to install their 32-bit ABI versions. Maybe their guide (https://wiki.gentoo.org/wiki/Steam) is outdated? Oh, I also have multiple versions of Wine installed that all seem to access the same C: drive. Maybe they're fighting?

> Can you attach a log created with PROTON_LOG=+relay,+x11drv,+vulkan,+winediag?

Absolutely. It is a huge file of 35 MB with some 500'000 lines: https://gist.github.com/XerneraC/2c85732e404dcb74b6c4264519e83484

ivyl commented 7 months ago

708.776:00cc:00d0:trace:x11drv:X11DRV_UpdateDisplayDevices via "Fullscreen Hack"
708.777:00cc:00d0:trace:vulkan:X11DRV_vkCreateInstance create_info 0x1000f9570, allocator (nil), instance 0x1000f9568
708.802:00cc:00d0:trace:vulkan:X11DRV_vkGetInstanceProcAddr 0x5555563ea100, "vkEnumeratePhysicalDevices"
708.802:00cc:00d0:trace:vulkan:X11DRV_vkGetInstanceProcAddr 0x5555563ea100, "vkGetPhysicalDeviceProperties2KHR"
708.802:00cc:00d0:trace:vulkan:X11DRV_vkDestroyInstance 0x5555563ea100 (nil)
708.809:00cc:00d0:trace:seh:handle_syscall_fault code=c0000005 flags=0 addr=(nil) ip=0 tid=00d0

Looks like it may be crashing in Mesa when we try to destroy the instance here.

Can you rebuild Mesa and check that you have sensible USE flags for it?

If that doesn't help, a log with PROTON_LOG=+xrandr,+x11drv,+vulkan,+winediag may tell us some more.
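For anyone repeating this kind of triage on a multi-megabyte log, the reading above can be sketched in a few lines of Python. This is not a Proton tool; the line layout (`seconds:pid:tid:class:channel:function message`) is assumed from the excerpts in this thread, and `find_faults` and `SAMPLE` are names made up for the illustration. It collects, per thread, the last few trace lines that precede each `handle_syscall_fault`:

```python
import re
from collections import deque

# Line layout assumed from the excerpts in this thread:
#   <seconds>:<pid>:<tid>:<class>:<channel>:<function> <message>
LINE_RE = re.compile(
    r"^(?P<time>\d+\.\d+):(?P<pid>[0-9a-f]+):(?P<tid>[0-9a-f]+):"
    r"(?P<cls>\w+):(?P<channel>\w+):(?P<func>\S+)\s*(?P<msg>.*)$"
)


def find_faults(lines, context=5, marker="handle_syscall_fault"):
    """Return [(fault_line, preceding_lines_on_same_thread), ...]."""
    history = {}  # (pid, tid) -> last `context` lines seen on that thread
    faults = []
    for raw in lines:
        raw = raw.rstrip("\n")
        match = LINE_RE.match(raw)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        key = (match["pid"], match["tid"])
        recent = history.setdefault(key, deque(maxlen=context))
        if match["func"] == marker:
            faults.append((raw, list(recent)))
        recent.append(raw)
    return faults


# Three lines copied from the excerpt above.
SAMPLE = [
    "708.777:00cc:00d0:trace:vulkan:X11DRV_vkCreateInstance create_info 0x1000f9570, allocator (nil), instance 0x1000f9568",
    "708.802:00cc:00d0:trace:vulkan:X11DRV_vkDestroyInstance 0x5555563ea100 (nil)",
    "708.809:00cc:00d0:trace:seh:handle_syscall_fault code=c0000005 flags=0 addr=(nil) ip=0 tid=00d0",
]

for fault, before in find_faults(SAMPLE):
    print("fault:", fault)
    for line in before:
        print("  preceded by:", line)
```

For a real log, passing an open file object (opened with `errors="replace"`) as `lines` should work the same way, since the function only iterates over it once.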

0x000C0A71 commented 7 months ago

Did not seem to help: https://gist.github.com/XerneraC/95c4bb42f95947144f3f66270dc0e1c4#file-steam-2357570-log-L2

This log is with the new PROTON_LOG options and with a recompiled Mesa with the following USE flags:

marc@MarcDesktop ~ $ equery u mesa            
[ Legend : U - final flag setting for installation]
[        : I - package is installed with flag     ]
[ Colors : set, unset                             ]
 * Found these USE flags for media-libs/mesa-23.3.6:
 U I
 + + X                    : Add support for X11
 + + abi_x86_32           : 32-bit (x86) libraries
 + + cpu_flags_x86_sse2   : Use the SSE2 instruction set
 - - d3d9                 : Enable Direct 3D9 API through Nine state tracker. Can be used together with patched wine.
 - - debug                : Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
 + + gles1                : Enable GLESv1 support.
 + + gles2                : Enable GLES 2.0 (OpenGL for Embedded Systems) support (independently of full OpenGL, see also: gles2-only)
 + + llvm                 : Enable LLVM backend for Gallium3D.
 - - lm-sensors           : Enable Gallium HUD lm-sensors support.
 + + opencl               : Enable the Rusticl Gallium OpenCL state tracker.
 - - osmesa               : Build the Mesa library for off-screen rendering.
 + + proprietary-codecs   : Enable codecs for patent-encumbered audio and video formats.
 - - test                 : Enable dependencies and/or preparations necessary to run tests (usually controlled by FEATURES=test but can be toggled independently)
 - - unwind               : Add support for call stack unwinding and function name resolution
 + + vaapi                : Enable Video Acceleration API for hardware decoding
 - - valgrind             : Enable annotations for accuracy. May slow down runtime slightly. Safe to use even if not currently using dev-debug/valgrind
 - - vdpau                : Enable the VDPAU acceleration interface for the Gallium3D Video Layer.
 - - video_cards_d3d12    : VIDEO_CARDS seeting to build driver for Microsoft WSL video cards
 - - video_cards_intel    : VIDEO_CARDS setting to build driver for Intel video cards
 - - video_cards_lavapipe : VIDEO_CARDS setting to build Vulkan software rasterizer using LLVMpipe
 - - video_cards_nouveau  : VIDEO_CARDS setting to build reverse-engineered driver for nvidia cards
 - - video_cards_r300     : VIDEO_CARDS setting to build only r300, r400 and r500 based chips code for radeon
 - - video_cards_r600     : VIDEO_CARDS setting to build only r600, r700, Evergreen and Northern Islands based chips code for radeon
 - - video_cards_radeon   : VIDEO_CARDS setting to build driver for ATI radeon video cards
 + + video_cards_radeonsi : VIDEO_CARDS setting to build only Southern Islands based chips code for radeon
 - - video_cards_virgl    : VIDEO_CARDS setting to build driver for virgil (virtual 3D GPU)
 - - video_cards_vmware   : VIDEO_CARDS setting to build driver for vmware video cards
 + + vulkan               : Add support for 3D graphics and computing via the Vulkan cross-platform API
 - - vulkan-overlay       : Build vulkan-overlay-layer which displays Frames Per Second and other statistics
 + + wayland              : Enable support for dev-libs/wayland
 - - xa                   : Enable the XA (X Acceleration) API for Gallium3D.
 + + zstd                 : Enable support for ZSTD compression

(+ + at the beginning of the line means active, and - - means inactive)

I've newly added gles1, opencl and vaapi.

kakra commented 7 months ago

> AFAIK none, really. It is possible that Gentoo does some by default. I installed Steam via emerge (Gentoo's package manager) and had to manually flag some libraries to install their 32-bit ABI versions

You should switch to a multilib profile instead. This should be seamless if you switch to a compatible profile properly, although it will rebuild a lot of libraries. Gentoo's Steam launcher adds LD_PRELOAD depending on whether you enable the Steam runtime. I'd recommend actually using it: enable that USE flag.

You're probably running a no-multilib profile. Run eselect profile list to find a profile with the same path but with no-multilib removed. Also, avoid running hardened profiles for gaming. If you do, DO NOT simply switch to a non-hardened profile yet: there's a detailed guide in the Gentoo wiki on how to do that; otherwise your system breaks. Also, don't switch both hardened and multilib at the same time.

Doing this switch to a multilib profile also enables a very important stack alignment setting in some core packages (like glibc or mesa) so 32-bit apps won't crash. This is essential for running Proton games: The Steam runtime needs this stack alignment. After switching profiles, be careful that glibc and gcc don't rebuild in the wrong order: gcc has to be re-installed first. The correct way is to upgrade world first, then switch profile, then rebuild binutils, then gcc (if this includes glibc, emerge gcc with --nodeps), then glibc, then run env-update and source the updated environment, then emerge libtool, then upgrade/rebuild world again. Find more detailed instructions in the Gentoo wiki / forums / IRC.

If problems still persist, run emerge -ea @world to rebuild all packages.

Also run emerge -ca to remove packages no longer needed so they don't cause any conflicts if Steam tries to use them. Such packages most likely no longer have their proper binary dependencies installed. If you want to keep packages that -ca wants to remove, run emerge -n PACKAGENAME and run a world upgrade, so deps will be fixed.

If emerge shows orphaned libs after upgrades, run emerge -1a @preserved-rebuild. It will rebuild packages with broken reverse dependencies.

@ivyl may have more insight and can tell you whether you should enable the osmesa USE flag or add radeon to VIDEO_CARDS (I think some games may run better with the other driver, and some games may need offscreen rendering).

Also, install cpuid2cpuflags, run it, and check if CPU_FLAGS_X86 is set in your make.conf. If it is set, ensure the contents are identical - it must not mismatch. If it is not set, you MAY add it but you don't have to (if you do, it optimizes some packages for your CPU).
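The flag comparison suggested above can be done by eye, but a small sketch makes the "must not mismatch" check explicit. It assumes the detected flags arrive as a `CPU_FLAGS_X86: ...` line and that the configured ones are written either in that form (package.use) or as `CPU_FLAGS_X86="..."` (make.conf); `parse_cpu_flags` and `compare_cpu_flags` are hypothetical helper names, not part of Portage:

```python
import re

# Accepts both 'CPU_FLAGS_X86: a b c' and 'CPU_FLAGS_X86="a b c"'.
FLAGS_RE = re.compile(r'CPU_FLAGS_X86\s*[:=]\s*"?([^"\n]*)"?')


def parse_cpu_flags(text):
    """Return the set of flags found after CPU_FLAGS_X86, or an empty set."""
    match = FLAGS_RE.search(text)
    return set(match.group(1).split()) if match else set()


def compare_cpu_flags(detected_text, configured_text):
    """Report flags that differ between detection output and configuration."""
    detected = parse_cpu_flags(detected_text)
    configured = parse_cpu_flags(configured_text)
    return {
        "missing_from_config": sorted(detected - configured),
        "not_reported_by_cpu": sorted(configured - detected),
    }


# Made-up example values: the configuration lists one flag the CPU lacks.
detected = "CPU_FLAGS_X86: aes avx avx2 sse sse2"
configured = 'CPU_FLAGS_X86="aes avx avx2 avx512f sse sse2"'
print(compare_cpu_flags(detected, configured))
```

Both lists coming back empty would mean the two sources agree; anything under `not_reported_by_cpu` is the kind of mismatch the advice above warns about.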

0x000C0A71 commented 7 months ago

> You should switch to a multilib profile instead. This should be seamless if you switch to a compatible profile properly, although it will rebuild a lot of libraries.

I did not select a no-multilib profile. I'm on the clang profile:

marc@MarcDesktop ~ $ sudo eselect profile list
Available profile symlink targets:
  [1]   default/linux/amd64/17.1 (exp)
  [2]   default/linux/amd64/17.1/selinux (exp)
  [3]   default/linux/amd64/17.1/hardened (exp)
  [4]   default/linux/amd64/17.1/hardened/selinux (exp)
  [5]   default/linux/amd64/17.1/desktop (exp)
  [6]   default/linux/amd64/17.1/desktop/gnome (exp)
  [7]   default/linux/amd64/17.1/desktop/gnome/systemd/merged-usr (exp)
  [8]   default/linux/amd64/17.1/desktop/plasma (exp)
  [9]   default/linux/amd64/17.1/desktop/plasma/systemd/merged-usr (exp)
  [10]  default/linux/amd64/17.1/desktop/systemd/merged-usr (exp)
  [11]  default/linux/amd64/17.1/developer (exp)
  [12]  default/linux/amd64/17.1/no-multilib (exp)
  [13]  default/linux/amd64/17.1/no-multilib/hardened (exp)
  [14]  default/linux/amd64/17.1/no-multilib/hardened/selinux (exp)
  [15]  default/linux/amd64/17.1/no-multilib/systemd/merged-usr (exp)
  [16]  default/linux/amd64/17.1/no-multilib/systemd/selinux/merged-usr (exp)
  [17]  default/linux/amd64/17.1/systemd/merged-usr (exp)
  [18]  default/linux/amd64/17.1/systemd/selinux/merged-usr (exp)
  [19]  default/linux/amd64/17.1/clang (exp) *
  [20]  default/linux/amd64/17.1/systemd/clang/merged-usr (exp)
  [21]  default/linux/amd64/23.0 (stable)
  [22]  default/linux/amd64/23.0/systemd (stable)
  [23]  default/linux/amd64/23.0/desktop (stable)
  [24]  default/linux/amd64/23.0/desktop/systemd (stable)
  ...

> Gentoo's Steam launcher adds LD_PRELOAD depending on whether you enable the Steam runtime. I'd recommend actually using it: enable that USE flag.

Steam runtime is already enabled:

marc@MarcDesktop ~ $ equery u steam-launcher  
[ Legend : U - final flag setting for installation]
[        : I - package is installed with flag     ]
[ Colors : set, unset                             ]
 * Found these USE flags for games-util/steam-launcher-1.0.0.79:
 U I
 + + desktop-portal     : Enable desktop integration, e.g. for file pickers
 + + dialogs            : Support additional dialogs before the client starts
 + + joystick           : Add support for joysticks in all packages
 + + pulseaudio         : Add sound server support via media-libs/libpulse (may be PulseAudio or PipeWire)
 + + steamruntime       : Use the official Steam runtime libraries
 - - steamvr            : Enable SteamVR virtual reality support
 - - trayicon           : Enable system tray icon
 + + udev               : Enable virtual/udev integration (device discovery, power and storage device support, etc)
 - - video_cards_nvidia : VIDEO_CARDS setting to build driver for nvidia video cards
 + + wayland            : Enable dev-libs/wayland backend

> Doing this switch to a multilib profile also enables a very important stack alignment setting in some core packages (like glibc or mesa) so 32-bit apps won't crash. This is essential for running Proton games: The Steam runtime needs this stack alignment.

Maybe this is not enabled on the clang profile? What are those stack alignment options, so that I can enable them manually?

My entire emerge --info: https://gist.github.com/XerneraC/82eb832be76d8cde2712d3564bfef32f (I manually added some useflags from the desktop profile in my make.conf to make the clang profile desktop-able)

> Also, install cpuid2cpuflags, run it, and check if CPU_FLAGS_X86 is set in your make.conf. If it is set, ensure the contents are identical - it must not mismatch. If it is not set, you MAY add it but you don't have to (if you do, it optimizes some packages for your CPU).

I set up my system last week and correctly did this step. I have a file /etc/portage/package.use/00cpu-flags that reads:

marc@MarcDesktop ~ $ cat /etc/portage/package.use/00cpu-flags
*/* CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3

kakra commented 7 months ago

So maybe it's only stack re-alignment that's missing?

I did this back in the day, when it was an issue with some gcc versions:

# /etc/portage/package.env
media-libs/openal stackrealign
media-sound/pulseaudio stackrealign
sys-libs/glibc stackrealign
x11-libs/libxcb stackrealign
# maybe more packages need it, there's a useflag "stack-realign" which is masked in some profiles

# /etc/portage/env/stackrealign
CFLAGS="${CFLAGS} -mstackrealign"
CXXFLAGS="${CXXFLAGS} -mstackrealign"

This realignment somewhat slows performance down but is needed for 32-bit binaries created by older compilers (I think gcc-8 or older), as far as I understood.

Maybe it helps.

0x000C0A71 commented 7 months ago

I'll try that once my current emerge finishes.

After that previous comment I tried to find out what the clang profile does differently, and found that I had forgotten quite a few global and package-specific USE flags when I made my system desktop-able. I fixed this and am now recompiling my system (quite a few big packages need recompiling, and that will take a while).

I'll be sure to restart and try out games after that recompile, to properly document the source of my Proton not working.

0x000C0A71 commented 7 months ago

Sadly, my desktop USE flags (mentioned above) did not fix it.

> # /etc/portage/package.env
> media-libs/openal stackrealign
> media-sound/pulseaudio stackrealign
> sys-libs/glibc stackrealign
> x11-libs/libxcb stackrealign
> # maybe more packages need it, there's a useflag "stack-realign" which is masked in some profiles

I added them (and made sure in the build log that the flag was actually passed to the compiler), but it did not improve things. (https://gist.github.com/XerneraC/23ed6ed74d1993c459fe4a19a9154751) You mentioned that there may be more packages. Is there a way for me to find out which ones? Does this flag only affect 32-bit builds, and by how much does it impact performance? If it only affects 32-bit builds and/or the impact is low enough, I might enable it globally.

Another workaround would be to install Steam in a chroot where I could use a more standard gcc profile. One of the steps mentioned there is to link the X server into the chroot. I have not, however, found a way to link a Wayland socket or anything similar into the chroot. Therefore I'm hesitant to do a chroot install, as I would effectively be forcing all games to run through Xwayland...

kakra commented 7 months ago

> Does this flag only affect 32-bit builds, and by how much does it impact performance? If it only affects 32-bit builds and/or the impact is low enough, I might enable it globally.

I think the impact is low. I'm not sure if anyone measured it... Try euse -i stack-realign to find packages that officially use it, it should cover all important packages which need it:

# euse -i stack-realign
global use flags (searching: stack-realign)
************************************************************
no matching entries found

local use flags (searching: stack-realign)
************************************************************
[-      ] stack-realign
    sys-libs/glibc: Realign the stack in the 32-bit build for
    compatibility with older binaries at some performance cost
              (2.2) 2.19-r3 [gentoo]
              (2.2) 2.31-r7 [gentoo]
              (2.2) 2.32-r8 [gentoo]
              (2.2) 2.33-r14 [gentoo]
              (2.2) 2.34-r14 [gentoo]
        [+P ] (2.2) 2.35-r11 [gentoo]
        [+P ] (2.2) 2.36-r8 [gentoo]
        [+P ] (2.2) 2.37-r10 [gentoo]
        [+P ] (2.2) 2.38-r10 [gentoo]
        [+P ] (2.2) 2.38-r11 [gentoo]
        [+P ] (2.2) 2.39-r1 [gentoo]
        [+P ] (2.2) 2.39-r2 [gentoo]
        [+P ] (2.2) 9999 [gentoo]

[-      ] stack-realign
    sys-libs/ncurses: Realign the stack in the 32-bit build for compatibility with older binaries at some performance cost. Avoids crashes in older 32-bit binaries. Only affects x86/32-bit multilib builds on amd64.
        [+ B] (0/6) 6.4_p20230401 [gentoo]
        [+ B] (0/6) 6.4_p20230527 [gentoo]

[-      ] stack-realign
    sys-libs/ncurses-compat: Realign the stack in the 32-bit build for compatibility with older binaries at some performance cost. Avoids crashes in older 32-bit binaries. Only affects x86/32-bit multilib builds on amd64.
        [+ B] (5/5) 6.4_p20230401 [gentoo]

According to that it looks like the packages I originally used are no longer incompatible. Actually, I no longer use those environment changes, and I'm only using the official useflag.

> Another workaround would be to install Steam in a chroot where I could use a more standard gcc profile. One of the steps mentioned there is to link the X server into the chroot. I have not, however, found a way to link a Wayland socket or anything similar into the chroot. Therefore I'm hesitant to do a chroot install, as I would effectively be forcing all games to run through Xwayland...

You could use the Flatpak version of Steam. But it may have some problems interacting properly with your normal system resources, e.g. using OBS, dbus or audio filters could be difficult. Also, from my tests, Flatpak may cause some inefficiencies due to passing data across the container boundary (at least for me, OBS uses way less CPU natively than via Flatpak, though it's still low).

0x000C0A71 commented 7 months ago

> You could use the Flatpak version of Steam. But it may have some problems interacting properly with your normal system resources, e.g. using OBS, dbus or audio filters could be difficult. Also, from my tests, Flatpak may cause some inefficiencies due to passing data across the container boundary (at least for me, OBS uses way less CPU natively than via Flatpak, though it's still low).

Sadly, neither Flatpak nor chroot works. There I can't even get Steam running:

As for proton

Is there some good way I can debug the error?

> Looks like it may be crashing in Mesa when we try to destroy the instance here.

@ivyl seems to have pinpointed where the exception occurs.

Maybe compile a patched version of proton that logs more?

Attach a debugger, break on vkDestroyInstance and step through?

On line 1036 of the log it traces entry into the X11DRV_vkDestroyInstance function, that just passes its inputs on to the pvkDestroyInstance global function pointer. I cannot find an assignment to that. Where does it get assigned? If that pointer were to never be assigned, that would explain the nullptr exception.

matthiasbe commented 6 months ago

Hi, I encounter this exact error with Proton Experimental, on Arch Linux with the RADV driver. I have a Radeon Pro WX 3200 and I tried different Vulkan drivers without success.

The game is Sons of the Forest.

I had this error in the beginning and it was solved by explicitly specifying the 32-bit driver with VK_DRIVER_FILES, as suggested in the Arch Vulkan documentation. There were further errors, so I gave up on this solution.

Now I think this is still a problem with 32-bit applications selecting the right binary somehow, but which one, according to the debug info you provided?

I will try to post my config ASAP.

Bye, matthias

0x000C0A71 commented 6 months ago

> I had this error in the beginning and it was solved by explicitly specifying the 32-bit driver with VK_DRIVER_FILES, as suggested in the Arch Vulkan documentation. There were further errors, so I gave up on this solution.

I ran Steam as

VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.i686.json:/usr/share/vulkan/icd.d/radeon_icd.x86_64.json steam

but sadly, that didn't improve my situation.


> Attach a debugger, break on vkDestroyInstance and step through?

I cloned the repo, compiled a debug build and followed Proton's debugging guide, but I could not get that to work at all. I even compiled a custom version that raises SIGSTOP in X11DRV_vkDestroyInstance, with the intent to then resume via (gdb) continue. That might work, but my log file got spammed with a lot of other errors (and exploded in size), and (gdb) continue only yielded errors in some explorer.exe (I don't remember the exact name, but it felt completely unrelated).

> On line 1036 of the log it traces entry into the X11DRV_vkDestroyInstance function, that just passes its inputs on to the pvkDestroyInstance global function pointer. I cannot find an assignment to that. Where does it get assigned? If that pointer were to never be assigned, that would explain the nullptr exception.

OK, I found the assignment. I feel somewhat stupid, as it now seems pretty obvious: pvkDestroyInstance is not null; it points to a function in the Vulkan loader. I tried to compile a patched version of the Vulkan loader with a print statement on every line of its vkDestroyInstance function to see how far it gets, but stdout does not seem to be routed to the log file.

A cheeky

((void (*)())0xF00000000000)();

inside vkDestroyInstance also made the log file explode and was of no help: the magic number was nowhere in the log file, somehow. (The intent was to trigger an access violation and see the magic number in the IP register in the core dump. This also, obviously, caused crashes in many of my other programs...)

matthiasbe commented 6 months ago

On my side I have tried the following:

  1. vulkaninfo --summary
  2. vkcube
  3. vkgears32
  4. winecfg either on the game WINEPREFIX or on the default one

All prefixed with either

  1. VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.i686.json:/usr/share/vulkan/icd.d/radeon_icd.x86_64.json
  2. VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.i686.json

I would suggest you try this, and also the other ICD manifests that you have in the folder /usr/share/vulkan/icd.d. For me it works with llvmpipe (lvp_*) but not with the AMD proprietary drivers. I ended up uninstalling everything but radeon-vulkan. You can also try to launch OpenGL demos such as glxgears, but I'm not sure it's relevant here, because everything seems to relate to Vulkan (once you use Vulkan, is it possible that small things like creating windows are still passed to OpenGL?).

In the end, with the second prefix all commands work, whereas with the first prefix the 32-bit applications (winecfg and vkgears32) fail, saying they can't find the Vulkan driver. I tried with Steam, and with the second prefix I get further, in particular as far as the use of DXVK, but DXVK fails as it does not find the Vulkan driver.

I guess, but I'm not sure at all, that with the first prefix something is wrong in Wine (I tried with Proton 7, 8 and 9) and it doesn't find the driver. With the second prefix, vkgears32 does work, so maybe it does find the driver to create the window, but now when launching the game it is DXVK that can't find it.

Impressive debugging, by the way. I guess it would be interesting to find out (next step for me) which libraries are linked, with ldd, and whether they contain the symbol X11DRV_vkDestroyInstance, with nm.
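The test matrix in this comment (each command run under each VK_DRIVER_FILES value) is easy to get wrong by hand, so here is a rough Python sketch of it. `run_matrix`, `VARIANTS` and `COMMANDS` are names invented for the illustration; the paths and commands are the ones quoted above, and winecfg is left out because it needs a prefix set up first. A GUI demo such as vkcube keeps running until it is closed, so it is reported as "timeout" rather than with an exit code:

```python
import os
import subprocess


def run_matrix(commands, env_variants, timeout=20):
    """Run each command under each environment variant.

    Returns {(variant_name, tuple(command)): outcome}, where outcome is the
    exit code, "timeout" (still running when the timeout hit) or "not found".
    """
    results = {}
    for name, extra_env in env_variants.items():
        env = {**os.environ, **extra_env}  # variant overrides the inherited env
        for command in commands:
            key = (name, tuple(command))
            try:
                proc = subprocess.run(
                    command, env=env, timeout=timeout,
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                )
                results[key] = proc.returncode
            except subprocess.TimeoutExpired:
                results[key] = "timeout"
            except OSError:
                results[key] = "not found"
    return results


ICD_DIR = "/usr/share/vulkan/icd.d"
VARIANTS = {
    "32+64": {"VK_DRIVER_FILES": f"{ICD_DIR}/radeon_icd.i686.json:{ICD_DIR}/radeon_icd.x86_64.json"},
    "32 only": {"VK_DRIVER_FILES": f"{ICD_DIR}/radeon_icd.i686.json"},
}
COMMANDS = [["vulkaninfo", "--summary"], ["vkcube"], ["vkgears32"]]

# To run the actual matrix on a machine with these tools installed:
#   for key, outcome in run_matrix(COMMANDS, VARIANTS).items():
#       print(key, "->", outcome)
```

An exit code of 0 only says the tool started and exited cleanly; it does not by itself show which driver was picked, so the tool output is still worth reading for the failing combinations.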

matthiasbe commented 6 months ago

steam-1326470.log: here is the log with the options VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.i686.json PROTON_LOG=1 %command%

0x000C0A71 commented 6 months ago

> VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.i686.json

Okay, this changed something:

However, looking at the crash log (shortened; the full crash log was 33 MB) on line 1033:

10713.943:00cc:00d0:warn:xrandr:add_remaining_gpus_via_vulkan Failed to create a Vulkan instance, vr -9.

This seems to be a regression instead of a progression...

Furthermore, looking at lines 1035 & 1037:

10713.943:00cc:00d0:trace:x11drv:X11DRV_UpdateDisplayDevices GPU count: 1
10713.951:00cc:00d0:trace:x11drv:X11DRV_UpdateDisplayDevices GPU: 0 L"Wine Adapter", adapter count: 3

It seems to not be able to get ahold of my GPU at all now. In my previous log, it seems to have found my GPU:

779.226:00d4:00d8:trace:xrandr:add_remaining_gpus_via_vulkan Added a new GPU via Vulkan: 1002:731f L"AMD Radeon RX 5700 XT (RADV NAVI10)"

Adding

VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.i686.json:/usr/share/vulkan/icd.d/radeon_icd.x86_64.json

to my launch options results in what I previously had, so that's no help.

> for me it works with llvmpipe

As llvmpipe is software rendering, I don't think it is a good idea to play graphically demanding things on it.

0x000C0A71 commented 6 months ago

> However, looking at the crash log (shortened; the full crash log was 33 MB) on line 1033:
>
> 10713.943:00cc:00d0:warn:xrandr:add_remaining_gpus_via_vulkan Failed to create a Vulkan instance, vr -9.

Maybe this indicates that the 32-bit version of my graphics driver is not working (a VkResult of -9 is VK_ERROR_INCOMPATIBLE_DRIVER), and that is the issue?

kakra commented 6 months ago

I came across a similar problem after having enabled multi-monitor GPU support in the BIOS (which actually enabled the iGPU of my processor). Most games started, but at least "Elite Dangerous" (#150) no longer started unless I used DXVK_FILTER_DEVICE_NAME="NVIDIA".

But this is probably rather an issue with the launcher and not with Proton. But since this issue is easily found when searching for text from the Proton log, and you enabled multiple GPUs, applying the DXVK filter may fix the problem.

0x000C0A71 commented 6 months ago

> you enabled multiple GPUs

I do not have multiple GPUs:

marc ➜  ~ λ vulkaninfo --summary
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.275

Instance Extensions: count = 23
-------------------------------
VK_EXT_acquire_drm_display             : extension revision 1
VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_direct_mode_display             : extension revision 1
VK_EXT_display_surface_counter         : extension revision 1
VK_EXT_surface_maintenance1            : extension revision 1
VK_EXT_swapchain_colorspace            : extension revision 4
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_display                         : extension revision 23
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2         : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2       : extension revision 1
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_surface_protected_capabilities  : extension revision 1
VK_KHR_wayland_surface                 : extension revision 6
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers: count = 6
--------------------------
VK_LAYER_KHRONOS_validation       Khronos Validation Layer     1.3.275  version 1
VK_LAYER_MESA_device_select       Linux device selection layer 1.3.211  version 1
VK_LAYER_VALVE_steam_fossilize_32 Steam Pipeline Caching Layer 1.3.207  version 1
VK_LAYER_VALVE_steam_fossilize_64 Steam Pipeline Caching Layer 1.3.207  version 1
VK_LAYER_VALVE_steam_overlay_32   Steam Overlay Layer          1.3.207  version 1
VK_LAYER_VALVE_steam_overlay_64   Steam Overlay Layer          1.3.207  version 1

Devices:
========
GPU0:
    apiVersion         = 1.3.274
    driverVersion      = 24.0.4
    vendorID           = 0x1002
    deviceID           = 0x731f
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName         = AMD Radeon RX 5700 XT (RADV NAVI10)
    driverID           = DRIVER_ID_MESA_RADV
    driverName         = radv
    driverInfo         = Mesa 24.0.4
    conformanceVersion = 1.2.7.1
    deviceUUID         = 00000000-0a00-0000-0000-000000000000
    driverUUID         = 414d442d-4d45-5341-2d44-525600000000

> which actually enabled the iGPU of my processor

I have a Ryzen 1700, so no iGPU present

marc ➜  ~ λ ls /usr/share/vulkan/icd.d
radeon_icd.i686.json  radeon_icd.x86_64.json

I assume I have two drivers because I built Mesa with abi_x86_32, and thus have a 32-bit and a 64-bit driver.
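To check that assumption without guessing from file names, the manifests themselves can be read: each ICD manifest is a small JSON file whose `ICD.library_path` names the driver library, and newer manifests may also carry an optional `ICD.library_arch`. The sketch below is only an illustration; the helper names are invented and the sample manifest content is made up, not copied from this system:

```python
import json
from pathlib import Path


def parse_icd_manifest(text):
    """Extract the driver library and optional architecture from manifest JSON."""
    icd = json.loads(text).get("ICD", {})
    return {
        "library_path": icd.get("library_path"),
        "library_arch": icd.get("library_arch"),  # optional; may be absent
        "api_version": icd.get("api_version"),
    }


def list_icd_manifests(directory="/usr/share/vulkan/icd.d"):
    """Map each *.json manifest in `directory` to its parsed fields."""
    found = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            found[str(path)] = parse_icd_manifest(path.read_text())
        except (OSError, ValueError) as exc:  # unreadable file or invalid JSON
            found[str(path)] = {"error": str(exc)}
    return found


def vk_driver_files(paths):
    """Join manifest paths the way VK_DRIVER_FILES expects on Linux (colon-separated)."""
    return ":".join(str(p) for p in paths)


# Made-up manifest content for illustration only.
SAMPLE_MANIFEST = """
{
    "file_format_version": "1.0.0",
    "ICD": {"library_path": "/usr/lib/libvulkan_radeon.so", "api_version": "1.3.274"}
}
"""
print(parse_icd_manifest(SAMPLE_MANIFEST))
```

Running `list_icd_manifests()` on the directory listed above should show which library each of the two manifests points at, which in turn can be checked with `file` to confirm one is 32-bit and the other 64-bit.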

0x000C0A71 commented 6 months ago

I managed to trace the nullptr exception, and it's really odd.

I patched both proton-wine and vulkan-loader to allow me to log a bit more:

The nullptr exception traces as follows:

winex11.drv : X11DRV_vkDestroyInstance
> vulkan-loader : vkDestroyInstance
> > vulkan-loader : loader_delete_layer_list_and_properties
> > > vulkan-loader : loader_platform_close_library (with libVkLayer_MESA_device_select.so)
> > > > dlfcn.h : dlclose

The nullptr exception happens inside dlclose, as I have a log statement right before it and right after it. On line 1103 of the log file, you can also see that the lib handle for libVkLayer_MESA_device_select.so seems to not be broken.

0x000C0A71 commented 6 months ago

I managed to find a fix/workaround!

The nullptr exception happens inside dlclose when it's trying to unload libVkLayer_MESA_device_select.so, as described in the previous post. I googled around a bit, and found the NODEVICE_SELECT=1 environment variable that just disables that layer.

I put this in my launch options, and the game launches.


I still don't see how libVkLayer_MESA_device_select.so manages to induce a segfault in dlclose (???), but disabling that layer seems to fix things. I don't know whether this is a bug on Wine's side or Mesa's side, but as it is running now, I'm closing the issue.
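For reference, the variables discussed in this thread all go into the Steam launch options in the same `NAME=value ... %command%` form that appears earlier in the thread. A tiny Python sketch of composing such a string follows; `steam_launch_options` is a made-up helper, and it assumes Steam hands the launch options to a POSIX shell, which is why values are shell-quoted:

```python
import shlex


def steam_launch_options(env_vars):
    """Format environment variables as a Steam launch-options string."""
    assignments = [
        f"{name}={shlex.quote(str(value))}" for name, value in env_vars.items()
    ]
    # %command% is Steam's placeholder for the game's own command line.
    return " ".join(assignments + ["%command%"])


# The workaround from this comment, alone and combined with logging.
print(steam_launch_options({"NODEVICE_SELECT": "1"}))
print(steam_launch_options({
    "PROTON_LOG": "+xrandr,+x11drv,+vulkan,+winediag",
    "NODEVICE_SELECT": "1",
}))
```

Keeping PROTON_LOG alongside the workaround makes it easier to confirm from the log that the device-select layer is no longer being loaded and unloaded.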

0x000C0A71 commented 6 months ago

ah BTW @matthiasbe this might be a fix for you too. Give it a shot.

matthiasbe commented 6 months ago

Well done with the debugging. Unfortunately, this option does not fix it for me; I'm still looking for the problem. I also tried DXVK_FILTER_DEVICE_NAME without success, but anyway, thank you @kakra for giving your insight. I don't have much time (or the skills) to debug this, but if this changes, I will open another issue.