ValveSoftware / Proton

Compatibility tool for Steam Play based on Wine and additional components

Games crash on launch w/ NVIDIA and AMD drivers/hardware loaded/enabled at the same time. #6222

Open jhnphm opened 1 year ago

jhnphm commented 1 year ago

I'm using VFIO for the occasional incompatible Windows game. Games don't seem to complete startup w/ Proton 5.13+ (tried 7.x, Experimental, etc.) whenever the NVIDIA card is bound to the host. My main display runs off the AMD iGPU and I'm launching w/ prime-run steam, but the problem shows up with or without prime-run. If I unbind the NVIDIA card, Proton runs fine. Versions of Proton < 5.13 also run fine.

Issue seems similar to https://github.com/ValveSoftware/Proton/issues/6180

I'm using Arch Linux, a Ryzen 5700G, and an NVIDIA 3070.

Logs attached:

slr-app837470-t20221007T121808.log steam-837470.log sysinfo.log

Console log:

/bin/sh\0-c\0PROTON_LOG=1 /home/john/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=837470 -- /home/john/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- '/home/john/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier'/_v2-entry-point --verb=waitforexitandrun -- '/home/john/.local/share/Steam/steamapps/common/Proton 5.13'/proton waitforexitandrun  '/home/john/.local/share/Steam/steamapps/common/Untitled Goose Game/Untitled.exe'\0
Game process added : AppID 837470 "PROTON_LOG=1 /home/john/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=837470 -- /home/john/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- '/home/john/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier'/_v2-entry-point --verb=waitforexitandrun -- '/home/john/.local/share/Steam/steamapps/common/Proton 5.13'/proton waitforexitandrun  '/home/john/.local/share/Steam/steamapps/common/Untitled Goose Game/Untitled.exe'", ProcID 14418, IP 0.0.0.0:0
chdir /home/john/.local/share/Steam/steamapps/common/Untitled Goose Game
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_64/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
GameAction [AppID 837470, ActionID 1] : LaunchApp changed task to WaitingGameWindow with ""
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
ERROR: ld.so: object '/home/john/.local/share/Steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
GameAction [AppID 837470, ActionID 1] : LaunchApp changed task to Completed with ""
ThreadGetProcessExitCode: no such process 14529
ThreadGetProcessExitCode: no such process 14527
ThreadGetProcessExitCode: no such process 14420
Game process updated : AppID 837470 "PROTON_LOG=1 /home/john/.local/share/Steam/ubuntu12_32/reaper SteamLaunch AppId=837470 -- /home/john/.local/share/Steam/ubuntu12_32/steam-launch-wrapper -- '/home/john/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier'/_v2-entry-point --verb=waitforexitandrun -- '/home/john/.local/share/Steam/steamapps/common/Proton 5.13'/proton waitforexitandrun  '/home/john/.local/share/Steam/steamapps/common/Untitled Goose Game/Untitled.exe'", ProcID 14528, IP 0.0.0.0:0
Installing breakpad exception handler for appid(steam)/version(1665100899)
Installing breakpad exception handler for appid(steam)/version(1665100899)
Steam: An X Error occurred
X Error of failed request:  BadMatch (invalid parameter attributes)
Major opcode of failed request:  148
Serial number of failed request:  338
xerror_handler: X failed, continuing

kisak-valve commented 1 year ago

Hello @jhnphm, please copy your system information from Steam (Steam -> Help -> System Information) and put it in a gist, then include a link to the gist in this issue report.

jhnphm commented 1 year ago

> Hello @jhnphm, please copy your system information from Steam (Steam -> Help -> System Information) and put it in a gist, then include a link to the gist in this issue report.

I've copied it into the updated post above in sysinfo.log but also here: https://gist.github.com/jhnphm/f9e45d04d374cb9613386ac094b5e50a

kisak-valve commented 1 year ago

Thanks. AMDVLK has a history of breaking other Vulkan driver implementations. If you remove or disable AMDVLK and use Mesa/RADV instead, can you still reproduce this scenario?

jhnphm commented 1 year ago

Yes.

(The log below is from a run on Proton 7.x.)

slr-app837470-t20221007T124252.log steam-837470.log

kisak-valve commented 1 year ago

12:42:52.860029: pressure-vessel-wrap[27962]: I: Vulkan ICD #0 at /usr/share/vulkan/icd.d/amd_icd32.json: /usr/lib32/amdvlk32.so

AMDVLK is still in the mix in your test.
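
A quick way to check for leftover AMDVLK manifests is to scan the ICD directory named in the log line above. This is only a sketch: the function name `list_amdvlk_icds` is hypothetical, and it assumes the manifests live in `/usr/share/vulkan/icd.d` as on this system. The Vulkan loader may also read manifests from other locations, which this does not cover.

```shell
# Print every Vulkan ICD manifest in a directory that mentions AMDVLK.
# $1: manifest directory (default taken from the log line above; other
#     loader search paths are not checked by this sketch)
list_amdvlk_icds() {
    dir="${1:-/usr/share/vulkan/icd.d}"
    for manifest in "$dir"/*.json; do
        # Skip the unexpanded glob when the directory has no .json files.
        [ -f "$manifest" ] || continue
        if grep -qi 'amdvlk' "$manifest"; then
            printf '%s\n' "$manifest"
        fi
    done
}
```

Running `list_amdvlk_icds` with no argument should print nothing once both the 64-bit and 32-bit AMDVLK packages are gone.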

jhnphm commented 1 year ago

Ah, left the 32-bit amdvlk in the mix. New test:

slr-app837470-t20221007T130801.log steam-837470.log

jhnphm commented 1 year ago

For reference, this is a working run w/ the NVIDIA GPU unbound, run w/o prime-run: slr-app837470-t20221007T134748.log steam-837470.log

For apples to apples, nonworking run, NVIDIA GPU bound, w/o prime-run: slr-app837470-t20221007T135131.log steam-837470.log

A working NVIDIA GPU bound, w/o prime-run, on Proton 5.0:

steam-837470.log (couldn't find the steam runtime logfiles for some reason)

A working NVIDIA GPU bound, w/ prime-run, on Proton 5.0:

steam-837470.log

slr-app1420170-t20221007T135842.log

Basically, the combination of Proton 5.13+ and the NVIDIA GPU being bound to the host is what breaks. The GPU doesn't have to be active, and it makes no difference whether prime-run is used or not.
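
When comparing runs like these, it can help to record which kernel driver the GPU was bound to at the time. The sketch below reads the `driver` symlink that sysfs exposes for each PCI device. The function name `pci_bound_driver` and the example address are hypothetical, and the second argument exists only so the function can be pointed at a test directory instead of `/sys`.

```shell
# Print the kernel driver bound to a PCI device, or "none" if it is unbound.
# $1: PCI address, e.g. 0000:01:00.0 (hypothetical; look yours up with lspci -D)
# $2: sysfs root, defaults to /sys
pci_bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver's name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

For a card reserved for passthrough one would expect something like `vfio-pci` here, and the vendor driver when the card is bound to the host.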

jhnphm commented 1 year ago

Actually, I'm not even able to launch winecfg in the prefix w/ the NVIDIA GPU bound:

john@thor [02:27:47 PM] [~] 
-> % export GAMEID=837470
john@thor [02:28:11 PM] [~] 
-> % WINEPREFIX=~/.steam/steam/steamapps/compatdata/$GAMEID/pfx/ WINEARCH=win64 .steam/steam/steamapps/common/Proton\ 7.0/dist/bin/wine64 'winecfg.exe'
wineserver: using server-side synchronization.
wine: RLIMIT_NICE is <= 20, unable to use setpriority safely
wine: Unhandled page fault on execute access to 00007F2D614EF3D0 at address 00007F2D614EF3D0 (thread 00cc), starting debugger...
00c4:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
00c4:err:winediag:nodrv_CreateWindow The explorer process failed to start.
john@thor [02:28:15 PM] [~] 

jhnphm commented 1 year ago

Installing vulkan-mesa-layers/lib32-vulkan-mesa-layers (https://bbs.archlinux.org/viewtopic.php?id=279672) lets me run winecfg and Untitled Goose Game directly w/ Proton, but it still breaks if prime-run is enabled or if the game is run through Steam, w/ the common error signature:

00c4:err:winediag:nodrv_CreateWindow Application tried to create a window, but no driver could be loaded.
00c4:err:winediag:nodrv_CreateWindow The explorer process failed to start.
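
Since this signature recurs across the failing runs, a small check like the following can sort logs into failing and working ones. It is a sketch only: `has_nodrv_signature` is a hypothetical helper name, and the log path is whatever file was captured for the run (for example one of the attached steam-837470.log files).

```shell
# Succeed (exit 0) if a log contains the "no driver could be loaded" signature.
# $1: path to a log file captured with PROTON_LOG=1
has_nodrv_signature() {
    grep -q 'nodrv_CreateWindow' "$1"
}

# Example (commented out so that sourcing this file has no side effects):
# has_nodrv_signature "$HOME/steam-837470.log" && echo "window driver failed to load"
```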

Potentially related: https://www.reddit.com/r/linux_gaming/comments/rvzu5p/cant_run_winelutrisproton_apps_on_a_gpu_thats_not/ . It looks like I can get this to work, at least to start winecfg from the command line, if I bind the GPU before starting X, but that means I can no longer unbind it for passing it through to a VM w/o restarting X. Running Untitled Goose Game from Steam still doesn't work, though.

Most other native applications like vkcube, as well as Proton <= 5.0, work fine on the NVIDIA dGPU even when Xorg was started before the GPU was bound, so it does seem like a Proton/Wine regression.

jhnphm commented 1 year ago

I can get prime-run to work w/ the scripts generated by PROTON_DUMP_DEBUG_COMMANDS if I switch to Wayland, but I still can't get it to run via the Steam GUI. Looks like bypassing the steam runtime with the arch steam-native script works too.

smcv commented 1 year ago

This might be a Proton regression, but you said that Proton <= 5.0 is good and 5.13+ is bad, which suggests that one important factor might be whether you're using the SteamLinuxRuntime_soldier container runtime (which is used by Proton 5.13+, and optionally for native Linux games) or not (Proton <= 5.0 and most native Linux games).

However, there were also a lot of non-container-runtime-related changes between Proton 5.0 and 5.13, so it's also possible that this is genuinely a Proton problem and nothing to do with the container runtime.

Multi-GPU is complicated, Proton is complicated, and SteamLinuxRuntime is complicated, so the combination of the three gets very confusing. Please try to narrow down where the problem is, with as few complicated things involved as possible:

smcv commented 1 year ago

> A working NVIDIA GPU bound, w/o prime-run, on Proton 5.0: (couldn't find the steam runtime logfiles for some reason)

The SteamLinuxRuntime_soldier container runtime is not used for Proton 5.0, so it is correct and expected that you will not get a SteamLinuxRuntime_soldier/var/slr-*.log for Proton 5.0 games.
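
The version split described here can be written down as a small predicate, which is handy when labelling test runs. This is a sketch under assumptions: `proton_uses_container_runtime` is a hypothetical name, it relies on `sort -V` (available in GNU coreutils), and it only encodes the 5.13 threshold stated in this thread.

```shell
# Succeed if the given Proton version is 5.13 or newer, i.e. a version that
# is expected to run inside the SteamLinuxRuntime_soldier container.
# $1: Proton version string such as 5.0, 5.13 or 7.0
proton_uses_container_runtime() {
    # Version-sort the threshold and the argument; if 5.13 sorts first
    # (or the two are equal), the argument is at least 5.13.
    oldest="$(printf '%s\n' "5.13" "$1" | sort -V | head -n 1)"
    [ "$oldest" = "5.13" ]
}
```

So a missing slr-*.log is only worth investigating for versions where this predicate succeeds.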

> A working NVIDIA GPU bound, w/ prime-run, on Proton 5.0: steam-837470.log slr-app1420170-t20221007T135842.log

These logs don't match: if it was using Proton 5.0, then you wouldn't get a slr-*.log for that run. slr-app1420170-t20221007T135842.log seems to be an unrelated log from running Proton\ 5.13/proton run /home/john/.local/share/Steam/ubuntu12_32/../bin/d3ddriverquery64.exe (see the first line).

smcv commented 1 year ago

> -> % WINEPREFIX=~/.steam/steam/steamapps/compatdata/$GAMEID/pfx/ WINEARCH=win64 .steam/steam/steamapps/common/Proton\ 7.0/dist/bin/wine64 'winecfg.exe'

This is unsupported: Proton 5.13+ is intended to always be run in the SteamLinuxRuntime_soldier container environment, not on the host system. However, if this is also failing with the same symptoms as in the container runtime, then that suggests that the problem might be with Proton and not the container runtime.

> Looks like bypassing the steam runtime with the arch steam-native script works too

This is also unsupported: the steam-for-linux binaries are intended to always be run with the (older, LD_LIBRARY_PATH-based) Steam Runtime, which is what steam-native disables. Scripts in the Steam Runtime are responsible for choosing whether to take each library from your host system or from the runtime (in most cases whichever one is newer must be used).

I'm surprised that steam-native has any effect on the container runtime - it only disables the older, LD_LIBRARY_PATH-based runtime mechanism (used by Steam itself, Proton <= 5.0 and most native Linux games) and shouldn't do anything to the container runtime. If steam-native vs. steam-runtime makes a difference, then there must be some relatively subtle interaction going on.

Are you sure you are running steam-native in exactly the same way that you were running Steam with the normal Steam Runtime enabled, so that the only difference is -native or not?

One thing that might be significant here is that if you run Steam from a desktop environment shortcut, most desktop environments will try to launch it on a discrete or non-default GPU using PRIME or similar (via PrefersNonDefaultGPU=true and X-KDE-RunOnDiscreteGpu=true), but if you run it from a command-line prompt, that will not take effect. So I wonder whether the difference might really be that you are running steam-native from a terminal (therefore on your default GPU), but running Steam in its normal supported mode from a desktop shortcut (therefore on your discrete GPU)?
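
To see whether a given launcher carries these hints, one can grep its desktop entry for the two keys named above. This is a sketch: `desktop_gpu_hints` is a hypothetical helper, and the default path `/usr/share/applications/steam.desktop` is an assumption that may not match every distribution's packaging.

```shell
# Print the GPU-preference keys found in a desktop entry, if any.
# $1: path to a .desktop file (the default below is an assumed location)
desktop_gpu_hints() {
    grep -E '^(PrefersNonDefaultGPU|X-KDE-RunOnDiscreteGpu)=' \
        "${1:-/usr/share/applications/steam.desktop}"
}
```

If either key is printed with the value `true`, launching from the desktop shortcut and launching from a terminal may end up on different GPUs, which is the discrepancy suspected here.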

jhnphm commented 1 year ago

More recent sysinfo w/ amdvlk disabled: https://gist.github.com/jhnphm/535dc9ee4154fee34648c712fc357eab

CS:GO works natively both w/ OpenGL and w/ Vulkan, and also w/ the runtime set to Steam Linux Runtime, so it really seems to be a Proton issue as opposed to a runtime issue.

The steam-native thing seems to be a red herring; I probably messed up some testing w/ the GPU in a bad state or hit some other weird transient problem. I can get Steam running Proton games w/ the latest Proton normally w/ the dGPU bound under Wayland, though.

It might have something to do w/ binding the GPU after Xorg is started to keep Xorg from binding to it and making it un-unbindable for VMs w/o restarting the DE. [EDIT: Nope, makes no difference.]

Multi-GPU used to work on Xorg when I was using an AMD dGPU w/ an AMD iGPU, but the AMD card (Vega64) had other issues w/ VFIO that necessitated running Xorg instead of Wayland. Since it all works under Wayland now, I can just use that, but if it's useful to chase this down I can provide more information.

Wayland sysinfo: https://gist.github.com/jhnphm/d378f7601301736401c72c684f6c6e3d