ValveSoftware / steam-runtime

A runtime environment for Steam applications

Proton 5.13-2 tries to use mesa/ANV before nVidia (Optimus laptops) #312

Closed austinried closed 3 years ago

austinried commented 3 years ago

Can't seem to get any games to run with the latest Proton 5.13-2, but the 5.0-10 version seems to work. I also am having trouble finding any specific error that's being thrown, they all just seem to throw up either a full white window or transparent window and then crash with a force quit/wait dialog shortly after.

Games tried: Sekiro, Risk of Rain 2, Gloomhaven. Sekiro loads an all white window, Risk of Rain 2 and Gloomhaven load up transparent windows, although Gloomhaven does change the cursor to their custom one oddly.

Any help is much appreciated.

steam-632360.log steam-780290.log steam-814380.log

Computer Information:
    Manufacturer: Unknown
    Model: Unknown
    Form Factor: Laptop
    No Touch Input Detected

Processor Information:
    CPU Vendor: GenuineIntel
    CPU Brand: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    CPU Family: 0x6
    CPU Model: 0x9e
    CPU Stepping: 0xa
    CPU Type: 0x0
    Speed: 4100 Mhz
    12 logical processors
    6 physical processors
    HyperThreading: Supported
    FCMOV: Supported
    SSE2: Supported
    SSE3: Supported
    SSSE3: Supported
    SSE4a: Unsupported
    SSE41: Supported
    SSE42: Supported
    AES: Supported
    AVX: Supported
    AVX2: Unsupported
    AVX512F: Unsupported
    AVX512PF: Unsupported
    AVX512ER: Unsupported
    AVX512CD: Unsupported
    AVX512VNNI: Unsupported
    SHA: Unsupported
    CMPXCHG16B: Supported
    LAHF/SAHF: Supported
    PrefetchW: Unsupported

Operating System Version:
    Ubuntu 20.04.1 LTS (64 bit)
    Kernel Name: Linux
    Kernel Version: 5.4.0-56-generic
    X Server Vendor: The X.Org Foundation
    X Server Release: 12008000
    X Window Manager: GNOME Shell
    Steam Runtime Version: steam-runtime_0.20201005.0

Video Card:
    Driver: NVIDIA Corporation GeForce RTX 2080 with Max-Q Design/PCIe/SSE2
    Driver Version: 4.6.0 NVIDIA 455.38
    OpenGL Version: 4.6
    Desktop Color Depth: 24 bits per pixel
    Monitor Refresh Rate: 143 Hz
    VendorID: 0x10de
    DeviceID: 0x1e90
    Revision Not Detected
    Number of Monitors: 1
    Number of Logical Video Cards: 2
    Primary Display Resolution: 1920 x 1080
    Desktop Resolution: 1920 x 1080
    Primary Display Size: 13.54" x 7.64" (15.51" diag), 34.4cm x 19.4cm (39.4cm diag)
    Primary Bus: PCI Express 16x
    Primary VRAM: 8192 MB
    Supported MSAA Modes: 2x 4x 8x 16x

Sound card:
    Audio device: Realtek ALC298

Memory:
    RAM: 15910 Mb

VR Hardware:
    VR Headset: None detected

Miscellaneous:
    UI Language: English
    LANG: en_US.UTF-8
    Total Hard Disk Space Available: 273437 Mb
    Largest Free Hard Disk Block: 202662 Mb

kisak-valve commented 3 years ago

Hello @austinried, let's treat this as a Pressure Vessel issue with Steam Linux Runtime - Soldier until there's a stronger indication that the issue is elsewhere.

Starting with Proton 5.13, Proton runs on top of the Steam Linux Runtime - Soldier container environment, which is set up by Pressure Vessel.

What I suspect is happening here is that the games are using the first Vulkan driver the Vulkan loader gives them, and in the container environment they end up getting handed the Intel driver before the nVidia Vulkan driver. You should be able to test this hypothesis by temporarily disabling mesa/anv with something like one or both of:

sudo mv /usr/share/vulkan/icd.d/intel_icd.x86_64.json /usr/share/vulkan/icd.d/intel_icd.x86_64.json.disabled
sudo mv /usr/share/vulkan/icd.d/intel_icd.i686.json /usr/share/vulkan/icd.d/intel_icd.i686.json.disabled

That would leave the nVidia driver as the only working option for the game to pick up.

If that helps, it confirms that there's an issue with using mesa/anv when the system is configured to exclusively use nVidia's driver.
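The renaming workaround can be scripted so that it is easy to undo before collecting logs. This is a minimal sketch, assuming the Intel manifests are named intel_icd.*.json as in the commands above; plan_icd_renames and apply_plan are hypothetical helpers. It only computes the renames until you call apply_plan, which needs root for /usr/share/vulkan/icd.d, and it remains a diagnostic workaround rather than a fix.

```python
from pathlib import Path

ICD_DIR = Path("/usr/share/vulkan/icd.d")  # system-wide Vulkan ICD manifests
SUFFIX = ".disabled"


def plan_icd_renames(names, prefix="intel_icd.", enable=False):
    """Return sorted (old, new) filename pairs; nothing on disk is touched.

    enable=False: intel_icd.*.json          -> intel_icd.*.json.disabled
    enable=True:  intel_icd.*.json.disabled -> intel_icd.*.json
    """
    plan = []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # leave other drivers (e.g. nvidia_icd.json) alone
        if not enable and name.endswith(".json"):
            plan.append((name, name + SUFFIX))
        elif enable and name.endswith(".json" + SUFFIX):
            plan.append((name, name[: -len(SUFFIX)]))
    return plan


def apply_plan(plan, directory=ICD_DIR):
    """Perform the renames (needs write access to directory)."""
    for old, new in plan:
        (directory / old).rename(directory / new)
```

For example, plan_icd_renames(p.name for p in ICD_DIR.iterdir()) lists what would be disabled, and passing enable=True produces the reverse plan for restoring the manifests.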

austinried commented 3 years ago

Yep that looks like the issue, after renaming both of those files all three games start up and run fine.

austinried commented 3 years ago

Couple updates on this problem:

I switched to on-demand mode today because of another issue (my display was running at 60 Hz instead of 144 Hz), which it apparently fixes. However, I noticed that even when I launched Steam using the dedicated graphics option, Hades still got very low framerates. Only renaming the Intel driver files actually forces it to run on the NVIDIA hardware. The other games noted above also still crash unless the Intel files are renamed.

smcv commented 3 years ago

Please could you show us a log of what pressure-vessel is thinking, and exactly what happens? You can do this without involving Proton (which should make things a bit simpler) like this:

cd /path/to/SteamLinuxRuntime_soldier
PRESSURE_VESSEL_VERBOSE=1 ./run -- steam-runtime-system-info --verbose 2>&1 | tee container.log

and then send container.log as a gist. You can edit/censor the log if there's anything in it that you consider private, as long as it's obvious where it has been edited, for instance replacing your username with REDACTED.

The SteamLinuxRuntime_soldier directory will be in one of your Steam libraries. The most likely place is ~/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier if you haven't reconfigured the installation path.

Also please show us the full system information (Help -> System Information in Steam), after waiting for the diagnostic tools to finish thinking about what drivers you have. Again, you can edit/censor it if you need to, and please send it as a gist.

austinried commented 3 years ago

Just to clarify, @smcv: does it matter whether I gather these logs with the mesa/anv driver files disabled? And would you like me to be in performance mode or on-demand mode?

kisak-valve commented 3 years ago

We definitely want the same environment as when you see the issue, so please re-enable mesa/ANV when testing. I'm not sure it matters which of the two modes you use, since you saw an issue with both; just make it clear which one you used.

austinried commented 3 years ago

Alright here you go, these were taken in performance mode with mesa/anv enabled:

Steam System Information: https://gist.github.com/austinried/dffd510b8e6d266bd840f4c606235f59
container.log: https://gist.github.com/austinried/20b52de9208d0e2999546504a1a7d754

smcv commented 3 years ago

Steam System Information: https://gist.github.com/austinried/dffd510b8e6d266bd840f4c606235f59

This says an i386 vulkaninfo on the host is also trying to use Intel:

  "architectures" : {
    "i386-linux-gnu" : {
        "x11/vulkan" : {
...
          "renderer" : "Intel(R) UHD Graphics 630 (CFL GT2)",
          "version" : "1.2.131 (device 8086:3e9b) (driver 20.0.8)",

but an x86_64 vulkaninfo on the host gets NVIDIA, which I assume is what you wanted to happen:

    "x86_64-linux-gnu" : {
...
      "graphics-details" : {
        "x11/vulkan" : {
...
          "renderer" : "GeForce RTX 2080 with Max-Q Design",
          "version" : "1.2.142 (device 10de:1e90) (driver 455.152.0)"

I think this means that if you ran 32-bit Vulkan or DXVK games without using the container (e.g. Proton 5.0) they would use Intel, but if you ran 64-bit Vulkan or DXVK games without using the container, they would use NVIDIA. Does that match your experience?

OpenGL seems to be using the NVIDIA driver in both cases.
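A comparison like this can be automated by pulling each architecture's Vulkan renderer out of the report. This is a sketch that assumes you have the JSON report on its own (not mixed with verbose log output) and that it follows the layout visible in the x86_64 excerpt above (the i386 excerpt is elided, so its nesting is assumed to match); vulkan_renderers is a hypothetical helper.

```python
import json


def vulkan_renderers(report):
    """Map each architecture to the Vulkan renderer it reported.

    Assumed layout (from the excerpts above):
    architectures -> <arch> -> graphics-details -> x11/vulkan -> renderer
    Architectures without that entry map to None.
    """
    renderers = {}
    for arch, info in report.get("architectures", {}).items():
        vulkan = info.get("graphics-details", {}).get("x11/vulkan", {})
        renderers[arch] = vulkan.get("renderer")
    return renderers


# Trimmed-down report using the two renderers quoted above.
sample = json.loads("""
{"architectures": {
  "i386-linux-gnu": {"graphics-details": {"x11/vulkan": {
      "renderer": "Intel(R) UHD Graphics 630 (CFL GT2)"}}},
  "x86_64-linux-gnu": {"graphics-details": {"x11/vulkan": {
      "renderer": "GeForce RTX 2080 with Max-Q Design"}}}
}}
""")

renderers = vulkan_renderers(sample)
for arch, renderer in sorted(renderers.items()):
    print(f"{arch}: {renderer}")
if len(set(renderers.values())) > 1:
    print("architectures disagree about the first Vulkan device")
```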

kisak-valve commented 3 years ago

The user @nuno1212s saw the same behavior with an AMD + nVidia setup at https://github.com/ValveSoftware/Proton/issues/3521#issuecomment-743925997.

FeralBytes commented 3 years ago

I would just like to note the fix above enabled Proton 5.13.3 to work on my NVIDIA Optimus Laptop while playing Ark Survival Evolved.

MEJacoby529 commented 3 years ago

I would just like to note the fix above enabled Proton 5.13.3 to work on my NVIDIA Optimus Laptop while playing Ark Survival Evolved.

Same for me, with Borderlands 3!

smcv commented 3 years ago

the fix above

Removing the .json files for the Intel driver is a workaround, rather than being a fix.

There was a bug in how the container set up Vulkan layers, which accidentally disabled the Mesa device selection layer (and possibly some NVIDIA layers, depending on precisely how they work). That bug might have been part of the root cause for selecting the wrong graphics card. The fix for that missed the boat for the current beta, but should be in the next beta.

Multi-GPU device selection in Vulkan is fairly complicated, so it's entirely possible that there is more than one root cause that triggers the same symptoms.

Guite commented 3 years ago

Yep that looks like the issue, after renaming both of those files all three games start up and run fine.

Same here.

smcv commented 3 years ago

If you revert the workaround involving renaming the intel_icd.*.json files, and use the beta branch of Steam Linux Runtime - soldier, does that help?

Switching to the beta branch is the same as switching to the beta branch of a game. Please follow the same procedure as https://support.steampowered.com/kb_article.php?ref=9847-WHXC-7326, but in the properties of Steam Linux Runtime - soldier rather than CS:GO. You can see which specific version you're using in SteamLinuxRuntime_soldier/VERSIONS.txt.

pressure-vessel version 0.20210114.0 fixed a bug that broke loading of Mesa's Vulkan device selection layer, which might have resulted in the Intel driver being selected before NVIDIA. The same bug might have affected NVIDIA Optimus-related Vulkan layers - it's not clear to me exactly how this is all meant to fit together, and there are a lot of moving parts involved.

If the new beta doesn't help, then as before, the information I described on https://github.com/ValveSoftware/steam-runtime/issues/312#issuecomment-741037038 (captured when not using any special workarounds, e.g. the Intel driver should be named intel_icd.*.json and not intel_icd.*.json.disabled) might help us to figure out what is happening here.

We have some diagnostic tool improvements in progress which might also help to figure this out, but those aren't ready yet.

Guite commented 3 years ago

I reverted the workaround and switched the soldier runtime to client_beta. With that, things stopped working again.

Here is the gist, in case it helps: https://gist.github.com/Guite/f5c5acdd823880477ee80320a65dece6

smcv commented 3 years ago

Does it make any difference if you run Steam, and the debugging commands in https://github.com/ValveSoftware/steam-runtime/issues/312#issuecomment-741037038, with __NV_PRIME_RENDER_OFFLOAD=1 and __GLX_VENDOR_LIBRARY_NAME=nvidia in the environment?

Guite commented 3 years ago

Added a new revision to the gist. After a brief look at the diff this change seems relevant:

      "graphics-details" : {
        "x11/vulkan" : {
          "messages" : "ERROR: [Loader Message] Code 0 : /overrides/lib/i386-linux-gnu/vulkan/libvulkan_radeon.so: wrong ELF class: ELFCLASS32\nERROR: [Loader Message] Code 0 : /overrides/lib/i386-linux-gnu/vulkan/libvulkan_intel.so: wrong ELF class: ELFCLASS32\nINTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0\n\nINTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0\n\n",
          "renderer" : "Intel(R) UHD Graphics (CML GT2)",
          "version" : "1.2.145 (device 8086:9bc4) (driver 20.2.6)"
        },

became:

      "graphics-details" : {
        "x11/vulkan" : {
          "messages" : "ERROR: [Loader Message] Code 0 : /overrides/lib/i386-linux-gnu/vulkan/libvulkan_radeon.so: wrong ELF class: ELFCLASS32\nERROR: [Loader Message] Code 0 : /overrides/lib/i386-linux-gnu/vulkan/libvulkan_intel.so: wrong ELF class: ELFCLASS32\nINTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0\n\nINTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0\n\n",
          "renderer" : "Quadro RTX 3000",
          "version" : "1.2.155 (device 10de:1f36) (driver 460.128.192)"
        },
smcv commented 3 years ago

OK, great, so adding those two environment variables has "promoted" your NVIDIA card to be enumerated first. I hope you will find that running Steam with those same environment variables will result in rendering on the NVIDIA card.

Vulkan device selection on Linux is a bit unsolved in general. The Vulkan-Loader developers' position seems to be that the order of enumeration shouldn't matter, and Vulkan apps/games should be choosing the most suitable GPU from the list (by heuristics? by magic?) to use for rendering. Basically everyone else (including most games and the official vulkaninfo utility, which is what we use for diagnostics right now) normally assumes that the first GPU in the list of devices is the right one to use. Mesa and the proprietary NVIDIA driver both have Vulkan layers that try to resolve this by rearranging the list of GPUs so that the preferred one comes first. Mesa's uses heuristics that can be overridden by environment variables; NVIDIA's is proprietary so we can't see precisely what it does, but I think it relies on the environment variables that I mentioned.
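To make the two positions concrete, the sketch below contrasts "use the first enumerated device" with "pick the most suitable device". The Device type and the discrete-first heuristic are illustrative assumptions, not Vulkan-Loader or game code; a real application would inspect VkPhysicalDeviceProperties::deviceType instead of a string label.

```python
from dataclasses import dataclass

# Simplified stand-ins for VkPhysicalDeviceType values.
DISCRETE, INTEGRATED, OTHER = "discrete", "integrated", "other"


@dataclass(frozen=True)
class Device:
    name: str
    kind: str


def pick_first(devices):
    """What most games and vulkaninfo do: trust the enumeration order."""
    return devices[0] if devices else None


def pick_most_suitable(devices):
    """What the loader developers suggest: choose by some heuristic.

    Hypothetical heuristic: discrete, then integrated, then anything else.
    min() returns the earliest device among ties, so order breaks ties.
    """
    rank = {DISCRETE: 0, INTEGRATED: 1}
    return min(devices, key=lambda d: rank.get(d.kind, 2), default=None)


enumerated = [
    Device("Intel(R) UHD Graphics 630 (CFL GT2)", INTEGRATED),
    Device("GeForce RTX 2080 with Max-Q Design", DISCRETE),
]
print(pick_first(enumerated).name)
print(pick_most_suitable(enumerated).name)
```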

A future version of the steam-runtime-system-info diagnostic tool will list and test all the available GPUs instead of just the first one, which will hopefully at least help to diagnose this.

in performance mode or on demand mode

What, precisely, does this mean? What did you do to configure it, and do you know how it gets communicated to applications? pressure-vessel cannot pick up hints from the host system's configuration unless we know what those hints are.

We try to preserve whatever order of GPU enumeration and whatever Vulkan layers we got from the host system, which in principle should mean that if Vulkan games choose the "right" GPU (whatever that means) outside the container, then they will also choose the "right" GPU inside the container - but it seems to be more complicated than that.

luca-s commented 3 years ago

@smcv I am adding a piece of information here, because I am affected by this issue as well.

in performance mode or on demand mode

This is a setting available to the user via the NVIDIA configuration when the proprietary NVIDIA driver is installed and multiple GPUs are available on the system. It allows the user to choose which GPU to use. How does this affect the system? And why are applications (e.g. Steam) not aware of this choice? If I can help you debug what this setting changes at the OS level, let me know.


Guite commented 3 years ago

Does it make any difference if you run Steam, and the debugging commands in #312 (comment), with __NV_PRIME_RENDER_OFFLOAD=1 and __GLX_VENDOR_LIBRARY_NAME=nvidia in the environment?

Starting Steam from the command line with __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia steam does not work.

smcv commented 3 years ago

This is a setting available to the user via nvidia configuration, available when the proprietary nvidia driver are installed and multiple gpus are available on the system. It allows the user to choose which gpu to use. How does this affect the system?

I have no idea, and I don't have access to a dual-GPU machine that is new enough for Vulkan at the moment. Can anyone tell me how it affects the system?

luca-s commented 3 years ago

This is what Steam System Information reports when I select:

smcv commented 3 years ago

This is what Steam System Information reports

Thanks, this sort of information helps a lot. I still can't see why this works, or what it is that mechanically changed on your host system when you switched between modes, but ... apparently it is mostly doing the right things?

In NVIDIA-only (performance) mode, it looks like we are mostly correctly seeing the NVIDIA stack, both inside and outside the container. One exception is that VDPAU works outside the container but not inside, for some reason.

In Intel-only (power saving) mode, it looks like we are correctly seeing the Intel stack. VDPAU and VA-API don't work, but those are usually non-essential, and as far as I understand it, VDPAU on non-NVIDIA hardware isn't really expected to work reliably anyway.

In on-demand mode, we're trying to use the Intel GPU, because that's the default in on-demand mode, and nobody has told the graphics stack that it ought to behave otherwise: if you have put the system in an on-demand mode where Intel is the default, then you can't expect the NVIDIA device to be used unless someone somehow asks for it to be used. I think the official way to ask for it is __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia as per NVIDIA's documentation, but I could be wrong. DRI_PRIME=1 might also be part of the puzzle.

In the on-demand mode, libVkLayer_MESA_device_select.so is missing in the container, because you are not using the beta version of soldier, and the non-beta version still has the bug I mentioned in https://github.com/ValveSoftware/steam-runtime/issues/312#issuecomment-763045639. Also, attempting to use Vulkan is crashing with a segmentation fault, but that isn't a bug in pressure-vessel - it's also crashing when not running in the container. So we're faithfully reproducing the (non-working) behaviour of the non-containerized system, which is about the best we can expect to be able to do :-)

Guite commented 3 years ago

In NVIDIA-only (performance) mode, it looks like we are mostly correctly seeing the NVIDIA stack, both inside and outside the container.

I doubt that, because I am also using NVIDIA-only mode and things don't work out of the box unless I apply the workaround (renaming the files) from above.

smcv commented 3 years ago

@Guite wrote:

Starting steam from CLI using __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia steam [with beta 0.20210114.x] does not work.

What does "does not work" mean, precisely? Please could you show me a log, as described in https://github.com/ValveSoftware/steam-runtime/blob/master/doc/reporting-steamlinuxruntime-bugs.md#essential-information?

Note that for the environment variables to take effect, you need to completely exit from Steam (no Steam icon in the system tray, etc.) and start it with the environment variables set.
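A launch wrapper only needs to add the variables to a copy of the current environment and start a new Steam process. This is a minimal sketch using the two variables named in this thread; offload_environment and launch_steam are hypothetical helpers, and the wrapper does nothing to shut down a Steam client that is already running, which still has to be done first.

```python
import os
import subprocess

# The two variables suggested above for NVIDIA PRIME render offload.
PRIME_OFFLOAD_VARS = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def offload_environment(base=None):
    """Return a new dict: base (default os.environ) plus the offload variables."""
    env = dict(os.environ if base is None else base)
    env.update(PRIME_OFFLOAD_VARS)
    return env


def launch_steam(dry_run=True):
    """Start Steam with the offload variables set.

    Steam must already be fully exited, otherwise the variables do not take
    effect. With dry_run=True, return the command and the variables that
    would be added instead of starting anything.
    """
    cmd = ["steam"]
    if dry_run:
        return cmd, dict(PRIME_OFFLOAD_VARS)
    return subprocess.Popen(cmd, env=offload_environment())
```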

smcv commented 3 years ago

Added a new revision to the gist. After a brief look at the diff this change seems relevant:

      "graphics-details" : {
        "x11/vulkan" : {
          ...
          "renderer" : "Intel(R) UHD Graphics (CML GT2)",
          "version" : "1.2.145 (device 8086:9bc4) (driver 20.2.6)"
        },

became:

      "graphics-details" : {
        "x11/vulkan" : {
          ...
          "renderer" : "Quadro RTX 3000",

This (and the Nvidia (Performance Mode) system info dump) says that when a Vulkan client inside the container enumerates physical devices, the one that is enumerated first has changed from the Intel iGPU to the NVIDIA dGPU. In steam-runtime-system-info's case, that Vulkan client is our copy of the vulkaninfo tool, or our check-vulkan helper (which is just a simple test that opens a hidden/off-screen X11 window and tries to draw a triangle into it).

If that isn't enough to make DXVK games run on the NVIDIA dGPU in preference to the Intel iGPU, then that's getting into questions of how DXVK chooses which GPU to run on, rather than how pressure-vessel imports stuff into the container.

Guite commented 3 years ago

Please show me a log

Here is the log from running __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia STEAM_LINUX_RUNTIME_LOG=1 steam and starting a proton game.

slr-app352400-s773d6ca723abacef.log

kisak-valve commented 3 years ago

As far as I understand it, nVidia's performance mode should be the same thing as "Offloading Graphics Display with RandR 1.4", sometimes called "nvidia-prime", which has the entire X session running on the nVidia chipset as described in https://download.nvidia.com/XFree86/Linux-x86_64/340.108/README/randr14.html. I was under the impression that mesa/ANV will fail if anything tries to use it, because the X session is running on the nVidia chipset. Some mechanism should make the nVidia driver get picked first in this case, but I don't know exactly how that happens.

The on-demand mode is the newer PRIME render offload (https://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/primerenderoffload.html), which has the X session running on the integrated chipset and the nVidia chipset in a ready/standby state all the time. The VK_LAYER_NV_optimus vulkan layer should be involved here whenever __NV_PRIME_RENDER_OFFLOAD=1 is set.

luca-s commented 3 years ago

In NVIDIA-only (performance) mode, it looks like we are mostly correctly seeing the NVIDIA stack, both inside and outside the container.

I doubt that. Because I am also using the NVIDIA-only mode and things don't work ootb unless I apply the workaround (renaming the files) from above.

Same here. I always run Steam in NVIDIA-only (performance) mode and the issue is there, but only with Proton 5.13 releases: games that used to work with earlier Proton versions no longer work.

Toomoch commented 3 years ago

This is still an issue. Is there another workaround?

smcv commented 3 years ago

Please could someone who is experiencing this problem re-test with the current stable version of Steam Linux Runtime - soldier (you should see 0.20210208.0 in VERSIONS.txt), and provide all of the information requested in https://github.com/ValveSoftware/steam-runtime/blob/master/doc/reporting-steamlinuxruntime-bugs.md#essential-information using that version?

I don't expect 0.20210208.0 to solve this, but what it will do is give us more information about your Vulkan GPUs, drivers and influential environment variables: recent versions of the diagnostic tools include more information about these. Getting a complete set of information from the same person at the same time makes it a lot more likely that we can identify what is going wrong and fix it.

Trying with the client_beta branch would also provide useful information, although the client_beta branch is known to trigger a regression for users of some versions of Mesa (this is believed to be either a Mesa bug or a Vulkan-Loader bug, so it is not something that we can easily fix in the Steam Runtime).

Is there another workaround?

No, but if you can give us the clue we need, then there might be a solution.

luca-s commented 3 years ago

@smcv thank you for the update on this issue.

Here is my system information. I noticed that the system information is reported differently when I run steam with STEAM_LINUX_RUNTIME_LOG=1, so here is also the system information when steam is run with STEAM_LINUX_RUNTIME_LOG=1.

Here are logs from two games that fail to run with Proton 5.13-6 but work well with Proton 5.0-10:

I have other games that show the same problem if you need more logs.

Leopard1907 commented 3 years ago

@luca-s Fwiw, Soldier Runtime Beta fixed the issue on my end.

https://github.com/ValveSoftware/steam-runtime/issues/295#issuecomment-788224482

Leopard1907 commented 3 years ago

I tested a bunch of games (32-bit, 64-bit, native Vulkan/GL, DXVK) and I can say it fully works now.

Guite commented 3 years ago

Here is my log of using version 0.20210217.0.

slr-app352400-se782a448f7e0171f.log

RyuzakiKK commented 3 years ago

@luca-s I just wanted to point out that, from the system information you posted, I can see that the GPU chosen inside the container is exactly the same as the one chosen outside the container.

In the "LD_* scout runtime" information: (no container) for i386-linux-gnu we have GeForce GTX 950M and in x86_64-linux-gnu we have Intel(R) HD Graphics 630 (KBL GT2).

This same thing happens also inside the Soldier container, with the only exception that apparently we are not actually able to use the Intel integrated GPU.

Leopard1907 commented 3 years ago

with the only exception that apparently we are not actually able to use the Intel integrated GPU.

The issue claims the exact opposite.

It is also worth mentioning that DXVK was picking the dGPU by itself, IIRC. That behaviour also broke when Soldier was introduced. So if everything is now back to how it was before Soldier (on Soldier Beta), that might be why you are not able to use the iGPU.

luca-s commented 3 years ago

@RyuzakiKK thank you for the information, however I don't have the knowledge to understand the implication of what you are saying.

@Leopard1907 I would like to try the Soldier Runtime Beta since that fixed the issue on your case, but I don't know how to install it. Could you point me to the documentation on how to install/enable the beta?

RyuzakiKK commented 3 years ago

@luca-s to switch to the beta version of Soldier you can follow these instructions https://github.com/ValveSoftware/steam-runtime/issues/338#issuecomment-786077659

luca-s commented 3 years ago

@RyuzakiKK thank you! I tried both Resident Evil 7 and Dark Souls Remastered with Runtime Soldier Beta (0.20210217.0) but they didn't work unfortunately.

smcv commented 3 years ago

Also it is worthy to mention, DXVK was picking up dgpu itself iirc

Do you know what information it bases this on, or have a reference for what it's doing?

My understanding was that DXVK exposes each GPU provided by the Vulkan stack as a DirectX GPU, possibly in the same order that the Vulkan stack provides them or possibly with some reshuffling (which might be what you want, or not); and then the DirectX game is free to either run on the first GPU in the list, or make its own choice (which might also be not what you want).

Meanwhile, the order in which the Vulkan stack provides them is affected by various Vulkan layers (one for Mesa, one for NVIDIA proprietary drivers, at least one for AMD's two non-Mesa drivers) which each jump in and change the order of enumeration, in an order that is, itself, undefined (so if you have the Mesa device selection layer and also the NVIDIA layer, it isn't obvious which one will "win" and get its decision used).
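The ordering problem can be illustrated with a toy model in which each layer is just a function that reorders a list of device names. move_to_front and apply_layers are hypothetical stand-ins, not how real Vulkan layers are written (those intercept vkEnumeratePhysicalDevices); the point is only that when two layers each promote a different GPU, whichever one is applied last decides which GPU comes first.

```python
def move_to_front(predicate):
    """Build a toy 'layer' that moves matching devices to the front,
    keeping the relative order of everything else."""
    def layer(devices):
        chosen = [d for d in devices if predicate(d)]
        rest = [d for d in devices if not predicate(d)]
        return chosen + rest
    return layer


def apply_layers(devices, layers):
    """Each layer sees the list produced by the previous one."""
    for layer in layers:
        devices = layer(devices)
    return devices


# Hypothetical preferences: one layer favours Intel, the other NVIDIA.
prefers_intel = move_to_front(lambda name: name.startswith("Intel"))
prefers_nvidia = move_to_front(lambda name: "GeForce" in name or "NVIDIA" in name)

gpus = ["Intel(R) UHD Graphics 630 (CFL GT2)", "GeForce RTX 2080 with Max-Q Design"]
print(apply_layers(gpus, [prefers_intel, prefers_nvidia])[0])
print(apply_layers(gpus, [prefers_nvidia, prefers_intel])[0])
```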

This is far more complex than it ought to be, and I'm hoping that in the long term it'll get solved in a generic way in Vulkan-Loader - but until then, we're doing our best. We try to make sure that whatever order pressure-vessel sees GPUs in, that order gets preserved inside the container, in the hope that the order used outside the container is the order you wanted.

One approach to this would be to throw in yet another Vulkan layer to try to undo the reordering that was done by the other Vulkan layers, but that seems like it would just be adding to the problem.

smcv commented 3 years ago

From the info on #295, I think what is happening in the multi-GPU use case on at least @Leopard1907's system is:

Previously, Vulkan-Loader didn't reliably load all the layers inside the container, because the way we have to translate the layers' manifests to be usable inside the container triggered Vulkan-Loader#155. In particular, on NVIDIA systems, one of the affected layers was the VK_LAYER_NV_optimus layer, which is activated by __NV_PRIME_RENDER_OFFLOAD=1 and is responsible for processing environment variables like __VK_LAYER_NV_optimus=NVIDIA_only.

Now that we have Vulkan-Loader 1.2.169 available inside the container (currently only in the beta version), Vulkan-Loader#155 is fixed, the VK_LAYER_NV_optimus layer loads, and GPUs get enumerated in the order you expect.
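A rough model of the behaviour described in this thread (the NVIDIA layer only loads with __NV_PRIME_RENDER_OFFLOAD=1, promotes the NVIDIA GPU to the front, and with __VK_LAYER_NV_optimus=NVIDIA_only hides everything else) might look like the sketch below. The layer itself is proprietary, so this is an assumption-laden illustration rather than a description of its code; optimus_filter is a hypothetical name, and devices are (name, PCI vendor ID) pairs.

```python
import os

NVIDIA_VENDOR_ID = 0x10DE  # PCI vendor ID, as in "VendorID: 0x10de" above


def optimus_filter(devices, env=None):
    """Toy model of the reordering/filtering attributed to VK_LAYER_NV_optimus.

    Assumptions: the layer is only active with __NV_PRIME_RENDER_OFFLOAD=1;
    it then moves NVIDIA devices first, and __VK_LAYER_NV_optimus=NVIDIA_only
    drops all non-NVIDIA devices.
    """
    env = os.environ if env is None else env
    if env.get("__NV_PRIME_RENDER_OFFLOAD") != "1":
        return list(devices)  # layer not active: order unchanged
    nvidia = [d for d in devices if d[1] == NVIDIA_VENDOR_ID]
    others = [d for d in devices if d[1] != NVIDIA_VENDOR_ID]
    if env.get("__VK_LAYER_NV_optimus") == "NVIDIA_only":
        return nvidia
    return nvidia + others


# The two GPUs from Guite's before/after diff above.
gpus = [("Intel(R) UHD Graphics (CML GT2)", 0x8086), ("Quadro RTX 3000", 0x10DE)]
print(optimus_filter(gpus, env={})[0][0])
print(optimus_filter(gpus, env={"__NV_PRIME_RENDER_OFFLOAD": "1"})[0][0])
```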

This will not be an immediate solution for everyone:

but it's certainly a step in the right direction.

The bug I mentioned in https://github.com/ValveSoftware/steam-runtime/issues/295#issuecomment-783518212 makes this more complicated. If you have Mesa 20.3.4 or newer, please make sure that either your Mesa has a backport of commit 38ce8d4d as mentioned in https://github.com/ValveSoftware/steam-runtime/issues/295#issuecomment-784025157 (this is hopefully going to be included in Mesa 20.3.5 and 21.0.0), or MangoHUD is disabled.

smcv commented 3 years ago

I noticed that the system information is reported differently when I run steam with STEAM_LINUX_RUNTIME_LOG=1

That's a bug, which should be fixed in the beta (previously it was redirecting too much information to the log, including the text that should have ended up in the System Information window).

smcv commented 3 years ago

@luca-s, please could you repeat what you did in https://github.com/ValveSoftware/steam-runtime/issues/312#issuecomment-763547568 while still using the beta? That might help us to understand the mechanics of how this switching mechanism actually works - the beta's diagnostic tool gives us a bit more information than the version you were using then.

I don't have access to a dual-GPU machine, so I'm completely relying on information from people like you to be able to find a solution to this.

smcv commented 3 years ago

@luca-s, it would also be useful if you could try this in each of your three scenarios (NVIDIA-only, Intel-only, NVIDIA-on-demand):

/home/luca/.steam/ubuntu12_32/steam-runtime/run.sh \
/home/luca/.steam/steam/steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel/bin/steam-runtime-system-info > srsi.txt

and provide the resulting srsi.txt for each. This will give the same information as the non-container part of the System Information window, but with a newer version of the diagnostic tool, which will hopefully tell us more about your Vulkan stack.

Leopard1907 commented 3 years ago

Do you know what information it bases this on, or have a reference for what it's doing?

https://github.com/doitsujin/dxvk/blob/master/src/dxvk/dxvk_instance.cpp#L170

smcv commented 3 years ago

https://github.com/doitsujin/dxvk/blob/master/src/dxvk/dxvk_instance.cpp#L170

Thanks. It looks like this should be ordering all discrete GPUs before all integrated GPUs, whatever we do - so the only thing that can affect the choice of GPU should be mechanisms like __VK_LAYER_NV_optimus=NVIDIA_only that completely drop certain GPUs from the list.

This is also using std::sort and not std::stable_sort, so if there is more than one discrete GPU or more than one integrated GPU, regardless of how careful we are to preserve order, it will randomly permute them. That's probably not what was intended!

The equivalent code in Proton's dxvk branch is a bit different: https://github.com/ValveSoftware/dxvk/blob/3f91cdbc126abde7b2334e739d08de0ef2edd1d2/src/dxvk/dxvk_instance.cpp#L170
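The difference between the two sorts only matters for ties, such as two discrete GPUs. The sketch below models "discrete before integrated" with Python's sorted(), which is guaranteed stable, so tied devices keep the order the Vulkan stack reported; C++ gets the same guarantee from std::stable_sort but not from std::sort. The device names and type labels are hypothetical, and this is not DXVK's code.

```python
# Lower rank sorts first; unknown types go last.
TYPE_RANK = {"discrete": 0, "integrated": 1}


def discrete_first(devices):
    """devices: (name, type) pairs in the order the Vulkan stack enumerated them.

    sorted() is stable, so devices with equal rank keep their relative order.
    """
    return sorted(devices, key=lambda dev: TYPE_RANK.get(dev[1], 2))


enumerated = [
    ("iGPU", "integrated"),
    ("dGPU A", "discrete"),
    ("dGPU B", "discrete"),
]
print([name for name, _ in discrete_first(enumerated)])  # ['dGPU A', 'dGPU B', 'iGPU']
```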

Leopard1907 commented 3 years ago

so the only thing that can affect the choice of GPU should be mechanisms like __VK_LAYER_NV_optimus=NVIDIA_only that completely drop certain GPUs from the list.

DXVK device filter can be used for that too.

https://github.com/doitsujin/dxvk#device-filter
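The DXVK README describes this as selecting devices whose Vulkan device name matches DXVK_FILTER_DEVICE_NAME, including partial matches. A minimal model of that filter, assuming a case-sensitive substring match and that an unset or empty variable keeps every device (filter_by_name is a hypothetical helper, not DXVK code), is:

```python
import os


def filter_by_name(device_names, env=None):
    """Keep only devices whose name contains DXVK_FILTER_DEVICE_NAME."""
    env = os.environ if env is None else env
    wanted = env.get("DXVK_FILTER_DEVICE_NAME", "")
    if not wanted:
        return list(device_names)  # no filter requested
    return [name for name in device_names if wanted in name]


names = ["Intel(R) UHD Graphics 630 (CFL GT2)", "GeForce RTX 2080 with Max-Q Design"]
print(filter_by_name(names, env={"DXVK_FILTER_DEVICE_NAME": "RTX 2080"}))
```

Under these assumptions, DXVK_FILTER_DEVICE_NAME="RTX 2080" would keep only the GeForce device from the system information earlier in this thread.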

In my use case, at least, I also pass __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia, because games like Doom 2016 don't offer a device selection mechanism. In their native environment (Windows) that is not an issue, because the driver packages have mechanisms that detect executables by name and default to using the dGPU. Where no such predefined profile exists, those driver packages (namely their control panels) let the user manually add an executable and choose which GPU it will use.

Edit: It should also be noted that, while rare, there are also systems/configurations known as Reverse PRIME: https://wiki.archlinux.org/index.php/PRIME#Reverse_PRIME

I never had one though.

Leopard1907 commented 3 years ago

This will not be an immediate solution for everyone:

not everyone sets those environment variables

Definitely. I think this whole mess could be solved with an option to choose the GPU within the Steam client itself (a simple switch that passes the needed variables, like the one Lutris has, which works for both Optimus and plain Mesa DRI_PRIME systems). When those variables are not passed at Steam client startup, the client compiles and downloads shaders (Fossilize) for the iGPU, which in many cases is not the GPU users want for gaming. That is a waste, and it basically makes a good feature pointless.
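Such a switch could be little more than a table of presets applied to the environment before the client starts. The sketch below is hypothetical: the preset names and environment_for are made up for illustration, and the variable sets are simply the ones mentioned in this thread for NVIDIA PRIME render offload and for Mesa's DRI_PRIME.

```python
GPU_PRESETS = {
    "default": {},
    "nvidia-offload": {
        "__NV_PRIME_RENDER_OFFLOAD": "1",
        "__VK_LAYER_NV_optimus": "NVIDIA_only",
        "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
    },
    "mesa-dri-prime": {"DRI_PRIME": "1"},
}


def environment_for(preset, base):
    """Return a copy of base with the chosen preset's variables added."""
    try:
        extra = GPU_PRESETS[preset]
    except KeyError:
        raise ValueError(f"unknown GPU preset: {preset!r}") from None
    env = dict(base)
    env.update(extra)
    return env
```

If, as described above, shader pre-caching follows the GPU the client starts on, the preset would need to apply when the client launches, not only when a game starts.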

luca-s commented 3 years ago

@smcv here they are!

Nvidia-only:

Intel-only:

NVIDIA-on-demand: