ValveSoftware / steam-runtime

A runtime environment for Steam applications

./steamwebhelper: error while loading shared libraries: libgallium.so with mesa >= 24.2.0-rc1 #683

Closed tgurr closed 1 month ago

tgurr commented 2 months ago

Your system information

Please describe your issue in as much detail as possible:

The Steam client no longer starts and shows the following error message: [image]

From command line output:

src/steamUI/steamuisharedjscontroller.cpp (619) : Failed creating offscreen shared JS context
src/steamUI/steamuisharedjscontroller.cpp (619) : Failed creating offscreen shared JS context

Command line output: https://gist.github.com/tgurr/f293c2a6887054ed914414b1da9206a0

Steps for reproducing this issue:

  1. Update to mesa 24.2.0-rc1 or current git master
  2. Try to start steam

Additional information

mesa bugreport: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11544

There are three new files in the dri directory:

libgallium_drv_video.so
libgallium.so
libdril_dri.so

and the "old" ones now seem to be just symlinks. Maybe that causes issues for the container stuff and how it's handled by pressure vessel (for Exherbo)?

mesa-24.1.4:

$ ls -la /usr/i686-pc-linux-gnu/lib/dri/
total 108008
drwxr-xr-x  2 root root     4096 22. Jul 20:11 .
drwxr-xr-x 17 root root    20480 22. Jul 20:11 ..
-rwxr-xr-x  4 root root 21196688 22. Jul 20:11 kms_swrast_dri.so
-rwxr-xr-x  4 root root 21196688 22. Jul 20:11 r600_dri.so
-rwxr-xr-x  2 root root 12893776 22. Jul 20:11 r600_drv_video.so
-rwxr-xr-x  4 root root 21196688 22. Jul 20:11 radeonsi_dri.so
-rwxr-xr-x  2 root root 12893776 22. Jul 20:11 radeonsi_drv_video.so
-rwxr-xr-x  4 root root 21196688 22. Jul 20:11 swrast_dri.so
$ ls -la /usr/x86_64-pc-linux-gnu/lib/dri/
total 102376
drwxr-xr-x   2 root root     4096 22. Jul 20:20 .
drwxr-xr-x 103 root root   204800 22. Jul 20:20 ..
-rwxr-xr-x   4 root root 19992776 22. Jul 20:20 kms_swrast_dri.so
-rwxr-xr-x   4 root root 19992776 22. Jul 20:20 r600_dri.so
-rwxr-xr-x   2 root root 12313512 22. Jul 20:20 r600_drv_video.so
-rwxr-xr-x   4 root root 19992776 22. Jul 20:20 radeonsi_dri.so
-rwxr-xr-x   2 root root 12313512 22. Jul 20:20 radeonsi_drv_video.so
-rwxr-xr-x   4 root root 19992776 22. Jul 20:20 swrast_dri.so

mesa-24.2.0-rc1 & git main:

$ ls -la /usr/i686-pc-linux-gnu/lib/dri/
total 34920
drwxr-xr-x  2 root root     4096 22. Jul 20:03 .
drwxr-xr-x 17 root root    20480 22. Jul 20:03 ..
lrwxrwxrwx  1 root root       14 22. Jul 20:03 kms_swrast_dri.so -> libdril_dri.so
-rwxr-xr-x  1 root root    79328 22. Jul 20:03 libdril_dri.so
-rwxr-xr-x  1 root root 13673712 22. Jul 20:03 libgallium_drv_video.so
-rwxr-xr-x  1 root root 21972528 22. Jul 20:03 libgallium.so
lrwxrwxrwx  1 root root       14 22. Jul 20:03 r600_dri.so -> libdril_dri.so
lrwxrwxrwx  1 root root       23 22. Jul 20:03 r600_drv_video.so -> libgallium_drv_video.so
lrwxrwxrwx  1 root root       14 22. Jul 20:03 radeonsi_dri.so -> libdril_dri.so
lrwxrwxrwx  1 root root       23 22. Jul 20:03 radeonsi_drv_video.so -> libgallium_drv_video.so
lrwxrwxrwx  1 root root       14 22. Jul 20:03 swrast_dri.so -> libdril_dri.so
$ ls -la /usr/x86_64-pc-linux-gnu/lib/dri/
total 33388
drwxr-xr-x   2 root root     4096 22. Jul 20:05 .
drwxr-xr-x 103 root root   204800 22. Jul 20:05 ..
lrwxrwxrwx   1 root root       14 22. Jul 20:05 kms_swrast_dri.so -> libdril_dri.so
-rwxr-xr-x   1 root root   100600 22. Jul 20:05 libdril_dri.so
-rwxr-xr-x   1 root root 13073728 22. Jul 20:05 libgallium_drv_video.so
-rwxr-xr-x   1 root root 20798048 22. Jul 20:05 libgallium.so
lrwxrwxrwx   1 root root       14 22. Jul 20:05 r600_dri.so -> libdril_dri.so
lrwxrwxrwx   1 root root       23 22. Jul 20:05 r600_drv_video.so -> libgallium_drv_video.so
lrwxrwxrwx   1 root root       14 22. Jul 20:05 radeonsi_dri.so -> libdril_dri.so
lrwxrwxrwx   1 root root       23 22. Jul 20:05 radeonsi_drv_video.so -> libgallium_drv_video.so
lrwxrwxrwx   1 root root       14 22. Jul 20:05 swrast_dri.so -> libdril_dri.so

Note that with mesa 24.2.0-rc1 a spurious error message pops up (MESA-LOADER: failed to open radeonsi: driver not built!), which is fixed in git main by https://gitlab.freedesktop.org/mesa/mesa/-/commit/159a3edd80a988dec263708f851ed35eec881a78. Applying that patch to 24.2.0-rc1 didn't change the outcome, however. I first thought the error message might be confusing Steam somehow, but apparently the issue is not that easy to solve.

Also, I'm not sure whether it is relevant in this case, but a similar change happened for the files in the vdpau directory:

mesa-24.1.4:

$ ls -la /usr/x86_64-pc-linux-gnu/lib/vdpau/
total 24232
drwxr-xr-x   2 root root     4096 22. Jul 20:53 .
drwxr-xr-x 103 root root   204800 22. Jul 20:53 ..
lrwxrwxrwx   1 root root       22 22. Jul 20:53 libvdpau_r600.so -> libvdpau_r600.so.1.0.0
lrwxrwxrwx   1 root root       22 22. Jul 20:53 libvdpau_r600.so.1 -> libvdpau_r600.so.1.0.0
lrwxrwxrwx   1 root root       22 22. Jul 20:53 libvdpau_r600.so.1.0 -> libvdpau_r600.so.1.0.0
-rwxr-xr-x   2 root root 12264296 22. Jul 20:53 libvdpau_r600.so.1.0.0
lrwxrwxrwx   1 root root       26 22. Jul 20:53 libvdpau_radeonsi.so -> libvdpau_radeonsi.so.1.0.0
lrwxrwxrwx   1 root root       26 22. Jul 20:53 libvdpau_radeonsi.so.1 -> libvdpau_radeonsi.so.1.0.0
lrwxrwxrwx   1 root root       26 22. Jul 20:53 libvdpau_radeonsi.so.1.0 -> libvdpau_radeonsi.so.1.0.0
-rwxr-xr-x   2 root root 12264296 22. Jul 20:53 libvdpau_radeonsi.so.1.0.0
lrwxrwxrwx   1 root root       19 19. Jul 22:29 libvdpau_trace.so -> libvdpau_trace.so.1
lrwxrwxrwx   1 root root       23 19. Jul 22:29 libvdpau_trace.so.1 -> libvdpau_trace.so.1.0.0
-rwxr-xr-x   1 root root    63576 19. Jul 22:29 libvdpau_trace.so.1.0.0

mesa-24.2.0-rc1 & git main:

$ ls -la /usr/x86_64-pc-linux-gnu/lib/vdpau/
total 12988
drwxr-xr-x   2 root root     4096 22. Jul 20:27 .
drwxr-xr-x 103 root root   204800 22. Jul 20:27 ..
-rwxr-xr-x   1 root root 13020416 22. Jul 20:27 libvdpau_gallium.so.1.0.0
lrwxrwxrwx   1 root root       22 22. Jul 20:27 libvdpau_r600.so -> libvdpau_r600.so.1.0.0
lrwxrwxrwx   1 root root       22 22. Jul 20:27 libvdpau_r600.so.1 -> libvdpau_r600.so.1.0.0
lrwxrwxrwx   1 root root       22 22. Jul 20:27 libvdpau_r600.so.1.0 -> libvdpau_r600.so.1.0.0
lrwxrwxrwx   1 root root       25 22. Jul 20:27 libvdpau_r600.so.1.0.0 -> libvdpau_gallium.so.1.0.0
lrwxrwxrwx   1 root root       26 22. Jul 20:27 libvdpau_radeonsi.so -> libvdpau_radeonsi.so.1.0.0
lrwxrwxrwx   1 root root       26 22. Jul 20:27 libvdpau_radeonsi.so.1 -> libvdpau_radeonsi.so.1.0.0
lrwxrwxrwx   1 root root       26 22. Jul 20:27 libvdpau_radeonsi.so.1.0 -> libvdpau_radeonsi.so.1.0.0
lrwxrwxrwx   1 root root       25 22. Jul 20:27 libvdpau_radeonsi.so.1.0.0 -> libvdpau_gallium.so.1.0.0
lrwxrwxrwx   1 root root       19 19. Jul 22:29 libvdpau_trace.so -> libvdpau_trace.so.1
lrwxrwxrwx   1 root root       23 19. Jul 22:29 libvdpau_trace.so.1 -> libvdpau_trace.so.1.0.0
-rwxr-xr-x   1 root root    63576 19. Jul 22:29 libvdpau_trace.so.1.0.0

kisak-valve commented 2 months ago

Hello @tgurr, this reads more like a Pressure Vessel issue than an issue with the Steam client, so I've transferred this issue report to the steam-runtime issue tracker. Please give https://github.com/ValveSoftware/steam-runtime/blob/master/doc/reporting-steamlinuxruntime-bugs.md#essential-information a read and share the requested information.

tgurr commented 2 months ago

Trying the steps from the mentioned link, I couldn't get anything useful: the output of STEAM_LINUX_RUNTIME_VERBOSE=1 steam 2>&1 | tee ~/slr.log contains next to nothing: slr.log

Command line output when launching with PRESSURE_VESSEL_VERBOSE=1 STEAM_LINUX_RUNTIME_VERBOSE=1 steam: https://gist.github.com/tgurr/e6daaf6f0b4a2dd3130189f2aefacf05

Additional stuff: 01_pinned_libs.txt 02_print-steam-runtime-library-paths.txt 03_library-abi.log 04_srsi.log

smcv commented 2 months ago

this reads more like a Pressure Vessel issue than an issue with the Steam client

I'm not convinced, actually - the original report sounded more like a steamwebhelper issue to me.

No, on closer inspection, this is pressure-vessel-related.

smcv commented 2 months ago

STEAM_LINUX_RUNTIME_VERBOSE=1 steam 2>&1 | tee ~/slr.log contains next to nothing

This is because anything logged by the steamwebhelper (which is known to be rather verbose, and sometimes misleading) gets redirected to a separate log file.

When investigating any steamwebhelper crash, please check the log files in ~/.steam/steam/logs/ (or ~/.var/app/com.steampowered.Steam/.steam/steam/logs if you're using Steam via the unofficial Flatpak app, or ~/snap/steam/common/.steam/steam/logs if you're using the unofficial Snap app).

In the current public beta version of Steam, the output of SLR while running steamwebhelper appears in webhelper-linux.txt, and there are potentially also relevant messages in cef_log.txt and webhelper.txt. Older versions might have logged to steamwebhelper.log but I think that file is unused now. You might want to exit from Steam completely and move your ~/.steam/steam/logs/ out of the way, so that you can know that everything in that directory is new.

Running Steam as STEAM_LINUX_RUNTIME_VERBOSE=1 steam is a correct debugging step: that should result in pressure-vessel debug-level output appearing in webhelper-linux.txt.
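
As a rough sketch of the log hunt described above, the following Python helper looks for the three log files in the three directories mentioned in this comment. The function name find_webhelper_logs and its home parameter are made up for illustration; only the paths and file names come from the comment.

```python
from pathlib import Path

# Log directories named in the comment above, relative to the home directory.
CANDIDATE_LOG_DIRS = (
    ".steam/steam/logs",                                  # regular install
    ".var/app/com.steampowered.Steam/.steam/steam/logs",  # unofficial Flatpak app
    "snap/steam/common/.steam/steam/logs",                # unofficial Snap app
)

# Files the comment says may contain steamwebhelper or SLR output.
LOG_NAMES = ("webhelper-linux.txt", "cef_log.txt", "webhelper.txt")


def find_webhelper_logs(home=None):
    """Return the steamwebhelper-related log files that exist under `home`.

    `home` defaults to the current user's home directory; passing it
    explicitly is a convenience of this sketch, not something Steam does.
    """
    home = Path(home) if home is not None else Path.home()
    found = []
    for rel_dir in CANDIDATE_LOG_DIRS:
        for name in LOG_NAMES:
            path = home / rel_dir / name
            if path.is_file():
                found.append(path)
    return found
```

Calling find_webhelper_logs() with no argument and printing each result shows which of the files are present for the current user.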

There are three new files in the dri directory and the "old" ones now seem to be just symlinks. Maybe that causes issues for the container stuff and how it's handled by pressure vessel (for Exherbo)?

It should be able to dereference the symlinks, but we'd have to see the detailed log to know for sure whether that's working as intended.
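
To make "dereference the symlinks" concrete, here is a minimal Python sketch that maps every entry of a driver directory to the real file it resolves to. It only illustrates the idea; it is not how pressure-vessel itself is implemented, and the function name resolve_driver_links is hypothetical.

```python
import os


def resolve_driver_links(directory):
    """Map each entry in `directory` to the basename of its real file.

    Regular files map to themselves; symlinks map to their final target.
    """
    resolved = {}
    for name in sorted(os.listdir(directory)):
        real = os.path.realpath(os.path.join(directory, name))
        resolved[name] = os.path.basename(real)
    return resolved
```

Run against a dri directory laid out like the mesa 24.2.0-rc1 listing in the report, this would be expected to map radeonsi_dri.so to libdril_dri.so and radeonsi_drv_video.so to libgallium_drv_video.so.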

smcv commented 2 months ago

src/steamUI/steamuisharedjscontroller.cpp (619) : Failed creating offscreen shared JS context

Unfortunately, I think this might just mean "steamwebhelper is broken". We'd need to see the messages logged in webhelper-linux.txt to know whether this is a problem with SLR or with steamwebhelper.

Some of the messages logged by the steamwebhelper are known to be misleading: it seems to be normal to get ANGLE and EGL initialization errors, even on an otherwise working system.

smcv commented 2 months ago

Aha! The original issue report has a webhelper-linux.txt, which doesn't have verbose SLR output, but does have relevant error messages.

There are lots of warnings like this:

x86_64-linux-gnu-capsule-capture-libs: warning: Dependencies of libGLX_mesa.so.0 not found, ignoring: Missing dependencies: Could not find "libgallium.so" in LD_LIBRARY_PATH "/home/tgurr/.local/share/Steam/ubuntu12_32:/home/tgurr/.local/share/Steam/ubuntu12_32/panorama:/usr/x86_64-pc-linux-gnu/lib:/usr/local/lib:/usr/x86_64-pc-linux-gnu/lib/nss:/usr/x86_64-pc-linux-gnu/lib/qt5:/usr/x86_64-pc-linux-gnu/lib/qt6:/usr/i686-pc-linux-gnu/lib:/usr/local/lib", ld.so.cache, DT_RUNPATH or fallback /lib:/usr/lib

and then the steamwebhelper doesn't start either:

./steamwebhelper: error while loading shared libraries: libgallium.so: cannot open shared object file: No such file or directory

which is probably the root cause for what you're seeing.
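
The warning quoted above already contains the key detail: none of the directories it searched is a dri subdirectory. A small Python sketch for pulling that information out of such a line follows. The regular expression is derived only from the single warning shown here, so treat the message format as an assumption; parse_capture_warning is a hypothetical helper name.

```python
import re

# Pattern based on the one warning quoted in this thread (assumed format).
WARNING_RE = re.compile(
    r'Dependencies of (?P<lib>\S+) not found.*?'
    r'Could not find "(?P<missing>[^"]+)" in LD_LIBRARY_PATH "(?P<path>[^"]*)"'
)


def parse_capture_warning(line):
    """Extract the library, its missing dependency and the searched dirs."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    search_dirs = [d for d in match["path"].split(":") if d]
    return {
        "library": match["lib"],
        "missing": match["missing"],
        "search_dirs": search_dirs,
    }
```

Checking whether any entry of search_dirs ends in /dri then shows directly that the directory holding libgallium.so was never part of the LD_LIBRARY_PATH lookup.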

We'll need to figure out how your libGLX_mesa.so.0 is loading libgallium.so successfully on your host system, but not when the SLR container infrastructure tries to find it. @tgurr, could you please inspect one of the affected libraries, such as libGLX_mesa.so.0, with a command like objdump -T -x /usr/x86_64-pc-linux-gnu/lib/libGLX_mesa.so.0, and copy/paste the section of the output headed Dynamic Section: here?

@tgurr or @kisak-valve, it would maybe be helpful to retitle this issue to mention ./steamwebhelper: error while loading shared libraries: libgallium.so.

smcv commented 2 months ago

It would also be interesting if you could try running:

LD_DEBUG=libs,files glxgears 2>&1 | tee glxgears.log

and provide glxgears.log as an attachment or Gist, so that we can see how a simple OpenGL program like glxgears manages to find its libraries on this particular system.
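
For readers going through such a glxgears.log themselves, the Python sketch below picks out the lines where the dynamic loader reports a search path taken from an RPATH or RUNPATH. The line shape ("search path=... (RPATH from file ...)") is an assumption based on typical glibc LD_DEBUG=libs output, not on the attached log, and rpath_lookups is a hypothetical helper name.

```python
import re

# Assumed shape of a glibc LD_DEBUG=libs line, for example:
#   1234:  search path=/some/dir    (RPATH from file /some/lib.so)
SEARCH_PATH_RE = re.compile(
    r"search path=(?P<paths>\S+)\s+"
    r"\((?P<kind>RPATH|RUNPATH) from file (?P<file>[^)]+)\)"
)


def rpath_lookups(log_text):
    """Return (kind, file, directories) for each RPATH/RUNPATH search line."""
    lookups = []
    for line in log_text.splitlines():
        match = SEARCH_PATH_RE.search(line)
        if match:
            dirs = match["paths"].split(":")
            lookups.append((match["kind"], match["file"], dirs))
    return lookups
```

If the host loader finds libgallium.so through a path reported as "RPATH from file .../libGLX_mesa.so.0", that already hints at which ELF tag the library carries.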

tgurr commented 2 months ago

As usual, huge thanks! I'll provide the requested information later, once I'm back from work. From what I understood, also from the conversation in the mesa bug tracker, we could maybe also install an environment file to extend the LDPATH with the non-default search path (${libdir}/dri):

/etc/env.d/99mesa containing LDPATH=/usr/@TARGET@/lib/dri

and install that with our mesa package at the distribution level? I can't judge whether that would be a proper solution or rather a workaround, though, assuming it would actually work in the first place.

smcv commented 2 months ago

There is no such thing as LDPATH (that I'm aware of), do you mean LD_LIBRARY_PATH?

My current understanding of the situation (which could be wrong, I'll need to see debug information) is that if everything is working correctly, you should not need to set LD_LIBRARY_PATH, because the Mesa-related libraries should be able to find libgallium.so as referenced by their DT_RUNPATH ELF headers; and adding /usr/@TARGET@/lib/dri to the LD_LIBRARY_PATH might even be harmful, by making components load mismatched versions of their dependencies. So I would prefer it if Exherbo doesn't need to do that.

smcv commented 2 months ago

A build log from the way Exherbo builds Mesa would be useful information, if you can easily obtain it.

tgurr commented 2 months ago

objdump -T -x /usr/x86_64-pc-linux-gnu/lib/libGLX_mesa.so.0

https://gist.github.com/tgurr/c4d9877af69ccbe3c46889ec5bab7165

LD_DEBUG=libs,files glxgears 2>&1 | tee glxgears.log

https://gist.github.com/tgurr/7c610adfbfbbb6ae8031f6965ce69321

A build log from the way Exherbo builds Mesa would be useful information, if you can easily obtain it.

x86_64: https://gist.githubusercontent.com/tgurr/73a69c6414dc424517d3eb0693ffffe1/raw/8046bf3b420e6dd455783f93d3af5f9f1adf0d33/gistfile1.txt x86: https://gist.githubusercontent.com/tgurr/c07653ad03c2be9c253876b6da90bb68/raw/0bc242ce0c7b4875aa1236bb494a80df8ef2bb7d/gistfile1.txt

Please let me know if I missed something and/or if you need further details I'm able to provide.

smcv commented 2 months ago

objdump -T -x /usr/x86_64-pc-linux-gnu/lib/libGLX_mesa.so.0

https://gist.github.com/tgurr/c4d9877af69ccbe3c46889ec5bab7165

Oh no. I can see why this is not working:

Dynamic Section:
...
  NEEDED               libgallium.so
...
  RPATH                /usr/x86_64-pc-linux-gnu/lib/dri

That's the legacy DT_RPATH, not the more modern DT_RUNPATH (see ld.so(8) for what the difference is).

libcapsule (and therefore pressure-vessel, and therefore SLR) doesn't support DT_RPATH, only DT_RUNPATH. This limitation is because the semantics of DT_RPATH are really annoying to implement (it has an "action at a distance" behaviour that affects the entire dependency tree).
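
The check done by eye above can be sketched in Python: parse the Dynamic Section of objdump -x output and report which of the two tags a library carries. The parsing assumes the two-column layout shown in the excerpt above; the function names are hypothetical.

```python
def parse_dynamic_section(objdump_output):
    """Collect TAG -> [values] from the 'Dynamic Section:' of objdump -x."""
    entries = {}
    in_section = False
    for line in objdump_output.splitlines():
        if line.strip() == "Dynamic Section:":
            in_section = True
            continue
        if not in_section:
            continue
        if not line.strip():
            break  # a blank line ends the section
        parts = line.split(None, 1)
        if len(parts) == 2:  # skip lines without a value, such as "..."
            entries.setdefault(parts[0], []).append(parts[1].strip())
    return entries


def library_search_tag(entries):
    """Return 'DT_RUNPATH', 'DT_RPATH' or None for the parsed entries."""
    if "RUNPATH" in entries:
        return "DT_RUNPATH"
    if "RPATH" in entries:
        return "DT_RPATH"
    return None
```

Fed the excerpt from this comment, library_search_tag reports DT_RPATH, which is the case libcapsule does not support.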

smcv commented 2 months ago

Does Exherbo do something in its toolchain to avoid DT_RUNPATH and go back to the older DT_RPATH?

In most distros (e.g. Debian), linking with Meson install_rpath results in linking with -Wl,-rpath,... which actually generates a DT_RUNPATH, unless the linker flags also include -Wl,--disable-new-dtags.

I don't see an explicit --disable-new-dtags or --enable-new-dtags in your build log, so presumably you're getting your linker's default behaviour.

Could this perhaps be because Debian configures binutils with ./configure --enable-new-dtags (and so do other distros like Fedora and Arch), but Exherbo does not?

tgurr commented 2 months ago

I checked our binutils: we don't (yet) pass any specific --disable-new-dtags or --enable-new-dtags to it, so it uses the defaults of the current 2.42. After adding --enable-new-dtags and recompiling binutils and mesa, Steam indeed works again!

I'll have to check if we can simply add the --enable-new-dtags to our binutils and will for now try to explicitly pass it to our mesa package.

smcv commented 2 months ago

The possible routes to get a DT_RUNPATH would be:

  1. Rebuild binutils with ./configure --enable-new-dtags, like Debian/Fedora/Arch do; and then use that binutils to recompile Mesa
  2. Or, recompile Mesa with -Wl,--enable-new-dtags added to the linker flags (LDFLAGS or Meson c_link_args or equivalent)

It's a bit confusing - there are two options with the same name and basically the same effect, in two different places.

smcv commented 2 months ago

There are a couple of reasons why most other distributions have moved to DT_RUNPATH ("new dtags") and away from DT_RPATH.

DT_RPATH is higher-precedence than $LD_LIBRARY_PATH and causes $LD_LIBRARY_PATH to be ignored, which often comes as a surprise to users and developers who are trying to use $LD_LIBRARY_PATH to substitute a newer version of a library, and similarly can be harmful to portability frameworks like the (older LD_LIBRARY_PATH-based version of the) Steam Runtime.

DT_RPATH also has the "action at a distance" semantics that I mentioned earlier, where the DT_RPATH on the main executable or on library A can affect the search path that's used for the dependencies of library B, which is not always what's wanted.
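
Both points can be illustrated with a deliberately simplified Python model of the search order documented in ld.so(8). It assumes each object carries at most one of the two tags, and it ignores $ORIGIN expansion, secure-execution mode and other details; ElfObject and search_order are hypothetical names, and this is not the real loader algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class ElfObject:
    """Toy stand-in for an executable or shared library."""
    name: str
    rpath: list = field(default_factory=list)    # DT_RPATH directories
    runpath: list = field(default_factory=list)  # DT_RUNPATH directories


def search_order(loader_chain, ld_library_path=()):
    """Return (location, reason) pairs in the order they would be tried.

    loader_chain[0] is the object that needs the library; each later item
    is the object that loaded the previous one, ending with the executable.
    """
    requester = loader_chain[0]
    order = []
    # DT_RPATH is only consulted if the requester has no DT_RUNPATH, and
    # then the DT_RPATH of every object up the chain applies as well.
    if not requester.runpath:
        for obj in loader_chain:
            order += [(d, f"DT_RPATH of {obj.name}") for d in obj.rpath]
    order += [(d, "LD_LIBRARY_PATH") for d in ld_library_path]
    # DT_RUNPATH applies only to the requester's own dependencies.
    order += [(d, f"DT_RUNPATH of {requester.name}") for d in requester.runpath]
    order += [("ld.so.cache", "cache"), ("/lib", "default"), ("/usr/lib", "default")]
    return order
```

In this model a DT_RPATH on libGLX_mesa.so.0 sorts before any LD_LIBRARY_PATH entry, while the same directory as a DT_RUNPATH sorts after it; and a DT_RPATH on the main executable shows up in the search for the dependencies of an unrelated library further down the chain.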

tgurr commented 2 months ago

I've implemented a package-specific workaround for our mesa >= 24.2.0-rc1 (https://gitlab.exherbo.org/exherbo/x11/-/merge_requests/858) and proposed changing our binutils defaults to move away from the legacy behaviour (https://gitlab.exherbo.org/exherbo/arbor/-/merge_requests/4119). Since it's not the default of binutils even in recent versions, probably no one has looked into this yet, and it may just be an oversight that we didn't make this move as well.

Again I can't tell you how grateful I am for your help and willingness to do so, I'm pretty sure I wouldn't have been able to figure this out on my own and if things led to improving the distribution as a whole even better. You're really a person to rely on and you've always been helpful to figure out stuff and get things going in the first place and times like these where things break. Thanks for taking the time and jumping in to provide such a support that can not be taken for granted. I hope the other things you've mentioned in the mesa bugreport which kind of started a discussion will also result in further improvements for everyone.

Again thank you from the deepest of my heart!

smcv commented 2 months ago

In the short term, https://gitlab.exherbo.org/exherbo/x11/-/merge_requests/858 looks like a good solution for Exherbo. If there are other distributions where binutils still defaults to --disable-new-dtags, then an equivalent issue would exist in those distributions if they upgrade their Mesa to 24.2.0-rc1, and a change equivalent to !858 would be an equally good short-term solution for those other distros.

In the medium term, if https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30328 gets merged before the 24.2.0 stable release (preferably before 24.2.0-rc2), I believe it would avoid this issue completely.

As a long-term improvement, https://gitlab.exherbo.org/exherbo/arbor/-/merge_requests/4119 looks like a positive change, and I would encourage non-Exherbo distributions to do similarly if they haven't already.

I know that at least Arch, Debian/Ubuntu, Fedora and Gentoo default to --enable-new-dtags already. RHEL might still default to --disable-new-dtags if I'm reading its specfile correctly (but I didn't look very closely). For other distros (e.g. openSUSE) I don't know the situation, but I suspect that many of them default to --enable-new-dtags.

tgurr commented 2 months ago

In the short term, https://gitlab.exherbo.org/exherbo/x11/-/merge_requests/858 looks like a good solution for Exherbo. If there are other distributions where binutils still defaults to --disable-new-dtags, then an equivalent issue would exist in those distributions if they upgrade their Mesa to 24.2.0-rc1, and a change equivalent to !858 would be an equally good short-term solution for those other distros.

In the medium term, if https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30328 gets merged before the 24.2.0 stable release (preferably before 24.2.0-rc2), I believe it would avoid this issue completely.

A small additional note: the mentioned merge request has now landed in mesa main (as you probably know, given your comment suggesting cherry-picking it to staging/24.2 as well) as https://gitlab.freedesktop.org/mesa/mesa/-/commit/9b7bb6cc9fa410fb783e7a99d9eadcc31668f298. I've now replaced our workaround by applying that commit to our 24.2.0-rc1 package instead: https://gitlab.exherbo.org/exherbo/x11/-/merge_requests/859

smcv commented 2 months ago

I believe updating to Mesa 24.2.0-rc2 should resolve this. If so, I think we can close the issue - I don't think running versions of OS components that are prereleases and also not up to date is a major use-case for Steam.

tgurr commented 2 months ago

Seconded. Mesa 24.2.0-rc2 includes the upstream fix, so I consider the issue resolved as well. Thanks to you it didn't hit any stable Mesa release, and it resulted in an upstream fix that not only fixes the issue on Exherbo but probably has additional benefits for everyone.

smcv commented 1 month ago

I consider the issue resolved as well

Please could you close the issue, then?

(As the issue submitter, you are allowed to close it; and as a moderator, @kisak-valve is allowed to close it; but I can't.)