Open mazirah opened 8 months ago
Please try running Steam with STEAM_LINUX_RUNTIME_VERBOSE=1 in the environment (for example STEAM_LINUX_RUNTIME_VERBOSE=1 steam), and collect the resulting steamwebhelper.log (it will be much larger).
I've run the verbose command; here are the new logs: steam-logs.tar.gz. I've updated the issue as well.
This is happening because Solus installs multiple builds of the same libraries, some compiled for a CPU newer than yours and some not. This is unusual: most distributions target a single baseline architecture, don't support anything older at all, and don't build any libraries that require anything newer either. Clear Linux is the other distribution most likely to be affected.
Normally, the libraries that require a newer CPU are automatically skipped, but the Steam Linux Runtime container framework does not know how to do that, and is using the first implementation that it finds for each library. Unfortunately, in your case, the first implementation it finds is the one for x86_64 v3 (approximately Intel Haswell or later, circa 2013), and your CPU is older than that, so the x86_64 v3 libraries won't work.
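To check which side of that line a given CPU falls on, the flags line from /proc/cpuinfo can be compared against the feature sets of the x86-64 levels. What follows is a rough sketch, not anything that Steam or glibc ships: has_all and x86_64_level are hypothetical helper names, and the flag lists are the commonly quoted /proc/cpuinfo spellings of the x86-64-v2 and x86-64-v3 requirements (v4 is left out for brevity).

```shell
#!/bin/sh
# Sketch only: the helper names below are made up for illustration.

# has_all "FLAGS" WANTED...: succeed if every WANTED word occurs in FLAGS.
has_all() {
    cpu_flags=" $1 "
    shift
    for want in "$@"; do
        case "$cpu_flags" in
            *" $want "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# x86_64_level "FLAGS": print the highest level (up to v3) that FLAGS satisfies.
x86_64_level() {
    if ! has_all "$1" cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3; then
        echo "x86-64"
    elif ! has_all "$1" avx avx2 bmi1 bmi2 f16c fma abm movbe xsave; then
        echo "x86-64-v2"
    else
        echo "x86-64-v3"
    fi
}

# Possible usage on Linux:
#   x86_64_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

A CPU that reports anything below x86-64-v3 here would be expected to hit this bug on a system that ships x86-64-v3 library builds.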
A temporary workaround might be to move /usr/lib64/glibc-hwcaps/x86-64-v3 out of the way on affected systems.
According to https://discuss.getsol.us/d/10152-solus-5-and-x86-64-v3-target, at some point in the future ("When we rebase off of Serpent"), Solus is going to switch to a different approach which will probably avoid this issue as a side-effect.
> Normally, the libraries that require a newer CPU are automatically skipped, but the Steam Linux Runtime container framework does not know how to do that, and is using the first implementation that it finds for each library.
@smcv ... but we (Solus) are just using bog standard glibc hardware caps? Wouldn't it be best if the Steam Linux Runtime container framework was taught about those sooner rather than later...?
If I'm missing something obvious here, please feel free to enlighten me -- always happy to learn more.
> Wouldn't it be best if the Steam Linux Runtime container framework was taught about those sooner rather than later...?
Of course it would, but writing code takes longer than triaging issue reports.
> > Wouldn't it be best if the Steam Linux Runtime container framework was taught about those sooner rather than later...?
> @smcv: Of course it would, but writing code takes longer than triaging issue reports.
Would you be open to a PR...?
cc. @ikeycode
> Would you be open to a PR...?
Sure. We can't technically accept merge requests at the moment, because the relevant code is on a Gitlab instance that is not open to external users, but the next best thing is to have a branch of the same repository hosted in some public location (Github, gitlab.com, anywhere else suitable) and give us a reference that we can git fetch for review.
I think what's needed is that search_ldcache_cb() in https://gitlab.collabora.com/vivek/libcapsule/-/blob/master/utils/ld-libs.c?ref_type=heads needs to be taught to match libraries with non-trivial hwcaps against the CPU's actual capabilities, and skip libraries where the hwcaps are too high. https://gitlab.collabora.com/vivek/libcapsule/-/blob/master/utils/ld-cache.c?ref_type=heads might also be relevant.
(The production version of this code as used in the Steam Linux Runtime is vendored into https://gitlab.steamos.cloud/steamrt/steam-runtime-tools, but libcapsule is its canonical upstream location.)
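The selection rule being asked for can be sketched in a few lines of shell. This only illustrates the logic and is not libcapsule code (the real change would be in C, in search_ldcache_cb()); hwcaps_of and pick_library are hypothetical names, and the list of hwcaps names that the CPU supports is assumed to be supplied by the caller.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of libcapsule.

# hwcaps_of PATH: print the glibc-hwcaps subdirectory name embedded in PATH,
# or nothing if PATH is not under a glibc-hwcaps directory.
hwcaps_of() {
    case "$1" in
        */glibc-hwcaps/*)
            rest=${1#*/glibc-hwcaps/}
            echo "${rest%%/*}"
            ;;
    esac
}

# pick_library "SUPPORTED NAMES" CANDIDATE...: print the first candidate that
# either needs no hwcaps or needs one listed in SUPPORTED NAMES.
pick_library() {
    supported=" $1 "
    shift
    for candidate in "$@"; do
        cap=$(hwcaps_of "$candidate")
        if [ -z "$cap" ]; then
            echo "$candidate"
            return 0
        fi
        case "$supported" in
            *" $cap "*)
                echo "$candidate"
                return 0
                ;;
        esac
    done
    return 1
}
```

With candidates given in cache order, an x86-64-v3 build is passed over on a CPU that only supports x86-64-v2, and the baseline copy is used instead of failing at load time.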
Tracked as steamrt/tasks#410 internally
> A temporary workaround might be to move /usr/lib64/glibc-hwcaps/x86-64-v3 out of the way on affected systems.
~I can confirm that sudo mv /usr/lib64/glibc-hwcaps/x86-64-v3 /usr/lib64/glibc-hwcaps/x86-64-v3.disabled worked on an affected system.~
A few days and a reboot later, this quick fix stopped working, and I was back to square one. After a little bit of head scratching, I figured that, one way or another, some binaries (probably the Steam ones), which also require AVX2 AFAICT, still took precedence over the system ones. After checking the link https://discuss.getsol.us/d/10152-solus-5-and-x86-64-v3-target from the messages above, I noticed that it is possible to run ld.so --help to see the list of supported glibc-hwcaps.
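For reference, the supported names can be pulled out of that help text with a small filter. This is a sketch under the assumption that the loader prints each glibc-hwcaps subdirectory on its own line and marks usable ones with "(supported, searched)", as recent glibc versions do; supported_hwcaps is a hypothetical helper name, and the loader path in the usage comment varies between distributions.

```shell
#!/bin/sh
# Sketch only: assumes the "(supported, ...)" marker used in `ld.so --help`
# output by recent glibc versions.

# supported_hwcaps: read `ld.so --help` text on stdin and print the names of
# the glibc-hwcaps subdirectories that are marked as supported.
supported_hwcaps() {
    awk '/\(supported/ { print $1 }'
}

# Possible usage (the loader path is an assumption, adjust as needed):
#   /usr/lib64/ld-linux-x86-64.so.2 --help | supported_hwcaps
```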
So here is the solution I came up with:
sudo mkdir -p /usr/lib64/glibc-hwcaps/x86-64-v2/engines-3/
sudo mkdir /usr/lib64/glibc-hwcaps/x86-64-v2/ossl-modules/
find /usr/lib64/glibc-hwcaps/x86-64-v3 -type l -exec sudo cp -P {} /usr/lib64/glibc-hwcaps/x86-64-v2/ \;
/usr/lib64
The result should be:
/usr/lib64/glibc-hwcaps/x86-64-v2/
├── engines-3
│   ├── afalg.so -> /usr/lib64/engines-3/afalg.so
│   ├── capi.so -> /usr/lib64/engines-3/capi.so
│   ├── loader_attic.so -> /usr/lib64/engines-3/loader_attic.so
│   └── padlock.so -> /usr/lib64/engines-3/padlock.so
├── libaom.so.3 -> libaom.so.3.8.2
├── libaom.so.3.8.2 -> /usr/lib64/libaom.so.3.8.2
├── libcrypto.so.3 -> /usr/lib64/libcrypto.so.3
├── libcrypt.so.1 -> libcrypt.so.1.1.0
├── libcrypt.so.1.1.0 -> /usr/lib64/libcrypt.so.1.1.0
├── libcrypt.so.2 -> libcrypt.so.2.0.0
├── libcrypt.so.2.0.0 -> /usr/lib64/libcrypt.so.2.0.0
├── libc.so.6 -> /usr/lib64/libc.so.6
├── libdav1d.so.7 -> libdav1d.so.7.0.0
├── libdav1d.so.7.0.0 -> /usr/lib64/libdav1d.so.7.0.0
├── libfftw3f_omp.so.3 -> libfftw3f_omp.so.3.6.10
├── libfftw3f_omp.so.3.6.10 -> /usr/lib64/libfftw3f_omp.so.3.6.10
├── libfftw3f.so.3 -> /usr/lib64/libfftw3f.so.3
├── libfftw3f.so.3.6.10 -> /usr/lib64/libfftw3f.so.3.6.10
├── libfftw3f_threads.so.3 -> libfftw3f_threads.so.3.6.10
├── libfftw3f_threads.so.3.6.10 -> /usr/lib64/libfftw3f_threads.so.3.6.10
├── libfftw3_omp.so.3 -> libfftw3_omp.so.3.6.10
├── libfftw3_omp.so.3.6.10 -> /usr/lib64/libfftw3_omp.so.3.6.10
├── libfftw3.so.3 -> libfftw3.so.3.6.10
├── libfftw3.so.3.6.10 -> /usr/lib64/libfftw3.so.3.6.10
├── libfftw3_threads.so.3 -> libfftw3_threads.so.3.6.10
├── libfftw3_threads.so.3.6.10 -> /usr/lib64/libfftw3_threads.so.3.6.10
├── libFLAC.so.12 -> libFLAC.so.12.1.0
├── libFLAC.so.12.1.0 -> /usr/lib64/libFLAC.so.12.1.0
├── libgraphene-1.0.so.0 -> libgraphene-1.0.so.0.1000.8
├── libgraphene-1.0.so.0.1000.8 -> /usr/lib64/libgraphene-1.0.so.0.1000.8
├── libm.so.6 -> /usr/lib64/libm.so.6
├── libmvec.so.1 -> /usr/lib64/libmvec.so.1
├── libpng16.so.16 -> libpng16.so.16.43.0
├── libpng16.so.16.43.0 -> /usr/lib64/libpng16.so.16.43.0
├── libraw_r.so.23 -> libraw_r.so.23.0.0
├── libraw_r.so.23.0.0 -> /usr/lib64/libraw_r.so.23.0.0
├── libraw.so.23 -> libraw.so.23.0.0
├── libraw.so.23.0.0 -> /usr/lib64/libraw.so.23.0.0
├── libssl.so.3 -> /usr/lib64/libssl.so.3
├── libvpx.so.8 -> libvpx.so.8.0.1
├── libvpx.so.8.0 -> libvpx.so.8.0.1
├── libvpx.so.8.0.1 -> /usr/lib64/libvpx.so.8.0.1
├── libwebp.so.7 -> libwebp.so.7.1.8
├── libwebp.so.7.1.8 -> /usr/lib64/libwebp.so.7.1.8
├── libz.so.1 -> libz.so.1.3.1
├── libz.so.1.3.1 -> /usr/lib64/libz.so.1.3.1
└── ossl-modules
    └── legacy.so -> /usr/lib64/ossl-modules/legacy.so
After that, Steam starts.
For anyone having this issue on Solus, please give the below a try:
Create /etc/environment if it does not already exist and add the following to it:
GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX
Functionally, this environment variable configures the glibc dynamic loader to ignore all libraries that are built with AVX support (basically everything in the glibc-hwcaps directory), which should fix this issue if Steam is parsing the ld.so output to determine which libraries to pull into the container.
Note that it may also slow down other applications on the system if they do CPU-level feature detection in their own code (mostly media and crypto libs), though probably not to any noticeable degree.
It would be useful information if someone tries that, but I'll warn you now that the workaround in the previous comment probably isn't going to work, because the container runtime infrastructure involves parsing the binary ld.so.cache directly.
The older LD_LIBRARY_PATH runtime did work by screen-scraping ldconfig output, and that would maybe have taken into account GLIBC_TUNABLES (?), but we were never very happy about that, because getting machine-readable information out of human-readable diagnostic output is really fragile.
I am not aware of anything in Steam that parses ld.so output, but perhaps you meant ldconfig anyway?
I still think the only reliable answer to this is going to be https://github.com/ValveSoftware/steam-for-linux/issues/10556#issuecomment-1973169201. One of my colleagues has it on his list, but it's a long list.
Well, personally I find it rather frustrating that for months nothing has changed for the common user. I mean, the Feb '24 update of the Steam client software caused this. Before that, all was just fine. But now those who are not able to fiddle with system files and such are just left in the desert.
> I am not aware of anything in Steam that parses ld.so output, but perhaps you meant ldconfig anyway?
I did mean that. I was not aware that it parsed ld.so.cache directly; I thought it parsed CLI output. I had hoped that it would still work, assuming ldconfig respected the tunable when generating the cache, but after some testing it doesn't look like it does.
Anyway, as a workaround I added a patch to our glibc package (see here) which causes ldconfig to skip checking hwcaps directories if the STEAM_HACK_IGNORE_HWCAPS environment variable is defined. I verified that it worked and that the ld cache no longer contained any reference to said libs, only referencing the base ones.
After the next sync (this Friday or so) you should be able to do the following to get Steam working again:
echo "STEAM_HACK_IGNORE_HWCAPS=1" | sudo tee /etc/environment
sudo ldconfig -X
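To double-check the result afterwards, something along these lines should show whether the regenerated cache still references any hwcaps builds. It is only a sketch: count_hwcaps_entries is a hypothetical helper name, and it relies on nothing more than the library paths that ldconfig -p prints.

```shell
#!/bin/sh
# Sketch only: counts cache entries by path, nothing Solus-specific.

# count_hwcaps_entries: read `ldconfig -p` output on stdin and print how many
# entries point into a glibc-hwcaps directory.
count_hwcaps_entries() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '/glibc-hwcaps/' || true
}

# Possible usage:
#   ldconfig -p | count_hwcaps_entries
```

A result of 0 would match the behaviour described above, where the cache references only the base libraries.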
FYI, this update is live for stable users, and we've seen confirmation from users that it works.
> I had hoped that it would still work assuming ldconfig respected the tunable when generating the cache but after some testing it doesn't look like it does.
I believe ldconfig ignores the current CPU and GLIBC_TUNABLES when populating the cache, and instead enters each copy of each library into the cache, along with its required hwcaps. It's ld.so that is responsible for matching the current CPU (and maybe GLIBC_TUNABLES) against the hwcaps, and disregarding libraries that are listed in the cache as requiring a newer CPU than the one we're actually running on. If it didn't work that way, you wouldn't be able to install a system on a newer CPU, and then boot it on an older CPU (for example for disaster-recovery purposes).
The bug is that the code in libcapsule that parses the cache does not take the "required hwcaps" field into account, and instead assumes that all libraries are OK. In most distros this is true (because most distros don't have hwcaps-gated libraries that require extremely new CPUs), but on Solus it is not.
I still think https://github.com/ValveSoftware/steam-for-linux/issues/10556#issuecomment-1973169201 is the correct long-term solution, it is still on my colleague's to-do list, and it is still a sufficiently long list that I cannot predict when or whether it will happen.
> After the next sync (this Friday or so) you should be able to do the following to get Steam working again:
> echo "STEAM_HACK_IGNORE_HWCAPS=1" | sudo tee /etc/environment
> - Log out or reboot so that the environment variable is activated
> - Run sudo ldconfig -X
Thanks, works nicely.
Your system information
System Details Report
Report details
Hardware Information:
Software Information:
Firmware Version: Ua9
OS Name: Solus 4.5 Resilience
OS Build: (null)
OS Type: 64-bit
GNOME Version: 45.4
Windowing System: X11
Kernel Version: Linux 6.6.18-278.current
Steam client version (build number or date): newest
Distribution (e.g. Ubuntu): Solus
Opted into Steam client beta?: No
Have you checked for system updates?: Yes
Steam Logs: steam-logs.tar.gz
Steam verbose Logs: steam-logs.tar.gz
GPU: Nvidia
Error on startup:
Here are the contents of steamwebhelper.log