ValveSoftware / steam-for-linux

Issue tracking for the Steam for Linux beta client
4.25k stars 175 forks source link

libsteam_api.so writes out-of-bounds and corrupts memory #8542

Open CyberShadow opened 2 years ago

CyberShadow commented 2 years ago

Your system information

Please describe your issue in as much detail as possible:

I'm seeing a random crash upon attempting to start The Talos Principle.

Looking at the stack trace and memcheck output, it seems to be caused by an out-of-bounds write by the libsteam_api.so library (from the game's directory).

==3210207== Invalid write of size 2
==3210207==    at 0xA40FAD7: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4101DC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41074E: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4122E0: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41254A: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA413414: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0x25C64EC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x1CA8426: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0xEB5CE1: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0xEB80D9: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x1CA2749: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x22DA5ED: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==  Address 0xdba3da7 is 39 bytes inside a block of size 40 alloc'd
==3210207==    at 0x5550899: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3210207==    by 0x58065AD: __libc_scratch_buffer_dupfree (in /usr/lib/libc.so.6)
==3210207==    by 0x57B63C5: realpath_stk (in /usr/lib/libc.so.6)
==3210207==    by 0x57B69D5: realpath@@GLIBC_2.3 (in /usr/lib/libc.so.6)
==3210207==    by 0xA40FABD: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4101DC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41074E: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4122E0: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41254A: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA413414: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0x25C64EC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x1CA8426: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207== 
==3210207== Invalid read of size 1
==3210207==    at 0xA4101F3: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41074E: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4122E0: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41254A: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA413414: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0x25C64EC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x1CA8426: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0xEB5CE1: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0xEB80D9: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x1CA2749: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x22DA5ED: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x22DA92F: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==  Address 0xdba3da8 is 0 bytes after a block of size 40 alloc'd
==3210207==    at 0x5550899: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3210207==    by 0x58065AD: __libc_scratch_buffer_dupfree (in /usr/lib/libc.so.6)
==3210207==    by 0x57B63C5: realpath_stk (in /usr/lib/libc.so.6)
==3210207==    by 0x57B69D5: realpath@@GLIBC_2.3 (in /usr/lib/libc.so.6)
==3210207==    by 0xA40FABD: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4101DC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41074E: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA4122E0: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA41254A: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0xA413414: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/libsteam_api.so)
==3210207==    by 0x25C64EC: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)
==3210207==    by 0x1CA8426: ??? (in /mnt/local/steam/SteamLibrary/steamapps/common/The Talos Principle/Bin/x64/Talos_)

Looking at the decompiled pseudocode:

char * FUN_0010fa6d(void)

{
     char *szHome;
     char *szRealPath;
     long i;
     long in_FS_OFFSET;
     byte bZero;
     char szPath [4095];
     undefined local_19;
     long _stack_cookie;
     char *szA;
     char c;

     bZero = 0;
     _stack_cookie = *(long *)(in_FS_OFFSET + 0x28);
     szHome = getenv("HOME");
     snprintf(szPath,0x1000,"%s/.steam/steam",szHome);
     local_19 = 0;
     szRealPath = realpath(szPath,(char *)0x0);
     if (szRealPath == (char *)0x0) {
          szRealPath = strdup("/");
     }
     else {
          i = -1;
          szA = szRealPath;
          do {
                if (i == 0) break;
                i -= 1;
                c = *szA;
                szA = szA + (ulong)bZero * -2 + 1;
          } while (c != '\0');
          *(undefined2 *)(szRealPath + (~i - 1)) = L'/';  // <-- BUG HERE
     }
     if (_stack_cookie == *(long *)(in_FS_OFFSET + 0x28)) {
          return szRealPath;
     }
                                                     /* WARNING: Subroutine does not return */
     __stack_chk_fail();
}

It looks like the function is trying to append / and a null terminator to the result of realpath, but there is no room for that there. This write likely corrupts the allocator's metadata which causes the crash.

Steps for reproducing this issue:

  1. Install The Talos Principle using Steam
  2. Try to start game
  3. Observe that the game doesn't start, and in the terminal can be seen:
malloc(): invalid size (unsorted)
INF:  Crash! (Aborted)

Seems to depend on memory alignment and is not reproducible reliably. It certainly depends on the exact length of the directory in which Steam is installed, which is why it will not be reproducible for most Steam users.

I tried running the game with a different version of libsteam_api.so, but the same problem manifests across all versions which are ABI-compatible that I found.

Hope this helps.

zkarj commented 1 year ago

I'm having the same issue with multiple games: Hitman, DiRT Rally, I Hate Running Backwards, Middle-earth: Shadow of Mordor, Serious Sam Fusion 2017, Sid Meier's Civilization IV, XCOM 2, with the same error: malloc(): invalid size (unsorted). This is about a third of my Steam games that run natively in Linux, so very disappointing.

For some reason, I never had these issues in any Linux-native Steam games until steam client update to v020 2023-02-14 1676336721 or some time around early February 2023.

Is there a way to patch libsteam_api.so or re-build it somehow if game developers or Valve won't solve it soon enough?

HugKitten commented 1 year ago

Adding to this: I believe I am experiencing the same issue in modded starbound

TTimo commented 1 year ago

The code analyzed above used to do a strncat( pszContentPath, "/", PATH_MAX ); but that was fixed back in 2017 and now uses a realloc.

I don't see an out of bound write opportunity in the new code, but also I don't think the older code could have been writing out of bounds unless the HOME environment variable would be pathologically long. So something doesn't line up.

Problem is that bad code from the SDK shipped and remains in all titles. What happens if you substitute libsteam_api.so with the latest from the steamworks SDK?

If you have crash IDs you can provide those could be useful as well.

HugKitten commented 1 year ago

I actually tried that already and it didn't fix my issue. I'll try again later today. My original issue was posted here

CyberShadow commented 1 year ago

@HugKitten Your issue is almost certainly unrelated because my issue occurs on start-up, before any game code runs (and therefore before the game can create any windows).

Looking at the stack trace in https://pastebin.com/aEcmtSR8, it seems completely different and completely unrelated to this issue.

Edit: And also you are running a Windows executable, so it cannot possibly be related to this issue, which is for the Linux API shared library.

CyberShadow commented 1 year ago

The code analyzed above used to do a strncat( pszContentPath, "/", PATH_MAX );

Interesting, is the source code you're quoting available? (Your profile does not show that you are a Valve employee 😄)

but also I don't think the older code could have been writing out of bounds unless the HOME environment variable would be pathologically long

It was appending one character to the string returned by realpath, so writing one byte out-of-bounds of the string allocated by it, which (depending on memory alignment) might belong to another heap variable. My $HOME is maybe longer than usual (it would be /home/vladimir/steam) but it's not close to MAX_PATH (nor should that be related, according to my understanding of the machine code above).

What happens if you substitute libsteam_api.so with the latest from the steamworks SDK?

I can no longer reproduce this issue with The Talos Principle, probably the game has been updated since I posted it, including an updated libsteam_api.so. But also, this issue is indeed not specific to any particular title.

HugKitten commented 1 year ago

Yeah, my bad. Sorry, feel free to delete my messages. ^^

TTimo commented 1 year ago

but also I don't think the older code could have been writing out of bounds unless the HOME environment variable would be pathologically long

It was appending one character to the string returned by realpath, so writing one byte out-of-bounds of the string allocated by it, which (depending on memory alignment) might belong to another heap variable.

Ah yes, that's why it would have been crashing regardless of string size. Anyway, confirming that this was fixed way back in 2017, unfortunately it's likely we have a bunch of titles out in the wild with the bad code still that won't ever get updated (that source code is not available to the public - I am a Valve contractor).

flibitijibibo commented 10 months ago

If anyone has an existing Sourceware Bugzilla account and/or has experience submitting patches there, I wrote a glibc patch that should resolve this issue: https://github.com/ValveSoftware/steam-for-linux/issues/10317#issuecomment-1871641032

DanielGibson commented 2 months ago

@TTimo Can't Valve update affected versions of libsteam_api.so in broken games with a fixed version? Or at least notify the developers that they should update the lib? This bug is kinda nasty, as I think how likely it is to trigger depends on at least two factors:

So it's quite possible that users report a crash and the developer by dumb chance can't reproduce it due to a different username or other difference on their system.

Regarding patching glibc, I don't think that it's reasonable to assume that realpath() will return a buffer of size PATH_MAX, and if software starts relying on that it just becomes more broken and less portable. Though maybe guaranteeing that at least one char can be appended (to add a path separator) might be a compromise...

DanielGibson commented 2 months ago

Can't Valve update affected versions of libsteam_api.so in broken games with a fixed version? Or at least notify the developers that they should update the lib?

And also, detect (and reject) broken libs when games are uploaded to Steam? This bug also appeared in Kitsune Tails which was released just two weeks ago or so, and used that old lib because

for some reason achievements weren't working and I was out of time and I knew those [libs] were known good. (source, see also)

(No idea why achievements didn't work when they tested, I verified that they do work with the libsteam_api.so shipped with Proton 8, but it's not unusual for gamedevs to ship older "known good", or as it turns out "assumed good", libs)

TTimo commented 2 months ago

While it's unfortunate that devs would pick an SDK that old for a new title, we do not plan to take a more proactive stance on this. As a general policy we do not modify a partner's game content. Also there's no guarantee that the latest libsteam_api.so would be ABI compatible if we were to just drop it in (even though it does seem to work in practice).

DanielGibson commented 2 months ago

As this is a library from Valve that is buggy (and reproducing the bug isn't always easy, but if you're affected it crashes most of the time), Valve should be more pro-active about it.

TBH the idea that rather glibc should be patched to work around a Valve bug than Valve being proactive about replacing a broken lib that they released, and that is only distributed through their digital store, is.. surprising, at best.

This issue seems to cause lots of pain with developers (and users, of course!), whose takeaway might be that shipping on Linux isn't worth the hassle, because if the bug triggers depends on the users system and "each Linux is built differently" and it's so hard to support even just the common distros (which isn't really true, but weird bugs like this one seem to confirm that) - and even Valve employees tend to blame the crash on users system rather than their own code, even years after the bug has been fixed, see for example https://github.com/ValveSoftware/steam-for-linux/issues/10317#issuecomment-1870540485

It should be feasible to detect versions of libsteam_api.so that were affected by the bug (e.g. by their checksum) and replace them with a version that is identical except for the bugfix, to avoid breaking the ABI? And if not, you could LD_PRELOAD a lib that replaces realpath() with a version that calls realloc() afterwards to make space for the additional slash.

flibitijibibo commented 2 months ago

One other detail: After a decade of shipping stuff I don't even know which of my own titles have an affected version - if we absolutely cannot change the .so broadly, detecting and warning on the presence of the affected files would help a lot. Maybe that's a job for SteamDB, I dunno.

I already submitted the proposal to glibc and they said no, so that's also out unless someone with more political clout is proactive about it. One way or another someone with more power than me or Daniel will have to get involved, whether it's glibc, libsteam_api, or possibly even SteamPipe.

CyberShadow commented 2 months ago

One thing Valve could do is ship a patched glibc which includes a work-around for this bug as part of its Linux Steam runtime. That would allow fixing the problem without having to modify partners' game content.

DanielGibson commented 2 months ago

One other detail: After a decade of shipping stuff I don't even know which of my own titles have an affected version

What makes this even harder: We don't know what versions of libsteam_api.so are affected. Was it all versions up to the one that fixed it (whichever that was)? Or only from a specific version on? And is there even a simple (or at least feasible) way to get the library version from libsteam_api.so? If not, could Valve release a list of sha256sums (or similar) of affected versions (for both x86 and amd64)? (I still think Valve should run a check and either update the libs themselves or at least notify developers. Come on guys, don't make native Linux support seem like more of a PITA than it would otherwise be - we all want this platform to succeed <3)

One thing Valve could do is ship a patched glibc

I think glibc is one of the things you should avoid shipping with programs and use the system version instead, because it uses system settings, that may be at different places on different distros, and might even have different formats for different glibc versions? Things like hosts/resolver settings (used by getaddrinfo() etc) and generated locales, for example. If you really want to avoid modifying the game content, the way of LD_PRELOAD-ing a lib that just replaces/wraps realpath() is probably better..

BTW, it is possible that newer glibc versions use a different internal heap layout and are less fragile when it comes to writing just a single byte too far. On XUbuntu 22.04 (glibc 2.35) the heap gets corrupted by this bug, causing glibc-internal consistency checks to fail on later malloc()/realloc()/free() calls (I got errors like "free(): invalid pointer" and "malloc(): invalid size (unsorted)" and "corrupted size vs. prev_size" and "realloc(): invalid pointer"). This is just a theory though (based on Kitsune Tails not crashing on my Arch Linux laptop with glibc 2.40). I don't know if the heap layout really has changed, if so in what version, and of course we can't rely on it not getting changed again to again make this bug more likely to trigger - and Ubuntu 22.04 (and Linux Mint versions based on that) are still broadly used.

CyberShadow commented 2 months ago

I think glibc is one of the things you should avoid shipping with programs and use the system version instead, because it uses system settings, that may be at different places on different distros, and might even have different formats for different glibc versions?

Pretty sure the Steam Linux runtime includes glibc. See e.g. https://gitlab.steamos.cloud/steamrt/steamrt/-/tree/steamrt/soldier/

If you really want to avoid modifying the game content, the way of LD_PRELOAD-ing a lib that just replaces/wraps realpath() is probably better..

Steam runs games on Linux in a container which includes a custom userspace. See e.g. https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/tree/main/pressure-vessel, https://github.com/containers/bubblewrap

DanielGibson commented 2 months ago

Ok - looks like at least Kitsune Tails does not use such a runtime by default, but my system libs (checked with /proc/$gamepid/maps). When I enforce using "Steam Linux Runtime 3.0 (sniper)" (under Properties -> Compatibility), it seems to use some container libc:

/proc/310866$ grep libc maps
7c9165200000-7c9165228000 r--p 00000000 103:06 1310497                   /run/host/usr/lib/x86_64-linux-gnu/libc.so.6
7c9165228000-7c91653bd000 r-xp 00028000 103:06 1310497                   /run/host/usr/lib/x86_64-linux-gnu/libc.so.6
(...)

However, that also seems to be glibc 2.35 and with the Sniper runtime the game crashes most of the time just like with my system libc.

But you're right, patching the glibc versions shipped in the Steam runtimes would make this problem at least a bit less bad, especially as "try enforcing the scout or sniper runtime" seems to be a common suggestion among users when things don't work (besides "just use Proton" :upside_down_face:).