ProcursusTeam / Procursus

Modern *OS Bootstrap
https://apt.procurs.us
BSD Zero Clause License
873 stars 128 forks source link

m4: Segmentation fault in posix_spawn #1171

Closed asdfugil closed 2 years ago

asdfugil commented 2 years ago

I ran into a segmentation fault when trying to do make shell-cmds on iPad Pro 9.7 WiFi, iOS 14.3. It happens with m4.

m4 version: 1.4.19 libiosexec version: 1.0.20~1.1-alpha1

Debugger output:

Process 59809 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x90)
    frame #0: 0x00000001b40eb37c libsystem_kernel.dylib`posix_spawn + 156
libsystem_kernel.dylib`posix_spawn:
->  0x1b40eb37c <+156>: ldr    x9, [x8, #0x88]
    0x1b40eb380 <+160>: cbz    x9, 0x1b40eb3c4           ; <+228>
    0x1b40eb384 <+164>: ldrsw  x11, [x9, #0x4]
    0x1b40eb388 <+168>: mov    w10, #0x18
Target 0: (m4) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x90)
  * frame #0: 0x00000001b40eb37c libsystem_kernel.dylib`posix_spawn + 156
    frame #1: 0x00000001045d9994 libiosexec.1.dylib`ie_posix_spawn + 48
    frame #2: 0x00000001045d9b88 libiosexec.1.dylib`ie_posix_spawnp + 372
    frame #3: 0x0000000104594030 m4`___lldb_unnamed_symbol207$$m4 + 1012
    frame #4: 0x000000010457fd44 m4`___lldb_unnamed_symbol23$$m4 + 292
    frame #5: 0x00000001045881e4 m4`___lldb_unnamed_symbol85$$m4 + 1632
    frame #6: 0x0000000104587f94 m4`___lldb_unnamed_symbol85$$m4 + 1040
    frame #7: 0x0000000104587b40 m4`expand_input + 140
    frame #8: 0x000000010457e1d0 m4`___lldb_unnamed_symbol3$$m4 + 128
    frame #9: 0x000000010457dd38 m4`main + 1592
    frame #10: 0x0000000188633568 libdyld.dylib`start + 4
(lldb) c

...which might have something to do with libiosexec

Torrekie commented 2 years ago

I didn't see m4 was using posix_spawn on my Apple M1 build

TorrekiedeMBP:m4 torrekie$ nm --undefined-only /Library/Developer/CommandLineTools/usr/bin/m4 | grep "posix_spawn"
TorrekiedeMBP:m4 torrekie$ nm --undefined-only /opt/homebrew/Cellar/m4/1.4.19/bin/m4 | grep "posix_spawn"
TorrekiedeMBP:m4 torrekie$ nm --undefined-only usr/bin/m4 | grep "posix_spawn"
_ie_posix_spawn
_ie_posix_spawnp

I guess, cross compiling m4 may cause some value not correctly detected, could probably fix that by recompile it.

Torrekie commented 2 years ago

I think I figured out why that happens, m4 considered Darwin's posix_spawn was insecure so they actually redefined one for internal use. But libiosexec's #define posix_spawn ie_posix_spawn is breaking this behavior. A possible fix for that is remove the libiosexec.h include from spawn.h and unistd.h, and add CPPFLAGS=-Dexecve=ie_execve since it still using execve from outside.

lib/spawn.c

/* Copyright (C) 2000, 2009-2021 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */

#include <config.h>

/* Specification.  */
#include <spawn.h>

#include "spawn_int.h"

/* Spawn a new process executing PATH with the attributes describes in *ATTRP.
   Before running the process perform the actions described in FILE-ACTIONS. */
int
posix_spawn (pid_t *pid, const char *path,
             const posix_spawn_file_actions_t *file_actions,
             const posix_spawnattr_t *attrp, char *const argv[],
             char *const envp[])
{
  return __spawni (pid, path, file_actions, attrp,
                   (const char * const *) argv, (const char * const *) envp, 0);
}

While compiled without libiosexec header, m4 no longer require _ie_posix_spawn symbol, m4 can then correctly use the self defined rpl_posix_spawn

TorrekiedeMBP:m4-libiosexec torrekie$ nm usr/bin/m4 | grep posix_spawn
000000010002e4b0 T _gl_posix_spawn_file_actions_realloc
                 U _ie_posix_spawn
                 U _ie_posix_spawnp
000000010002e104 T _posix_spawn_file_actions_addchdir
000000010002e1b4 T _rpl_posix_spawn_file_actions_addclose
000000010002e254 T _rpl_posix_spawn_file_actions_adddup2
000000010002e308 T _rpl_posix_spawn_file_actions_addopen
000000010002e3f4 T _rpl_posix_spawn_file_actions_destroy
000000010002e4fc T _rpl_posix_spawn_file_actions_init
000000010002e514 T _rpl_posix_spawnattr_destroy
000000010002e51c T _rpl_posix_spawnattr_init
000000010002e538 T _rpl_posix_spawnattr_setflags
000000010002e55c T _rpl_posix_spawnattr_setsigmask
TorrekiedeMBP:m4-libiosexec torrekie$ cd ../m4-no-iosexec
TorrekiedeMBP:m4-no-iosexec torrekie$ nm usr/bin/m4 | grep posix_spawn
0000000100034678 T _gl_posix_spawn_file_actions_realloc
0000000100034734 T _gl_posix_spawn_internal
0000000100034388 T _posix_spawn_file_actions_addchdir
0000000100034380 T _rpl_posix_spawn
0000000100034408 T _rpl_posix_spawn_file_actions_addclose
0000000100034488 T _rpl_posix_spawn_file_actions_adddup2
0000000100034524 T _rpl_posix_spawn_file_actions_addopen
00000001000345e8 T _rpl_posix_spawn_file_actions_destroy
00000001000346c4 T _rpl_posix_spawn_file_actions_init
00000001000346dc T _rpl_posix_spawnattr_destroy
00000001000346e4 T _rpl_posix_spawnattr_init
0000000100034700 T _rpl_posix_spawnattr_setflags
0000000100034724 T _rpl_posix_spawnattr_setsigmask
0000000100034b38 T _rpl_posix_spawnp

This possibly still happening on many other packages, I think we should remove that libiosexec header, and do symbol redefinitions only at link time or rename symbols after the linkage.

badger200 commented 2 years ago

@Torrekie I encountered this too and to my surprise, libiosexec 1.0.20 removes ie_posix_spawn and ie_posix_spawnp exports if SPAWN_H is defined, which it almost certainly will be!

The “1.0.16-1.0.19” version doesn’t have that problem, but as I recall it had a different problem, so I ended up using 1.0.20 source (I think) and then doing the small amount of copy and paste work from the other version (most if not all in libiosexec.h) to produce an ultimate version that works in all cases. It was pretty obvious work I recall, I’m not that great of a programmer haha. I’ve been meaning to file an Issue for libiosexec on this.

Here’s my current libiosexec.h apologies for the profanity 😅

One other libiosexec.c bug is they use an arbitrary arg max 1024 char but sys/syslimits.h defines ARG_MAX = 262122 for this very reason, so long compile commands like when compiling clang from source, will randomly crash and you’ll spend hours wondering why.

CRKatri commented 2 years ago

I encountered this too and to my surprise, libiosexec 1.0.20 removes ie_posix_spawn and ie_posix_spawnp exports if SPAWN_H is defined, which it almost certainly will be!

That is not what that change does. If SPAWN_H is defined then that means that spawn.h was included which means that when we replace posix_spawn with ie_posix_spawn in the source, it will get implicit function declaration errors. However if spawn.h is included after libiosexec.h then the #define will also replace the prototypes in spawn.h as well as the source code. In either case, ie_posix_spawn is available as both a symbol exported from the dylib and as a prototype to be used in your code (as long as spawn.h is included just like normal posix_spawn).

badger200 commented 2 years ago

Hmm @CRKatri thanks for the explanation. It was several weeks ago and all I know is when I upgraded from 1.0.16-19, to 1.0.20, suddenly my existing binaries failed because of missing ie_posix_spawn. Check your own libiosexec binaries I think they literally do not export ie_posix_spawn, that’s why I had to compile my own!

CRKatri commented 2 years ago

I can assure you that they do, @asdfugil uses make on his iPad frequently which uses ie_posix_spawn. There was a brief stint were the debs built from Procursus didn't but the debs on the apt repo did, which may have been your issue.

badger200 commented 2 years ago

@CRKatri Just checked and libiosexec.1.dylib which I had renamed libiosexec.1.0.19 so I'm assuming that's what it was, dated 2021/03/29, signed by Hayden Seay's Developer ID Certificate, UUID: 1F7E0528-89D9-3814-BC7F-F5B03B0EA781 indeed does not export any posix_spawn whatsoever.

Hayden's signing ID says I couldn't possibly have compiled that binary myself.

Maybe I'm confused and posix_spawn wasn't added until .20 and I compiled that?

CRKatri commented 2 years ago

Can you reinstall the latest libiosexec from apt.procurs.us?

CRKatri commented 2 years ago

ie_posix_spawn wasnt added to libiosexec till July 21 2021 (143ba8b)

badger200 commented 2 years ago

Ah yes now I think I recall what happened: my clang builds crashed with > 1024 char arguments being exec'd through libiosexec, then as mentioned above:

"One other libiosexec.c bug is they use an arbitrary arg max 1024 char but sys/syslimits.h defines ARG_MAX = 262122 for this very reason, so long compile commands like when compiling clang from source, will randomly crash and you’ll spend hours wondering why."

So I must've downloaded the source for 1.0.20 and compiled it which produced a binary without any posix_spawn exports, leading me to the course of events mentioned earlier.

CRKatri commented 2 years ago

You seem to have confused ARG_MAX with PATH_MAX (as taken from https://github.com/apple-oss-distributions/xnu/blob/e7776783b89a353188416a9a346c6cdb4928faad/bsd/sys/syslimits.h#L107). libiosexec uses PATH_MAX not ARG_MAX.

badger200 commented 2 years ago

I'm afraid not. The issue was never a long path. It was clang link commands with hundreds of object files totaling about 16KiB.

I find git the least intuitive system I've ever used, otherwise I'd gladly submit a proper patch, but in libiosexec get_new_argv.c, sys/syslimits.h needs to be included,

Then around line 64:

// JNS from sys/syslimits.h - ARG_MAX = 1024 * 256 = 262144 "max bytes for an exec function" // ORIG: char argv_new = calloc(1024, 1); char argv_new = calloc(ARG_MAX, 1);

You guys were doing a seemingly arbitrary calloc of 1024 bytes but that's not nearly enough, especially now that I see ARG_MAX (which, scroll up just a few lines on the bsd source you just linked, defines it as "Max bytes for an exec function") is defined to either 256KiB or 1024KiB depending on defines.

Just make a large clang command line long enough to exceed 1024 bytes and you'll see.

Or start compiling LLVM, I think lib/Support (lib/LibLLVMSupport.a) will be the first one to trigger it.

CRKatri commented 2 years ago

Could you join the Procursus discord server https://diatr.us/discord? It'll be a bit easier to talk there.