Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Blead (9e254b0) Broke on Alpine Linux under g++ with Threads Enabled #21040

Closed cjg-cguevara closed 1 year ago

cjg-cguevara commented 1 year ago

Blead (9e254b0) broke on Alpine Linux under g++ with threads enabled.

Example: https://perl5.test-smoke.org/report/5033265 https://perl5.test-smoke.org/file/log_file/5033265

Appears to have started at 9e254b0.

Note that Alpine Linux uses musl instead of glibc:

$ ldd --version
musl libc (x86_64)
Version 1.2.3
Dynamic Program Loader
Usage: /lib/ld-musl-x86_64.so.1 [options] [--] pathname
jkeenan commented 1 year ago

@cjg-cguevara, can you paste the tail of the output from make on one or more of these threaded builds?

@khwilliamson, can you take a look?

cjg-cguevara commented 1 year ago
g++ -c -DPERL_CORE -D_REENTRANT -D_GNU_SOURCE -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -Wall -Werror=pointer-arith -Werror=vla -Wextra -Wno-long-long -Wwrite-strings -Wno-use-after-free caretx.c
g++ -c -DPERL_CORE -D_REENTRANT -D_GNU_SOURCE -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -Wall -Werror=pointer-arith -Werror=vla -Wextra -Wno-long-long -Wwrite-strings -Wno-use-after-free dquote.c
g++ -c -DPERL_CORE -D_REENTRANT -D_GNU_SOURCE -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -Wall -Werror=pointer-arith -Werror=vla -Wextra -Wno-long-long -Wwrite-strings -Wno-use-after-free time64.c
g++ -c -DPERL_CORE -D_REENTRANT -D_GNU_SOURCE -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -Wall -Werror=pointer-arith -Werror=vla -Wextra -Wno-long-long -Wwrite-strings -Wno-use-after-free miniperlmain.c
g++ -fstack-protector-strong -L/usr/local/lib -o miniperl \
    opmini.o perlmini.o universalmini.o  gv.o toke.o perly.o pad.o regcomp.o regcomp_debug.o regcomp_invlist.o regcomp_study.o regcomp_trie.o regexec.o dump.o util.o mg.o reentr.o mro_core.o keywords.o builtin.o class.o hv.o av.o run.o pp_hot.o sv.o pp.o scope.o pp_ctl.o pp_sys.o peep.o doop.o doio.o utf8.o taint.o deb.o globals.o perlio.o numeric.o mathoms.o locale.o pp_pack.o pp_sort.o caretx.o dquote.o time64.o  miniperlmain.o  -lpthread -ldl -lm -lcrypt -lutil -lc
./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make minitest; exit 1'
Segmentation fault
Failed to build miniperl. Please run make minitest
make: *** [makefile:386: lib/buildcustomize.pl] Error 1
Leont commented 1 year ago

Can you backtrace that segfault with gdb?

cjg-cguevara commented 1 year ago
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
[New LWP 16559]
Core was generated by `./miniperl -w -Ilib -Idist/Exporter/lib -MExporter -e <?>'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fba66d1585c in ?? ()
(gdb) symbol-file /home/core/src/perl5-blead/miniperl
Reading symbols from /home/core/src/perl5-blead/miniperl...
(gdb) sharedlibrary
Reading symbols from /lib/ld-musl-x86_64.so.1...
(No debugging symbols found in /lib/ld-musl-x86_64.so.1)
(gdb) bt
#0  0x00007fba66d1585c in tss_get () from /lib/ld-musl-x86_64.so.1
#1  0x0000560489f4d9f1 in ?? ()
#2  0x0000000000000000 in ?? ()
(gdb) q
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniperl...
(gdb) run
Starting program: /home/core/src/perl5-blead/miniperl

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fba85c in tss_get () from /lib/ld-musl-x86_64.so.1
(gdb) q
A debugging session is active.

        Inferior 1 [process 16597] will be killed.

Quit anyway? (y or n) y
==16567== Memcheck, a memory error detector
==16567== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==16567== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==16567== Command: ./miniperl
==16567==
--16567-- Valgrind options:
--16567--    -v
--16567-- Contents of /proc/version:
--16567--   Linux version 5.15.108-0-lts (buildozer@build-3-17-x86_64) (gcc (Alpine 12.2.1_git20220924-r4) 12.2.1 20220924, GNU ld (GNU Binutils) 2.39) #1-Alpine SMP Fri, 21 Apr 2023 05:55:14 +0000
--16567--
--16567-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-rdtscp-sse3-ssse3-avx-rdrand
--16567-- Page sizes: currently 4096, max supported 4096
--16567-- Valgrind library directory: /usr/libexec/valgrind
--16567-- Reading syms from /home/core/src/perl5-blead/miniperl
--16567-- Reading syms from /lib/ld-musl-x86_64.so.1
--16567--    object doesn't have a symbol table
--16567-- Reading syms from /usr/libexec/valgrind/memcheck-amd64-linux
--16567--    object doesn't have a dynamic symbol table
--16567-- Scheduler: using generic scheduler lock implementation.
--16567-- Reading suppressions file: /usr/libexec/valgrind/default.supp
==16567== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-16567-by-core-on-???
==16567== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-16567-by-core-on-???
==16567== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-16567-by-core-on-???
==16567==
==16567== TO CONTROL THIS PROCESS USING vgdb (which you probably
==16567== don't want to do, unless you know exactly what you're doing,
==16567== or are doing some strange experiment):
==16567==   /usr/libexec/valgrind/../../bin/vgdb --pid=16567 ...command...
==16567==
==16567== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==16567==   /path/to/gdb ./miniperl
==16567== and then give GDB the following command
==16567==   target remote | /usr/libexec/valgrind/../../bin/vgdb --pid=16567
==16567== --pid is optional if only one valgrind process is running
==16567==
--16567-- Reading syms from /usr/libexec/valgrind/vgpreload_core-amd64-linux.so
--16567--    object doesn't have a symbol table
--16567-- Reading syms from /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so
--16567--    object doesn't have a symbol table
--16567-- REDIR: 0x4053716 (libc.musl-x86_64.so.1:strlen) redirected to 0x48aa615 (strlen)
--16567-- REDIR: 0x40534ed (libc.musl-x86_64.so.1:strcpy) redirected to 0x48aa677 (strcpy)
--16567-- REDIR: 0x4053438 (libc.musl-x86_64.so.1:strchr) redirected to 0x48aa3a5 (strchr)
--16567-- REDIR: 0x4053818 (libc.musl-x86_64.so.1:strncmp) redirected to 0x48aaa27 (strncmp)
--16567-- REDIR: 0x40261b2 (libc.musl-x86_64.so.1:malloc) redirected to 0x48a55c6 (malloc)
--16567-- REDIR: 0x4026038 (libc.musl-x86_64.so.1:calloc) redirected to 0x48a971c (calloc)
==16567== Invalid read of size 8
==16567==    at 0x405685C: tss_get (in /lib/ld-musl-x86_64.so.1)
==16567==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16567==
==16567==
==16567== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==16567==  Access not within mapped region at address 0x0
==16567==    at 0x405685C: tss_get (in /lib/ld-musl-x86_64.so.1)
==16567==  If you believe this happened as a result of a stack
==16567==  overflow in your program's main thread (unlikely but
==16567==  possible), you can try to increase the size of the
==16567==  main thread stack using the --main-stacksize= flag.
==16567==  The main thread stack size used in this run was 8388608.
==16567==
==16567== HEAP SUMMARY:
==16567==     in use at exit: 3,936 bytes in 2 blocks
==16567==   total heap usage: 2 allocs, 0 frees, 3,936 bytes allocated
==16567==
==16567== Searching for pointers to 2 not-freed blocks
==16567== Checked 63,280 bytes
==16567==
==16567== LEAK SUMMARY:
==16567==    definitely lost: 0 bytes in 0 blocks
==16567==    indirectly lost: 0 bytes in 0 blocks
==16567==      possibly lost: 0 bytes in 0 blocks
==16567==    still reachable: 3,936 bytes in 2 blocks
==16567==         suppressed: 0 bytes in 0 blocks
==16567== Rerun with --leak-check=full to see details of leaked memory
==16567==
==16567== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==16567==
==16567== 1 errors in context 1 of 1:
==16567== Invalid read of size 8
==16567==    at 0x405685C: tss_get (in /lib/ld-musl-x86_64.so.1)
==16567==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16567==
==16567== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
tonycoz commented 1 year ago

With symbols:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7fba3fd in tss_get () from /lib/ld-musl-x86_64.so.1
(gdb) bt
#0  0x00007ffff7fba3fd in tss_get () from /lib/ld-musl-x86_64.so.1
#1  0x00005555557dfc6d in Perl_switch_locale_context () at locale.c:7228
#2  0x00005555555b587c in S_init_tls_and_interp (my_perl=0x7ffff7f5d020)
    at /home/tony/dev/perl/git/perl/perl.c:78
#3  0x00005555555b6079 in perl_alloc ()
    at /home/tony/dev/perl/git/perl/perl.c:212
#4  0x000055555580528f in main (argc=7, argv=0x7fffffffeba8, 
    env=0x7fffffffebe8) at miniperlmain.c:105
tonycoz commented 1 year ago

I think it's happening because Perl_switch_locale_context() is fetching the TLS value before we've allocated the key (let alone set the value of the key).

From looking at the source, musl appears to defer setting up the TLS array for the main thread until a key is allocated, which results in the null pointer access that valgrind reports.
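
To illustrate the ordering issue described above, here is a minimal sketch using the POSIX key API (which is what the `tss_get` frame in the backtrace resolves to on musl). The names `demo_key` and `demo_ordered_tls` are hypothetical and are not taken from the Perl source. It only shows the sequence that is safe on any libc: allocate the key, set the value, then fetch it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical key for illustration; not Perl's actual TLS key. */
static pthread_key_t demo_key;

/* Returns 1 if the value read back matches the value stored, 0 otherwise. */
int demo_ordered_tls(void *value)
{
    void *got;

    /* 1. Allocate the key first. POSIX leaves pthread_getspecific() on a
     *    key that was never created undefined, so a libc is free to crash. */
    if (pthread_key_create(&demo_key, NULL) != 0)
        return 0;

    /* 2. Store the per-thread value under the freshly created key. */
    if (pthread_setspecific(demo_key, value) != 0) {
        pthread_key_delete(demo_key);
        return 0;
    }

    /* 3. Only now is fetching the value well defined. */
    got = pthread_getspecific(demo_key);

    pthread_key_delete(demo_key);
    return got == value;
}
```

Calling `pthread_getspecific()` before step 1 is the pattern suspected in this report: it may happen to return NULL on one libc and fault on another.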

khwilliamson commented 1 year ago

Is there a way to know to skip this during such initialization?

tonycoz commented 1 year ago

I assume you mean skipping fetching the value.

We could add an extra global that indicates whether we've allocated and set the key for the main thread, but that still leaves Perl_switch_locale_context() without a context to do its work. In this case, on glibc* I expect switch_locale_context() exits early, since aTHX would be NULL.

The only real solutions I see are:

Or remove the call, since this appears to be a no-op on glibc anyway.

* glibc is doing error checking that POSIX doesn't require here
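
The "extra global" idea mentioned above could look roughly like the sketch below. All names here (`thx_key`, `thx_key_created`, `guarded_store`, `guarded_fetch`) are hypothetical, and this is not the change that was eventually merged. As noted in the comment, such a guard only avoids the invalid fetch; the caller still gets no context to work with.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the Perl source. */
static pthread_key_t thx_key;
static int thx_key_created = 0;   /* the "extra global" flag */

/* Create the key on first use, then store ctx. Returns 0 on success. */
int guarded_store(void *ctx)
{
    if (!thx_key_created) {
        if (pthread_key_create(&thx_key, NULL) != 0)
            return -1;
        thx_key_created = 1;
    }
    return pthread_setspecific(thx_key, ctx) == 0 ? 0 : -1;
}

/* Fetch the context, or NULL if the key has not been created yet.
 * The flag check keeps us from touching an unallocated key. */
void *guarded_fetch(void)
{
    if (!thx_key_created)
        return NULL;
    return pthread_getspecific(thx_key);
}
```

The plain `int` flag assumes the key is created during single-threaded startup, before any other thread exists; a real implementation would need to consider how the flag is published to other threads.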

jkeenan commented 1 year ago

Since this breakage is due to a very recent core change in the current production cycle, it should be designated as a Release Blocker -- correct?

khwilliamson commented 1 year ago

Yes, but https://github.com/Perl/perl5/pull/21080 fixed this, and apparently "GH" in the description kept it from automatically closing this issue.