Open p5pRT opened 10 years ago
This is a bug report for perl from prumpf@gmail.com, generated with the help of perlbug 1.40 running under perl 5.19.11.
-----------------------------------------------------------------
Hi! I've run into a deadlock situation with the current git versions of perl (5.19.11) and glibc (2.19), on x86_64-pc-linux-gnu with ithreads and MY_MALLOC, though I've run into it with other setups (recent Debian versions of Perl and glibc, no MY_MALLOC) as well. I believe I've been able to track down the issue and come up with a workaround, although I've not yet found the time to come up with a small reproducible test case. Please feel free to ask me for one if it's absolutely required, though, or ask for other information, and I'll do my best.
In summary, the problem is inconsistent lock ordering between Perl's PL_malloc_mutex and the list_lock in glibc's malloc/arena.c. The situation arises when one thread tries to fork() at the same time that another thread calls malloc().
Perl calls pthread_atfork() before the first malloc() makes glibc install its own atfork handlers, so fork() calls ptmalloc_lock_all() first, then Perl_atfork_lock(). That means locking glibc's list_lock first, then PL_malloc_mutex. (pthread_atfork() prepare handlers have LIFO semantics.)
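The LIFO ordering mentioned above can be checked in isolation. The following is a minimal sketch, assuming a POSIX system; the handler names and `atfork_prepare_order()` are made up for illustration and are not part of Perl or glibc. It registers two prepare handlers, forks once, and reports the order in which they ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Records the order in which the prepare handlers run. */
static char prepare_log[8];
static size_t prepare_len;

static void note(char tag)
{
    if (prepare_len < sizeof(prepare_log) - 1)
        prepare_log[prepare_len++] = tag;
}

static void prepare_first(void)  { note('A'); }  /* registered first  */
static void prepare_second(void) { note('B'); }  /* registered second */

/* Registers two prepare handlers, forks once, and returns the order in
 * which they ran in the parent. Returns NULL on error.
 * Meant to be called once per process (handlers stay registered). */
static const char *atfork_prepare_order(void)
{
    pid_t pid;

    prepare_len = 0;
    memset(prepare_log, 0, sizeof(prepare_log));

    if (pthread_atfork(prepare_first, NULL, NULL) != 0)
        return NULL;
    if (pthread_atfork(prepare_second, NULL, NULL) != 0)
        return NULL;

    pid = fork();
    if (pid < 0)
        return NULL;
    if (pid == 0)
        _exit(0);              /* child: nothing to do */

    waitpid(pid, NULL, 0);     /* parent: reap the child */
    return prepare_log;
}
```

POSIX specifies that prepare handlers run in the reverse of their registration order, so the handler registered second runs first. Whichever of Perl and glibc registers last therefore takes its lock first at fork() time, which is why the timing of the first malloc() matters here.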
However, Perl's malloc implementation locks PL_malloc_mutex first, then (sometimes) runs out of memory and calls the real malloc(), which tries to lock list_lock. We thus have a race condition and a deadlock, which I've seen in practice.
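The inversion can be illustrated without actually hanging a process. This is a single-threaded sketch under stated assumptions: `app_lock` and `libc_lock` are hypothetical stand-ins for PL_malloc_mutex and glibc's list_lock, and `pthread_mutex_trylock()` is used in place of a blocking lock so the would-be deadlock shows up as a return value.

```c
#include <errno.h>
#include <pthread.h>

/* Stand-ins for the two locks named in the report (hypothetical names). */
static pthread_mutex_t app_lock  = PTHREAD_MUTEX_INITIALIZER; /* like PL_malloc_mutex */
static pthread_mutex_t libc_lock = PTHREAD_MUTEX_INITIALIZER; /* like list_lock       */

/* Simulates the moment after each code path has taken its first lock.
 * Returns 1 if both second acquisitions would block (a deadlock in the
 * real two-thread case), 0 otherwise. */
static int lock_order_inversion_blocks(void)
{
    int malloc_path_blocked, fork_path_blocked;

    pthread_mutex_lock(&app_lock);   /* "malloc path" takes app_lock first */
    pthread_mutex_lock(&libc_lock);  /* "fork path" takes libc_lock first  */

    /* Each path now needs the lock that the other one holds. */
    malloc_path_blocked = (pthread_mutex_trylock(&libc_lock) == EBUSY);
    fork_path_blocked   = (pthread_mutex_trylock(&app_lock)  == EBUSY);

    pthread_mutex_unlock(&libc_lock);
    pthread_mutex_unlock(&app_lock);

    return malloc_path_blocked && fork_path_blocked;
}
```

With real threads and blocking locks, neither path could make progress at this point. Taking the two locks in the same order on both paths removes the cycle.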
I believe this is fundamentally a glibc bug: its implementation of pthread_atfork() behaves erratically depending on whether malloc() is first called before or after pthread_atfork(). However, since the broken versions of glibc are out there and multiplying, we should also work around the issue in Perl itself.
The workaround should be as easy as including an extra PerlMem_free(PerlMem_malloc(1024)) call before calling PTHREAD_ATFORK, but gcc has started "optimizing" such (otherwise) useless calls. I've found a deliberately duplicate call to perl_alloc() works, but that's both a one-time memory leak and horribly ugly, and most likely breaks whatever code uses PL_do_undump.
Nevertheless, I'll include it here, because most of the work was probably in tracking down the bug, and fixing it should be easier, even if I cannot presently think of a good fix.
diff --git a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
index 730c565..a8092bf 100644
--- a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
+++ b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
@@ -129,6 +129,19 @@ main(int argc, char **argv, char **env)
      * call PTHREAD_ATFORK() explicitly, but if and only if it hasn't
      * been called at least once before in the current process.
      * --GSAR 2001-07-20 */
+    /* There's a nasty race condition with the current versions of Perl and
+     * glibc: the call to PTHREAD_ATFORK in Perl's main() might be reached
+     * before the first malloc happens, in which
+     * case fork() locks malloc/arena.c's list_lock first, then tries to lock
+     * PL_malloc_lock; another thread might have locked PL_malloc_lock first,
+     * then tries to lock list_lock, resulting in a deadlock.
+     *
+     * A proper fix would be in glibc, ensuring that ptmalloc_init() is called
+     * earlier, but a workaround is to make a malloc call ourselves. */
+    /* This leaks memory, but works. */
+    (void)perl_alloc();
+    /* This doesn't leak memory, but is optimized away by gcc */
+    PerlMem_free(PerlMem_malloc(1024));
     PTHREAD_ATFORK(Perl_atfork_lock,
                    Perl_atfork_unlock,
                    Perl_atfork_unlock);
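One possible way to get a malloc()/free() pair that neither leaks nor gets optimized away is to route the pointer through a volatile object. This is only a sketch of the idea and not a tested patch against Perl: `force_early_malloc()` is a hypothetical helper, and it uses plain malloc()/free() rather than the PerlMem_* macros.

```c
#include <stdlib.h>

/* Performs one real allocation before atfork handlers are registered.
 * The pointer is stored in a volatile object, which is intended to keep
 * the compiler from eliding the malloc()/free() pair.
 * Returns 1 if the allocation succeeded, 0 otherwise. */
static int force_early_malloc(void)
{
    void *volatile probe = malloc(1024);
    int ok = (probe != NULL);

    free(probe);               /* free(NULL) is a no-op, so this is safe */
    return ok;
}
```

In the patch above, such a call would sit immediately before PTHREAD_ATFORK(...), replacing both the perl_alloc() and the PerlMem_free(PerlMem_malloc(1024)) lines.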
On Sat Mar 22 09:53:21 2014, prumpf@gmail.com wrote:
Hi! I've run into a deadlock situation with the current git versions of perl (5.19.11) and glibc (2.19), on x86_64-pc-linux-gnu with ithreads and MY_MALLOC, though I've run into it with other setups (recent Debian versions of Perl and glibc, no MY_MALLOC) as well. I believe I've been able to track down the issue and come up with a workaround, although I've not yet found the time to come up with a small reproducible test case. Please feel free to ask me for one if it's absolutely required, though, or ask for other information, and I'll do my best.
Have you reported the glibc part of the problem to your vendor (Debian)?
Since this seems to be a glibc-specific issue, I wonder if there's a glibc-specific way of forcing initialization.
In any case, the workaround would need to be protected by #ifdef __GLIBC__
https://bugzilla.redhat.com/show_bug.cgi?id=906468
seems like a different but related issue; unfortunately, his post to the glibc mailing list:
https://sourceware.org/ml/libc-alpha/2013-01/msg01051.html
seems to have been ignored.
Tony
The RT System itself - Status changed from 'new' to 'open'
On Sat, Mar 22, 2014 at 5:53 PM, Philipp Rumpf <perlbug-followup@perl.org> wrote:
However, Perl's malloc implementation locks PL_malloc_mutex first, then (sometimes) runs out of memory and calls the real malloc(), which tries to lock list_lock. We thus have a race condition and a deadlock, which I've seen in practice.
This doesn't make sense. Perl's malloc should only use the system's malloc if both USE_PERL_SBRK and PERL_SBRK_VIA_MALLOC are set, which is not that likely. I'm not sure what's going on here exactly.
Leon
On Sat, Mar 29, 2014 at 3:46 PM, Philipp Rumpf <prumpf@gmail.com> wrote:
Hello, I tried responding via the perlbug system, but that appears to be broken. Thank you for your responses so far!
As a reminder, the bug is specific to glibc/nptl-based systems with ithreads, such as x86_64-pc-linux-gnu.
I've reported the issue on the glibc bugzilla after verifying it's not Debian-specific.
Here's a much simpler fix/workaround, to metaconfig, that we can use until fixed glibcs start appearing:
---------------------------------------
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE:	-pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:	This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
 	val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+	echo "Assuming pthread_atfork is broken, since this is glibc."
+	val="$undef"
+	;;
+esac
 case "$usethreads" in
 $define)
 	case "$val" in
-------------------------------------------
Yet I don't think pretending that at_fork is helpful at all. That will only create new deadlocks.
And here's a test case for reproducing the bug (Leon was right to point out that without -DPURIFY, which I had set but forgotten about, it's not Perl's malloc that calls the real malloc(), but S_more_refcounted_fds. However, it's the same bug). This program should terminate (and would probably exhaust file descriptors without a breakpoint), but by merely setting the right breakpoint and attempting to continue once it's hit, we can get it to deadlock (after opening a mere 16 file descriptors).
------------------------------------------
#!/usr/bin/perl
# set a breakpoint in S_more_refcounted_fds before running this

use threads;

async {
    my @fh;

    for (my $i = 0; ; $i++) {
        open($fh[$i], "</dev/zero");
    }
};

sleep(1);
fork();
--------------------------------------
To force the deadlock, set a breakpoint in S_more_refcounted_fds, then wait for a while (for the sleep(1) to finish) before continuing after the breakpoint is hit for the second time (the first time will be before the second thread is spawned).
As you can see in this rather long GDB transcript, the bug is what I described: thread 2 is trying to malloc() with perlio_mutex held, thread 1 is trying to fork, is already holding glibc's malloc mutex, and is waiting on perlio_mutex.
Yes that makes sense. I guess that means your original proposed solution (calling malloc early if necessary) is warranted.
Leon
Thanks for your response!
On Wed, Mar 26, 2014 at 6:47 AM, Tony Cook via RT <perlbug-followup@perl.org> wrote:
Have you reported the glibc part of the problem to your vendor (Debian)?
I confirmed the problem is present in the git version of glibc and reported it there:
https://sourceware.org/bugzilla/show_bug.cgi?id=16742
I'll file a bug against the Debian package if I don't hear from them.
Since this seems to be a glibc-specific issue, I wonder if there's a glibc-specific way of forcing initialization.
In any case, the workaround would need to be protected by #ifdef __GLIBC__
How about simply forcing HAS_PTHREAD_ATFORK to undef if __GLIBC__ is defined? That should be a little cleaner than the malloc workaround, at least.
Ideally, there would be a test case to determine at configuration time whether our pthread_atfork() is broken. However, that's a little unpredictable, even with appropriate sleep() statements, since our system might be too busy.
Here's what I've come up with\, as a patch against metaconfig:
https://bugzilla.redhat.com/show_bug.cgi?id=906468
seems like a different but related issue; unfortunately, his post to the glibc mailing list:
https://sourceware.org/ml/libc-alpha/2013-01/msg01051.html
seems to have been ignored.
I don't fully understand that report; it sounds like malloc_atfork() shouldn't be performing I/O, but looking at the source it appears not to be. I suspect that the original bug might have involved pthread_atfork handlers running in the wrong order, though; maybe fork() should call _IO_list_lock() before calling ptmalloc_lock_all()?
Anyway, I think that's a different issue, though it's a pity if it hasn't been fixed.
Sorry, I hadn't noticed that PURIFY was still set in my configuration, which does indeed set PERL_SBRK and PERL_SBRK_VIA_MALLOC. My guess is the issue doesn't appear for MYMALLOC && !PURIFY, but it's still valid (and I'm pretty sure "what's going on here" is what I've described) for !MYMALLOC and MYMALLOC && PURIFY.
Hope that helps you make sense of it, and sorry for the confusion.
On Wed, Mar 26, 2014 at 10:51 AM, Leon Timmermans via RT <perlbug-followup@perl.org> wrote:
This doesn't make sense. Perl's malloc should only use the system's malloc if both USE_PERL_SBRK and PERL_SBRK_VIA_MALLOC are set, which is not that likely. I'm not sure what's going on here exactly.
Leon
Hello, I tried responding via the perlbug system, but that appears to be broken. Thank you for your responses so far!
As a reminder, the bug is specific to glibc/nptl-based systems with ithreads, such as x86_64-pc-linux-gnu.
I've reported the issue on the glibc bugzilla after verifying it's not Debian-specific.
Here's a much simpler fix/workaround, to metaconfig, that we can use until fixed glibcs start appearing:
And here's a test case for reproducing the bug (Leon was right to point out that without -DPURIFY, which I had set but forgotten about, it's not Perl's malloc that calls the real malloc(), but S_more_refcounted_fds. However, it's the same bug). This program should terminate (and would probably exhaust file descriptors without a breakpoint), but by merely setting the right breakpoint and attempting to continue once it's hit, we can get it to deadlock (after opening a mere 16 file descriptors).
#!/usr/bin/perl
# set a breakpoint in S_more_refcounted_fds before running this

use threads;

async {
    my @fh;

    for (my $i = 0; ; $i++) {
        open($fh[$i], "</dev/zero");
    }
};

sleep(1);
fork();
To force the deadlock, set a breakpoint in S_more_refcounted_fds, then wait for a while (for the sleep(1) to finish) before continuing after the breakpoint is hit for the second time (the first time will be before the second thread is spawned).
As you can see in this rather long GDB transcript, the bug is what I described: thread 2 is trying to malloc() with perlio_mutex held, thread 1 is trying to fork, is already holding glibc's malloc mutex, and is waiting on perlio_mutex.
Sorry again for the -DPURIFY confusion.
Philipp Rumpf
GDB transcript: % gdb --args perl glibc-bug.pl gdb --args perl glibc-bug.pl GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1) Copyright (C) 2013 Free Software Foundation\, Inc. License GPLv3+: GNU GPL version 3 or later \<http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it. There is NO WARRANTY\, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions\, please see: \<http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/bin/perl...Reading symbols from /usr/lib/debug/usr/bin/perl...done. done. (gdb) r r Starting program: /usr/bin/perl glibc-bug.pl warning: Could not load shared library symbols for linux-vdso.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New Thread 0x7ffff6b3c700 (LWP 19617)] Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached [Thread 0x7ffff6b3c700 (LWP 19617) exited] [Inferior 1 (process 19613) exited normally] (gdb) b S_more_refcounted_fds b S_more_refcounted_fds Breakpoint 1 at 0x7ffff7b83060: file perlio.c\, line 2320. (gdb) set target-async 1 set target-async 1 (gdb) set non-stop on set non-stop on (gdb) r r Starting program: /usr/bin/perl glibc-bug.pl warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000 warning: Could not load shared library symbols for linux-vdso.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1\, PerlIOUnix_refcnt_inc (fd=0) at perlio.c:2372 2372 perlio.c: No such file or directory. (gdb) shell sleep 5 shell sleep 5 (gdb) c c Continuing. [New Thread 0x7ffff6b3c700 (LWP 19621)]
Breakpoint 1\, PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 2372 in perlio.c (gdb) shell sleep 5 shell sleep 5 (gdb) c c Continuing. Cannot execute this command while the selected thread is running. (gdb) i thr i thr Id Target Id Frame 2 Thread 0x7ffff6b3c700 (LWP 19621) "perl" PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 * 1 Thread 0x7ffff7fd3700 (LWP 19619) "perl" (running) (gdb) thr 2 thr 2 [Switching to thread 2 (Thread 0x7ffff6b3c700 (LWP 19621))] #0 PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 2372 in perlio.c (gdb) c c Continuing. C-c C-c^C Program received signal SIGINT\, Interrupt. __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 95 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory. (gdb) interrupt -a interrupt -a (gdb) [Thread 0x7ffff7fd3700 (LWP 19619)] #1 stopped. __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) thr app all bt

Thread 2 (Thread 0x7ffff6b3c700 (LWP 19621)):
#0  __lll_lock_wait_private ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007ffff6ffc527 in _L_lock_10982 () at malloc.c:5154
#2  0x00007ffff6ffa198 in __GI___libc_realloc (
    oldmem=0x7ffff7321620 <main_arena>, bytes=128) at malloc.c:2975
#3  0x00007ffff7b83098 in S_more_refcounted_fds (my_perl=0x6bfef0,
    new_fd=16)
    at perlio.c:2334
#4  PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372
#5  0x00007ffff7b839c4 in PerlIOUnix_setfd (my_perl=0x6bfef0, f=0x6d8710,
    imode=0, fd=

Thread 1 (Thread 0x7ffff7fd3700 (LWP 19619)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff7331467 in _L_lock_913 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff7331290 in __GI___pthread_mutex_lock (
    mutex=0x7ffff7ddce20 <PL_perlio_mutex>) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007ffff7ae9c70 in Perl_atfork_lock () at util.c:2811
#4  0x00007ffff7035122 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95
#5  0x00007ffff7338305 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
#6  0x00007ffff7ae9d05 in Perl_my_fork () at util.c:2849
#7  0x00007ffff7b556bc in Perl_pp_fork (my_perl=0x603010) at pp_sys.c:4022
#8  0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x603010) at run.c:42
#9  0x00007ffff7a9dce4 in S_run_body (oldscope=1, my_perl=0x603010) at perl.c:2467
#10 perl_run (my_perl=0x603010) at perl.c:2383
#11 0x0000000000400e19 in main (argc=2, argv=0x7fffffffeaf8, env=0x7fffffffeb10) at perlmain.c:114
On Sat, 29 Mar 2014 14:46:08 +0000, Philipp Rumpf <prumpf@gmail.com> wrote:
Hello, I tried responding via the perlbug system, but that appears to be broken. Thank you for your responses so far!
As a reminder, the bug is specific to glibc/nptl-based systems with ithreads, such as x86_64-pc-linux-gnu.
I admire the fact that this is a genuine patch to the meta-system, but looking at the scope, I wonder if it is better located in hints/linux.sh
-------------------------------------- GDB transcript: % gdb --args perl glibc-bug.pl gdb --args perl glibc-bug.pl GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1) Copyright (C) 2013 Free Software Foundation\, Inc. License GPLv3+: GNU GPL version 3 or later \<http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it. There is NO WARRANTY\, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions\, please see: \<http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/bin/perl...Reading symbols from /usr/lib/debug/usr/bin/perl...done. done. (gdb) r r Starting program: /usr/bin/perl glibc-bug.pl warning: Could not load shared library symbols for linux-vdso.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [New Thread 0x7ffff6b3c700 (LWP 19617)] Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached Perl exited with active threads: 1 running and unjoined 0 finished and unjoined 0 running and detached [Thread 0x7ffff6b3c700 (LWP 19617) exited] [Inferior 1 (process 19613) exited normally] (gdb) b S_more_refcounted_fds b S_more_refcounted_fds Breakpoint 1 at 0x7ffff7b83060: file perlio.c\, line 2320. (gdb) set target-async 1 set target-async 1 (gdb) set non-stop on set non-stop on (gdb) r r Starting program: /usr/bin/perl glibc-bug.pl warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000 warning: Could not load shared library symbols for linux-vdso.so.1. Do you need "set solib-search-path" or "set sysroot"? [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1\, PerlIOUnix_refcnt_inc (fd=0) at perlio.c:2372 2372 perlio.c: No such file or directory. (gdb) shell sleep 5 shell sleep 5 (gdb) c c Continuing. [New Thread 0x7ffff6b3c700 (LWP 19621)]
Breakpoint 1\, PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 2372 in perlio.c (gdb) shell sleep 5 shell sleep 5 (gdb) c c Continuing. Cannot execute this command while the selected thread is running. (gdb) i thr i thr Id Target Id Frame 2 Thread 0x7ffff6b3c700 (LWP 19621) "perl" PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 * 1 Thread 0x7ffff7fd3700 (LWP 19619) "perl" (running) (gdb) thr 2 thr 2 [Switching to thread 2 (Thread 0x7ffff6b3c700 (LWP 19621))] #0 PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 2372 in perlio.c (gdb) c c Continuing. C-c C-c^C Program received signal SIGINT\, Interrupt. __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 95 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory. (gdb) interrupt -a interrupt -a (gdb) [Thread 0x7ffff7fd3700 (LWP 19619)] #1 stopped. __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) thr app all bt thr app all bt
Thread 2 (Thread 0x7ffff6b3c700 (LWP 19621)): #0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x00007ffff6ffc527 in _L_lock_10982 () at malloc.c:5154 #2 0x00007ffff6ffa198 in __GI___libc_realloc ( oldmem=0x7ffff7321620 \<main_arena>\, bytes=128) at malloc.c:2975 #3 0x00007ffff7b83098 in S_more_refcounted_fds (my_perl=0x6bfef0\, new_fd=16) at perlio.c:2334 #4 PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372 #5 0x00007ffff7b839c4 in PerlIOUnix_setfd (my_perl=0x6bfef0\, f=0x6d8710\, imode=0\, fd=\
) at perlio.c:2655 #6 PerlIOUnix_open (my_perl=0x6bfef0\, self=0x7ffff7ddc820 \<PerlIO_unix>\, layers=0x6d84b0\, n=0\, mode=0x7ffff6b3ba70 "r"\, fd=\ \, imode=0\, perm=438\, f=0x6d8710\, narg=1\, args=0x7ffff6b3ba68) at perlio.c:2736 #7 0x00007ffff7b82c06 in PerlIOBuf_open (my_perl=0x6bfef0\, self=0x7ffff7ddc660 \<PerlIO_perlio>\, layers=0x6d84b0\, n=1\, mode=0x7ffff6b3ba70 "r"\, fd=-1\, imode=0\, perm=0\, f=0x0\, narg=1\, args=0x7ffff6b3ba68) at perlio.c:3862 #8 0x00007ffff7b84b2b in PerlIO_openn (my_perl=my_perl@entry=0x6bfef0\, layers=layers@entry=0x0\, mode=mode@entry=0x7ffff6b3ba70 "r"\, fd=fd@entry=-1\, imode=imode@entry=0\, perm=perm@entry=0\, f=f@entry=0x0\, narg=narg@entry=1\, args=args@entry=0x7ffff6b3ba68) at perlio.c:1648 #9 0x00007ffff7b5d83e in Perl_do_openn (my_perl=my_perl@entry=0x6bfef0\, gv=gv@entry=0x7362f8\, oname=0x724830 "\</dev/zero"\, len=\ \, as_raw=as_raw@entry=0\, rawmode=rawmode@entry=0\, rawperm=rawperm@entry=0\, supplied_fp=supplied_fp@entry=0x0\, svp=0x7ffff6b3ba68\, num_svs=1\, num_svs@entry=0) at doio.c:453 #10 0x00007ffff7b4c36e in Perl_pp_open (my_perl=0x6bfef0) at pp_sys.c:640 #11 0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x6bfef0) at run.c:42 #12 0x00007ffff7a96930 in Perl_call_sv (my_perl=my_perl@entry=0x6bfef0\, sv=0x736058\, flags=\ ) at perl.c:2766 #13 0x00007ffff6b43589 in S_ithread_run (arg=0x630020) at threads.xs:517 #14 0x00007ffff732f062 in start_thread (arg=0x7ffff6b3c700) at pthread_create.c:312 #15 0x00007ffff7063a3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 1 (Thread 0x7ffff7fd3700 (LWP 19619)): #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007ffff7331467 in _L_lock_913 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007ffff7331290 in __GI___pthread_mutex_lock ( mutex=0x7ffff7ddce20 \<PL_perlio_mutex>) at ../nptl/pthread_mutex_lock.c:79 #3 0x00007ffff7ae9c70 in Perl_atfork_lock () at util.c:2811 #4 0x00007ffff7035122 in __libc_fork 
() at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95 #5 0x00007ffff7338305 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25 #6 0x00007ffff7ae9d05 in Perl_my_fork () at util.c:2849 #7 0x00007ffff7b556bc in Perl_pp_fork (my_perl=0x603010) at pp_sys.c:4022 #8 0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x603010) at run.c:42 #9 0x00007ffff7a9dce4 in S_run_body (oldscope=1\, my_perl=0x603010) at perl.c:2467 #10 perl_run (my_perl=0x603010) at perl.c:2383 #11 0x0000000000400e19 in main (argc=2\, argv=0x7fffffffeaf8\, env=0x7fffffffeb10) at perlmain.c:114 ----------------------------------------------
-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.19  porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
On Mon, Mar 31, 2014 at 6:28 AM, H. Merijn Brand via RT <perlbug-followup@perl.org> wrote:
I admire the fact that this is a genuine patch to the meta-system, but looking at the scope, I wonder if it is better located in hints/linux.sh
I don't know. The build system is a bit of a mystery to me (I'm not sure, but I think the first patch was broken in the non-glibc case).
There are four options here: put the test in metaconfig or the hints file, and use version number testing or a test program. Testing by version numbers seems to be discouraged, and while I have a test program, the only easy way to tell whether it deadlocked is to wait for a timeout. I'm paranoid about that reporting false failures on very busy systems with fixed glibcs. In the failure case, it also incurs a delay on the build system while it waits for the timeout; I chose two seconds, but we could probably get away with one second.
I'd argue that the code with the test program might well go into metaconfig: pthread_atfork() is broken for all users, not just Perl. The test isn't specific to glibc or Linux; it should work on all POSIX systems, and if it fails on a non-glibc system we definitely don't want to use pthread_atfork() there.
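For readers who want to see the shape of a timeout-based probe, here is a minimal sketch. It is not the attached test program (which is not reproduced in this thread); `finishes_within()` and the two sample probes are hypothetical names. The idea is to run the risky code in a child process and treat "still running after the timeout" as a suspected deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Runs fn() in a child process. Returns 0 if the child finished within
 * timeout_ms, 1 if it had to be killed (suspected deadlock), -1 on error. */
static int finishes_within(void (*fn)(void), int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int waited_ms = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);
    }

    while (waited_ms < timeout_ms) {
        pid_t done = waitpid(pid, NULL, WNOHANG);
        if (done == pid)
            return 0;          /* child exited in time */
        if (done < 0)
            return -1;
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }

    kill(pid, SIGKILL);        /* give up: treat as a hang */
    waitpid(pid, NULL, 0);     /* reap the killed child */
    return 1;
}

/* Two hypothetical probes used to exercise the helper. */
static void returns_quickly(void) { }
static void hangs_forever(void)   { for (;;) pause(); }
```

The false-failure concern raised above applies directly: on a heavily loaded build host a healthy probe can exceed a short timeout, so the timeout value trades build delay against misdetection.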
So I've attached the two test-program-based versions, as patches to metaconfig and perl. Either one appears to work, and installing both also appears to work.
Philipp
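The attached test programs are not reproduced in this thread. As a rough sketch of the timeout-based detection described above (not the actual patch), the following C helper runs a probe in a child process and treats it as deadlocked if it has not exited after a fixed number of seconds. The names `run_probe_with_timeout`, `probe_ok` and `probe_hang` are hypothetical; a real Configure probe would presumably exercise pthread_atfork(), malloc() and fork() from several threads inside the probe.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run probe() in a child process.
 * Returns 0 if the child exited with status 0 within timeout_sec seconds,
 * 1 if it was still running at the deadline and had to be killed
 * (treated as "deadlocked"), and -1 on any other failure. */
static int run_probe_with_timeout(void (*probe)(void), int timeout_sec)
{
    assert(timeout_sec > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        probe();
        _exit(0);
    }

    /* Poll every 10 ms so a healthy probe does not cost the full timeout. */
    const struct timespec tick = { 0, 10 * 1000 * 1000 };
    for (int waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Deadline passed: kill the child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return 1;
}

/* Two trivial probes used only to exercise the helper. */
static void probe_ok(void)   { /* returns immediately */ }
static void probe_hang(void) { for (;;) pause(); }
```

The parent cannot tell a slow child from a deadlocked one, which is why a probe of this kind can report false failures on a heavily loaded build machine.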
I've reported the issue on the glibc bugzilla after verifying it's not Debian-specific.
Here's a much simpler fix/workaround, to metaconfig, that we can use until fixed glibcs start appearing:
---------------------------------------
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE:	-pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:	This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
 	val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+	echo "Assuming pthread_atfork is broken, since this is glibc."
+	val="$undef"
+	;;
+esac
 case "$usethreads" in
 $define)
 	case "$val" in
-------------------------------------------
And here's a test case for reproducing the bug. Leon was right to point out that without -DPURIFY (which I had set but forgotten about), it's not Perl's malloc that calls the real malloc(), but S_more_refcounted_fds; it's the same bug, however. Without a breakpoint, this program should terminate (and would probably exhaust file descriptors), but by merely setting the right breakpoint and attempting to continue once it's hit, we can get it to deadlock after opening a mere 16 file descriptors.
------------------------------------------
#!/usr/bin/perl
# set a breakpoint in S_more_refcounted_fds before running this

use threads;

async {
    my @fh;

    for (my $i = 0; ; $i++) {
        open($fh[$i], "</dev/zero");
    }
};

sleep(1);
fork();
--------------------------------------
To force the deadlock, set a breakpoint in S_more_refcounted_fds, then wait for a while (for the sleep(1) to finish) before continuing after the breakpoint is hit for the second time (the first time will be before the second thread is spawned).
As you can see in this rather long GDB transcript, the bug is what I described: thread 2 is trying to malloc() with perlio_mutex held, while thread 1 is trying to fork, already holds glibc's malloc mutex, and is waiting on perlio_mutex.
Sorry again for the -DPURIFY confusion.
Philipp Rumpf
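The lock-order inversion described above can be reduced to two mutexes taken in opposite order. The sketch below is only an illustration under that assumption: `libc_malloc_lock` and `perlio_lock` are stand-ins for glibc's malloc lock and PL_perlio_mutex, `detect_lock_order_inversion` is a hypothetical name, and it uses pthread_mutex_trylock() so that it reports the conflict instead of actually hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Stand-ins for the two locks involved (names are illustrative only). */
static pthread_mutex_t libc_malloc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t perlio_lock      = PTHREAD_MUTEX_INITIALIZER;

static sem_t worker_tried, forker_tried;
static int worker_rc;

/* Plays thread 2: holds the perlio lock, then wants the malloc lock. */
static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&perlio_lock);
    worker_rc = pthread_mutex_trylock(&libc_malloc_lock);
    sem_post(&worker_tried);
    sem_wait(&forker_tried);          /* keep perlio_lock until the forker has tried */
    pthread_mutex_unlock(&perlio_lock);
    return NULL;
}

/* Plays thread 1 (the forking thread): holds the malloc lock, then wants
 * the perlio lock. Returns 1 if both sides found the other lock busy,
 * 0 if not, and -1 on setup failure. */
static int detect_lock_order_inversion(void)
{
    pthread_t tid;
    int forker_rc;

    if (sem_init(&worker_tried, 0, 0) != 0 || sem_init(&forker_tried, 0, 0) != 0)
        return -1;

    pthread_mutex_lock(&libc_malloc_lock);
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {
        pthread_mutex_unlock(&libc_malloc_lock);
        return -1;
    }

    sem_wait(&worker_tried);
    forker_rc = pthread_mutex_trylock(&perlio_lock);
    sem_post(&forker_tried);

    pthread_join(tid, NULL);
    assert(worker_rc != 0);           /* we held libc_malloc_lock the whole time */
    pthread_mutex_unlock(&libc_malloc_lock);
    sem_destroy(&worker_tried);
    sem_destroy(&forker_tried);

    return forker_rc == EBUSY && worker_rc == EBUSY;
}
```

With blocking pthread_mutex_lock() calls in place of the two trylock calls, both threads would wait forever, which is the state shown in the backtraces.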
--------------------------------------
GDB transcript:
% gdb --args perl glibc-bug.pl
gdb --args perl glibc-bug.pl
GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/perl...Reading symbols from /usr/lib/debug/usr/bin/perl...done.
done.
(gdb) r
r
Starting program: /usr/bin/perl glibc-bug.pl
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6b3c700 (LWP 19617)]
Perl exited with active threads:
	1 running and unjoined
	0 finished and unjoined
	0 running and detached
Perl exited with active threads:
	1 running and unjoined
	0 finished and unjoined
	0 running and detached
[Thread 0x7ffff6b3c700 (LWP 19617) exited]
[Inferior 1 (process 19613) exited normally]
(gdb) b S_more_refcounted_fds
b S_more_refcounted_fds
Breakpoint 1 at 0x7ffff7b83060: file perlio.c, line 2320.
(gdb) set target-async 1
set target-async 1
(gdb) set non-stop on
set non-stop on
(gdb) r
r
Starting program: /usr/bin/perl glibc-bug.pl
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, PerlIOUnix_refcnt_inc (fd=0) at perlio.c:2372
2372	perlio.c: No such file or directory.
(gdb) shell sleep 5
shell sleep 5
(gdb) c
c
Continuing.
[New Thread 0x7ffff6b3c700 (LWP 19621)]

Breakpoint 1, PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372
2372	in perlio.c
(gdb) shell sleep 5
shell sleep 5
(gdb) c
c
Continuing.
Cannot execute this command while the selected thread is running.
(gdb) i thr
i thr
  Id   Target Id         Frame
  2    Thread 0x7ffff6b3c700 (LWP 19621) "perl" PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372
* 1    Thread 0x7ffff7fd3700 (LWP 19619) "perl" (running)
(gdb) thr 2
thr 2
[Switching to thread 2 (Thread 0x7ffff6b3c700 (LWP 19621))]
#0  PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372
2372	in perlio.c
(gdb) c
c
Continuing.
  C-c C-c^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
95	../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) interrupt -a
interrupt -a
(gdb)
[Thread 0x7ffff7fd3700 (LWP 19619)] #1 stopped.
__lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135	../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.

(gdb) thr app all bt
thr app all bt

Thread 2 (Thread 0x7ffff6b3c700 (LWP 19621)):
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007ffff6ffc527 in _L_lock_10982 () at malloc.c:5154
#2  0x00007ffff6ffa198 in __GI___libc_realloc (oldmem=0x7ffff7321620 <main_arena>, bytes=128) at malloc.c:2975
#3  0x00007ffff7b83098 in S_more_refcounted_fds (my_perl=0x6bfef0, new_fd=16) at perlio.c:2334
#4  PerlIOUnix_refcnt_inc (fd=16) at perlio.c:2372
#5  0x00007ffff7b839c4 in PerlIOUnix_setfd (my_perl=0x6bfef0, f=0x6d8710, imode=0, fd=<optimized out>) at perlio.c:2655
#6  PerlIOUnix_open (my_perl=0x6bfef0, self=0x7ffff7ddc820 <PerlIO_unix>, layers=0x6d84b0, n=0, mode=0x7ffff6b3ba70 "r", fd=<optimized out>, imode=0, perm=438, f=0x6d8710, narg=1, args=0x7ffff6b3ba68) at perlio.c:2736
#7  0x00007ffff7b82c06 in PerlIOBuf_open (my_perl=0x6bfef0, self=0x7ffff7ddc660 <PerlIO_perlio>, layers=0x6d84b0, n=1, mode=0x7ffff6b3ba70 "r", fd=-1, imode=0, perm=0, f=0x0, narg=1, args=0x7ffff6b3ba68) at perlio.c:3862
#8  0x00007ffff7b84b2b in PerlIO_openn (my_perl=my_perl@entry=0x6bfef0, layers=layers@entry=0x0, mode=mode@entry=0x7ffff6b3ba70 "r", fd=fd@entry=-1, imode=imode@entry=0, perm=perm@entry=0, f=f@entry=0x0, narg=narg@entry=1, args=args@entry=0x7ffff6b3ba68) at perlio.c:1648
#9  0x00007ffff7b5d83e in Perl_do_openn (my_perl=my_perl@entry=0x6bfef0, gv=gv@entry=0x7362f8, oname=0x724830 "</dev/zero", len=<optimized out>, as_raw=as_raw@entry=0, rawmode=rawmode@entry=0, rawperm=rawperm@entry=0, supplied_fp=supplied_fp@entry=0x0, svp=0x7ffff6b3ba68, num_svs=1, num_svs@entry=0) at doio.c:453
#10 0x00007ffff7b4c36e in Perl_pp_open (my_perl=0x6bfef0) at pp_sys.c:640
#11 0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x6bfef0) at run.c:42
#12 0x00007ffff7a96930 in Perl_call_sv (my_perl=my_perl@entry=0x6bfef0, sv=0x736058, flags=<optimized out>) at perl.c:2766
#13 0x00007ffff6b43589 in S_ithread_run (arg=0x630020) at threads.xs:517
#14 0x00007ffff732f062 in start_thread (arg=0x7ffff6b3c700) at pthread_create.c:312
#15 0x00007ffff7063a3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7ffff7fd3700 (LWP 19619)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff7331467 in _L_lock_913 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff7331290 in __GI___pthread_mutex_lock (mutex=0x7ffff7ddce20 <PL_perlio_mutex>) at ../nptl/pthread_mutex_lock.c:79
#3  0x00007ffff7ae9c70 in Perl_atfork_lock () at util.c:2811
#4  0x00007ffff7035122 in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95
#5  0x00007ffff7338305 in __fork () at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
#6  0x00007ffff7ae9d05 in Perl_my_fork () at util.c:2849
#7  0x00007ffff7b556bc in Perl_pp_fork (my_perl=0x603010) at pp_sys.c:4022
#8  0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x603010) at run.c:42
#9  0x00007ffff7a9dce4 in S_run_body (oldscope=1, my_perl=0x603010) at perl.c:2467
#10 perl_run (my_perl=0x603010) at perl.c:2383
#11 0x0000000000400e19 in main (argc=2, argv=0x7fffffffeaf8, env=0x7fffffffeb10) at perlmain.c:114
----------------------------------------------
-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
On Tue, Apr 1, 2014 at 5:12 PM, Philipp Rumpf <prumpf@gmail.com> wrote:
[...]
But you have now introduced exactly the deadlock that the use of pthread_atfork was supposed to fix: if thread 1 forks while thread 2 holds a perl mutex, the new process will deadlock as soon as it tries to acquire that mutex.
This is not a solution in any way.
Leon
If HAS_PTHREAD_ATFORK is undefined, Perl_my_fork() calls the same handlers that would otherwise have been installed by pthread_atfork(). So the deadlock you describe would only happen if someone called fork() (the C function, not the Perl function) directly, rather than going through Perl_my_fork(). Is that the case you're worrying about?
If the malloc hack is considered the better workaround, we can do that, of course.
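To illustrate the fallback described above: when the handlers are not registered through pthread_atfork(), a fork() wrapper can call them itself around the fork. This is a simplified sketch, not the code in util.c; `app_mutex`, `my_fork` and `child_can_lock_after_my_fork` are hypothetical names standing in for a Perl mutex and Perl_my_fork().

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for one of the mutexes the fork handlers protect. */
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;

static void atfork_lock(void)
{
    int rc = pthread_mutex_lock(&app_mutex);
    assert(rc == 0);
    (void)rc;
}

static void atfork_unlock(void)
{
    int rc = pthread_mutex_unlock(&app_mutex);
    assert(rc == 0);
    (void)rc;
}

/* fork() wrapper that calls the handlers itself instead of relying on
 * pthread_atfork() registration. */
static pid_t my_fork(void)
{
    pid_t pid;

    atfork_lock();      /* no other thread can hold app_mutex across the fork */
    pid = fork();
    atfork_unlock();    /* runs in the parent and in the child */
    return pid;
}

/* Forks through my_fork() and checks that the child can take app_mutex.
 * Returns 1 if it can, 0 otherwise. */
static int child_can_lock_after_my_fork(void)
{
    int status;
    pid_t pid = my_fork();

    if (pid < 0)
        return 0;
    if (pid == 0)
        _exit(pthread_mutex_trylock(&app_mutex) == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A direct call to fork() from C code bypasses the wrapper, so in that case the mutex state inherited by the child is whatever another thread left behind.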
On Thu, Apr 3, 2014 at 5:48 PM, Leon Timmermans via RT <perlbug-followup@perl.org> wrote:
[...]
If it's possible to add a configuration variable in hints/linux.sh, I haven't figured out how. So here's the version that changes metaconfig, but uses the malloc() hack. Applications that embed Perl are likely to copy-and-paste the code that calls PTHREAD_ATFORK, so I've exported the Perl_atfork_fix symbol; they're very likely not to need the workaround, anyway.
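For reference, a minimal sketch of what the malloc() hack amounts to, assuming the C library registers its own fork handlers on the first malloc() call as described earlier in this ticket. It is not the attached patch: `install_atfork_after_malloc`, `fork_and_reap` and the two prepare handlers are hypothetical stand-ins. The volatile pointer is there so the compiler cannot drop the malloc/free pair, and the two registrations show the LIFO order in which prepare handlers run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Order in which the prepare handlers ran during the last fork(). */
static char prepare_order[8];
static size_t prepare_count;

static void record(char tag)
{
    assert(prepare_count < sizeof(prepare_order) - 1);
    prepare_order[prepare_count++] = tag;
}

/* Stand-in for the handler the C library installs at its first malloc(). */
static void libc_like_prepare(void) { record('L'); }
/* Stand-in for Perl_atfork_lock(). */
static void perl_like_prepare(void) { record('P'); }

/* Touch the allocator first, then register the handlers.
 * Returns 0 on success, -1 on failure. */
static int install_atfork_after_malloc(void)
{
    /* Reading the pointer back through a volatile object keeps the
     * compiler from eliding the otherwise useless malloc/free pair. */
    void *volatile warmup = malloc(1024);
    if (warmup == NULL)
        return -1;
    free(warmup);

    /* Registered first, so its prepare handler runs last. */
    if (pthread_atfork(libc_like_prepare, NULL, NULL) != 0)
        return -1;
    /* Registered last, so its prepare handler runs first. */
    if (pthread_atfork(perl_like_prepare, NULL, NULL) != 0)
        return -1;
    return 0;
}

/* Fork once, let the child exit, and reap it. Returns 0 on success. */
static int fork_and_reap(void)
{
    pid_t pid;

    prepare_count = 0;
    memset(prepare_order, 0, sizeof(prepare_order));
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```

After `install_atfork_after_malloc()` and one `fork_and_reap()`, `prepare_order` holds the later registration first, which is the ordering the workaround relies on: the application lock is taken before the allocator's.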
On Thu, Apr 3, 2014 at 10:15 PM, Philipp Rumpf <prumpf@gmail.com> wrote:
[...]
Migrated from rt.perl.org#121490 (status was 'open')
Searchable as RT121490$