Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Error message "Signal SIGCHLD received, but no signal handler set." #17662

Open gflohr opened 4 years ago

gflohr commented 4 years ago

Module:

Description

In a web server based on Net::Server (version 2.007 and the most recent 2.009) with the Net::Server::Single personality, I get the error "Signal SIGCHLD received, but no signal handler set." and the server terminates. When using the Net::Server::PreFork personality, the signal is SIGHUP instead of SIGCHLD, and the server continues to run because only one of the forked children had terminated.

The error occurs after a handler of the server has forked a background process. In the child, POSIX::setsid() is called, all open file descriptors are closed and STDIN is redirected to /dev/null. STDOUT and STDERR are not re-opened because the same code is supposed to work in a mod_perl environment. %SIG is not modified. The child process (another Perl script) is executed with exec().

There is a corresponding question on stackoverflow.com: https://stackoverflow.com/questions/60708194/error-message-signal-sigchld-received-but-no-signal-handler-set/60761593#60761593

I have answered it myself with additional findings/information: https://stackoverflow.com/a/60761593/5464233

My workaround is to explicitly assign "DEFAULT" to $SIG{CHLD} or $SIG{HUP}, respectively, instead of leaving the old value undef, but that is not a satisfying solution. It is very similar to the workaround that was recommended for an almost identical problem in GNU parallel several years ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html

Steps to Reproduce

Not possible. It's a (proprietary) application with 350000+ lines of Perl code. Modifying the code or running the application in the Perl debugger makes the error disappear instantly. If there is interest in trying to debug the problem, I can provide access to a test installation.

I also wasn't able to run the program in gdb because the server did not come up.

Expected behavior

The script should not be able to make the interpreter terminate prematurely.

Perl configuration

# perl -V output goes here (a little below ...)

I have tested on Mac OS X with these Perl versions using perlbrew:

$ perlbrew list
  perl-5.30.2                               
* perl-5.28.0                               
  perl-5.26.2                               
  perl-5.26.0                               
  perl-5.18.4                               
  perl-5.16.3                               
  perl-5.14.4                               
  perl-5.8.9                                

All versions starting from 5.18.4 have the problem. The older ones do not have it.

The same behavior was reported for 5.26.x to 5.30.x on Linux.

 $ perl -V
Summary of my perl5 (revision 5 version 28 subversion 0) configuration:

  Platform:
    osname=darwin
    osvers=18.0.0
    archname=darwin-2level
    uname='darwin hostname.example.com 18.0.0 darwin kernel version 18.0.0: wed aug 22 20:13:40 pdt 2018; root:xnu-4903.201.2~1release_x86_64 x86_64 '
    config_args='-de -Dprefix=/Users/myname/perl5/perlbrew/perls/perl-5.28.0 -Aeval:scriptdir=/Users/myname/perl5/perlbrew/perls/perl-5.28.0/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -I/opt/local/include -DPERL_USE_SAFE_PUTENV'
    optimize='-O3'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -I/opt/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=10.14 -fstack-protector-strong -L/usr/local/lib -L/opt/local/lib'
    libpth=/usr/local/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/10.0.0/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/lib /opt/local/lib /usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=10.14 -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Locally applied patches:
    Devel::PatchPerl 1.52
  Built under darwin
  Compiled at Nov  9 2018 16:47:19
  %ENV:
    PERLBREW_HOME="/Users/myname/.perlbrew"
    PERLBREW_MANPATH="/Users/myname/perl5/perlbrew/perls/perl-5.28.0/man"
    PERLBREW_PATH="/Users/myname/perl5/perlbrew/bin:/Users/myname/perl5/perlbrew/perls/perl-5.28.0/bin"
    PERLBREW_PERL="perl-5.28.0"
    PERLBREW_ROOT="/Users/myname/perl5/perlbrew"
    PERLBREW_SHELLRC_VERSION="0.87"
    PERLBREW_VERSION="0.87"
  @INC:
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-2level
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/5.28.0/darwin-2level
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/5.28.0

gflohr commented 4 years ago

This is the code snippet that spawns the child process:

            if ($self->{__daemonize}) {
                if (fork) {
                    CORE::exit(0);
                }

                unless (POSIX::setsid()) {
                    # make child session leader and detach from terminal
                    die "Unable to detach process\n";
                }

                ## Close open file descriptors.
                POSIX::close($_) foreach open_files;

                ## Reopen stdin to /dev/null
                open(STDIN,  "+>/dev/null")
                     or warn "Cannot redirect standard input to /dev/null: $!\n";

                # XXX
                # Re-opening STDOUT collides with it being tied during the request.
                # open(STDOUT,  "+>/dev/null")
                #     or warn "Cannot redirect standard output to /dev/null: $!\n";
                # Re-opening STDERR collides with mod_perl.
                # open(STDERR,  "+>/dev/null")
                #      or warn "Cannot redirect standard error to /dev/null: $!\n";
            }

            exec($self->{__path}, @{$self->{__argv}})
                or die "Exec failed: " . $!;

The function open_files() returns a list of open file descriptors. $self->{__path} is the path to the script to be executed, and $self->{__argv} is the list of arguments.

jkeenan commented 4 years ago

Is the code snippet from your own code -- or from perl or CPAN code?

gflohr commented 4 years ago

The code snippet is not from a CPAN module but from the application (which is not on CPAN).

iabyn commented 4 years ago

It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler. The OS still thinks that perl will be handling the signal (rather than, e.g., the OS ignoring it), but when the OS calls perl's C-level signal handler, perl thinks there is no signal handler present in %SIG, so it dies with the error you're seeing.

Normally when %SIG is modified, perl itself updates both the OS's and perl's view of that signal handler. For them to get out of sync, I would suspect that there is some part of your code or a library which modifies signal handlers in a way which bypasses the %SIG mechanism.

Without code that reproduces the issue I don't think there's much we can do - we don't even know whether it's a bug in perl.
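
One way to look for such a discrepancy on Linux is to compare the kernel's list of caught signals with what %SIG reports. The following is only a diagnostic sketch, not something used in this thread: the helper names (signal_number, os_catches_signal, perl_has_handler) are made up for illustration, it relies on the SigCgt line of /proc/self/status, and it returns undef where /proc is not available (for example on macOS). Note that reading %SIG can itself change interpreter state, so the check may hide the problem it is looking for.

```perl
use strict;
use warnings;
use Config;

# Map a signal name such as "CHLD" to its number, as described in perlipc.
sub signal_number {
    my ($name) = @_;
    my @names = split ' ', $Config{sig_name};
    for my $i (0 .. $#names) {
        return $i if $names[$i] eq $name;
    }
    return;
}

# OS view: 1 if the kernel has a handler installed for the signal, 0 if not,
# undef if /proc/self/status cannot be read (assumption: Linux procfs layout).
sub os_catches_signal {
    my ($name) = @_;
    my $signo = signal_number($name) or return;
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        if ($line =~ /^SigCgt:\s*([0-9a-fA-F]+)/) {
            no warnings 'portable';    # the mask can be wider than 32 bits
            # Bit (signo - 1) is set for every caught signal.
            return (hex($1) >> ($signo - 1)) & 1;
        }
    }
    return;
}

# Perl view: 1 if %SIG holds something other than undef, '', DEFAULT or IGNORE.
sub perl_has_handler {
    my ($name) = @_;
    my $h = $SIG{$name};
    return (defined $h && length $h && $h ne 'DEFAULT' && $h ne 'IGNORE') ? 1 : 0;
}

for my $name (qw(CHLD HUP)) {
    my $os = os_catches_signal($name);
    printf "SIG%-4s OS catches: %s, %%SIG has handler: %s\n",
        $name, defined $os ? $os : 'unknown', perl_has_handler($name);
}
```

A line where the OS catches the signal but %SIG has no handler would match the situation described above.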

gflohr commented 4 years ago

It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler.

The bug is not OS dependent. It is reproducible under Mac OS X and under Linux.

Without code that reproduces the issue I don't think there's much we can do - we don't even know whether it's a bug in perl.

I can provide a tarball if anybody is willing to debug the issue. My customer and I have found the workaround described above, so it is not urgent. But I think it is quite likely that it is a bug in perl, given that it happens on completely different OSs.

iabyn commented 4 years ago

On Sun, Mar 22, 2020 at 08:23:04AM -0700, Guido Flohr wrote:

It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler.

The bug is not OS dependent. It is reproducible under Mac OS X and under Linux.

I didn't say that it was a bug in the OS. I meant that in normal operation of a perl interpreter, the interpreter ensures that whenever signal handling is updated, perl updates two things in sync:

1) it tells the OS to call, for a particular signal, a signal handler function within the perl interpreter;
2) it updates internal interpreter data, so that when the OS calls that (C-level) function, perl knows how to handle it - e.g. to call out to a perl-level function.

What is happening in your case is that the two are getting out of sync. The OS is calling perl's C-level signal handler function for that signal, but that function then doesn't know how to handle it, so outputs the error message you see instead.

As I said before, the code within the interpreter called whenever %SIG is modified always updates both aspects, so my suspicion is that something outside the normal interpreter such as XS code or a C library is in some fashion updating one but not the other. Which doesn't of course rule out it being a perl bug.
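
For comparison, the normal, in-sync path can be sketched as follows. This is a minimal illustration rather than code from the application: it installs a handler through %SIG, sends the signal to the current process, and checks that the perl-level handler ran. The variable name $received and the choice of SIGUSR1 are arbitrary.

```perl
use strict;
use warnings;

my $received = 0;

# Assigning to %SIG registers perl's C-level handler with the OS and
# records the perl-level callback in the interpreter at the same time.
$SIG{USR1} = sub { $received++ };

# Send the signal to ourselves; perl dispatches the callback at the next
# safe point after the C-level handler has marked the signal as pending.
kill 'USR1', $$;

print "perl-level handler ran $received time(s)\n";

# Restore the default disposition, again through %SIG so both sides agree.
$SIG{USR1} = 'DEFAULT';
```

Anything that changes the OS-level disposition without going through %SIG (or the reverse) is outside this path.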

I can provide a tarball if anybody is willing to debug the issue. My customer and I have found the workaround described above, so it is not urgent. But I think it is quite likely that it is a bug in perl, given that it happens on completely different OSs.

If the tarball contains a standalone reliable reproducer, we'd be more interested.

gflohr commented 4 years ago

I didn't say that it was a bug in the OS.

I see. Then I misunderstood it.

If the tarball contains a standalone reliable reproducer, we'd be more interested.

Yes, I can prepare that. I am just not allowed to make it public. But you can always contact me via my email address on my github page @gflohr.

tonycoz commented 4 years ago

I could see this happening under threads, since IIRC %SIG isn't synchronized between threads. If one thread sets a handler for CHLD and the OS sends the signal to a different thread, this error could occur.

Does your code use threads at all?

gflohr commented 4 years ago

Does your code use threads at all?

No, and the interpreters are all compiled without interpreter threads support. I put a "use threads" into the main script and it died right away with "This Perl not built to support threads ...", so I can also rule out that I am accidentally using a different interpreter.
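
The same check can be made without provoking a fatal error by looking at the build configuration, which corresponds to the useithreads line in the perl -V output above. A small sketch; the function name built_with_ithreads is made up for illustration:

```perl
use strict;
use warnings;
use Config;

# Returns 1 when this perl was built with interpreter threads, 0 otherwise.
sub built_with_ithreads {
    return (defined $Config{useithreads} && $Config{useithreads} eq 'define') ? 1 : 0;
}

# $^X is the path of the interpreter that is actually running the script.
printf "perl %s at %s: ithreads %s\n",
    $Config{version}, $^X, built_with_ithreads() ? 'enabled' : 'disabled';
```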

tonycoz commented 4 years ago

My workaround is to explicitly assign "DEFAULT" to $SIG{CHLD} or $SIG{HUP}, respectively, instead of leaving the old value undef, but that is not a satisfying solution. It is very similar to the workaround that was recommended for an almost identical problem in GNU parallel several years ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html

It's unclear to me - are you setting/resetting $SIG{CHLD} continuously over the life of the program, or setting it once?

Net::Server appears to only set it on startup.

gflohr commented 4 years ago

The original program itself did not touch $SIG{CHLD} at all. Now, this workaround has been added:

    $SIG{CHLD} ||= 'DEFAULT';
    $SIG{HUP} ||= 'DEFAULT';

This is now executed before any fork(). At this point, $SIG{CHLD} was undef when using the Net::Server::Single personality. Setting it to "DEFAULT" instead should be a no-op, but as a matter of fact it makes the error vanish.

When using Net::Server::PreFork, the children died with the same error message but about SIGHUP. The "workaround" cures this behavior as well.
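
In standalone form, the workaround might look like the sketch below. It is not the application's code: the helper name ensure_default_disposition is hypothetical, and the fork at the end only stands in for the place where the real server spawns its background process.

```perl
use strict;
use warnings;
use POSIX ();

# Give every listed signal that has no entry in %SIG an explicit 'DEFAULT'
# disposition. Existing handlers (code refs, 'IGNORE', ...) are left alone.
sub ensure_default_disposition {
    my @names = @_;
    for my $name (@names) {
        $SIG{$name} ||= 'DEFAULT';
    }
    return;
}

# Run this once, before the first fork().
ensure_default_disposition(qw(CHLD HUP));

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: the real application would call setsid() and exec() here.
    POSIX::_exit(0);
}

# Parent: reap the child so it does not linger as a zombie.
waitpid($pid, 0);
print "SIGCHLD disposition: $SIG{CHLD}, SIGHUP disposition: $SIG{HUP}\n";
```

Using ||= rather than a plain assignment keeps any handler that Net::Server or other code has already installed.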

tonycoz commented 4 years ago

Which other libraries are being used by the process? For example database drivers.

gflohr commented 4 years ago

DBD::SQLite is in use. Otherwise nothing suspicious.

tonycoz commented 4 years ago

With safe signals, I can see a race if a signal is received and marked pending but the handler is removed before it can be delivered. I don't see another possibility right now.

Without a reproducible example I don't see a way to debug this.