Open gflohr opened 4 years ago
This is the code snippet that spawns the child process:
if ($self->{__daemonize}) {
    if (fork) {
        CORE::exit(0);
    }
    unless (POSIX::setsid()) {
        # make child session leader and detach from terminal
        die "Unable to detach process\n";
    }
    ## Close open file descriptors.
    POSIX::close($_) foreach open_files;
    ## Reopen stdin to /dev/null
    open(STDIN, "+>/dev/null")
        or warn "Cannot redirect standard input to /dev/null: $!\n";
    # XXX
    # Re-opening STDOUT collides with it being tied during the request.
    # open(STDOUT, "+>/dev/null")
    #     or warn "Cannot redirect standard output to /dev/null: $!\n";
    # Re-opening STDERR collides with mod_perl.
    # open(STDERR, "+>/dev/null")
    #     or warn "Cannot redirect standard error to /dev/null: $!\n";
}
exec($self->{__path}, @{$self->{__argv}})
    or die "Exec failed: " . $!;
The function open_files() returns a list of open file descriptors, $self->{__path} is the path to the script to be executed, and $self->{__argv} is the list of arguments.
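For comparison, here is a minimal sketch of the common double-fork pattern with explicit error checks. It is not the application's code: `spawn_detached` is a hypothetical helper name, and the sketch assumes a POSIX system with `/dev/null`. It checks `fork` for `undef` (failure) and treats both `undef` and `-1` from `POSIX::setsid()` as an error.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from the application): run $path with @argv in a
# process that is detached from the caller's session.
sub spawn_detached {
    my ($path, @argv) = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;    # undef means fork failed

    if ($pid) {
        waitpid($pid, 0);    # reap the short-lived intermediate child
        return;
    }

    # Intermediate child: start a new session, then fork once more so the
    # final process is not a session leader.
    my $sid = POSIX::setsid();
    POSIX::_exit(1) if !defined $sid || $sid == -1;

    my $pid2 = fork();
    POSIX::_exit(1) unless defined $pid2;
    POSIX::_exit(0) if $pid2;

    # Grandchild: detach stdin and replace the process image.
    open(STDIN, '<', '/dev/null') or POSIX::_exit(1);
    { exec { $path } $path, @argv; }
    POSIX::_exit(127);       # only reached if exec failed
}
```

A call might look like `spawn_detached($^X, 'worker.pl', '--once')`, where `worker.pl` is a placeholder script name. `POSIX::_exit` is used in the children so that no `END` blocks or buffered output inherited from the parent run twice.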
Is the code snippet from your own code -- or from perl or CPAN code?
The code snippet is not from a CPAN module but from the application (which is not on CPAN).
It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler. The OS still thinks that perl will be handling the signal (rather than e.g. the OS ignoring it or whatever), but when the OS calls perl's C-level signal handler, perl thinks there is no signal handler present in %SIG, so it dies with the error you're seeing.
Normally when %SIG is modified, perl itself updates both the OS's and perl's view of that signal handler. For them to get out of sync, I would suspect that there is some part of your code or a library which modifies signal handlers in a way which bypasses the %SIG mechanism.
Without code that reproduces the issue I don't think there's much we can do - we don't even know whether it's a bug in perl.
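One way to look for such a discrepancy on Linux is to compare the kernel's view with %SIG. The following is only a diagnostic sketch under stated assumptions: it relies on the `SigCgt` bitmask in `/proc/self/status`, which exists on Linux only, and the function names `os_catches_signal` and `perl_has_handler` are made up for this example.

```perl
use strict;
use warnings;
use POSIX ();

# Returns 1 if the kernel reports a handler installed for $signo, 0 if not,
# and undef where /proc/self/status is unavailable (i.e. not Linux).
sub os_catches_signal {
    my ($signo) = @_;
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        next unless $line =~ /^SigCgt:\s*([0-9a-fA-F]+)/;
        my $mask   = $1;
        my $bit    = $signo - 1;          # signal N is bit N-1 of the mask
        my $nibble = int($bit / 4);       # hex digit index, counted from the right
        return 0 if $nibble >= length $mask;
        my $digit = hex(substr($mask, -($nibble + 1), 1));
        return (($digit >> ($bit % 4)) & 1) ? 1 : 0;
    }
    return;
}

# Perl's own view: anything other than unset, 'DEFAULT' or 'IGNORE' counts
# as "handler installed".
sub perl_has_handler {
    my ($name) = @_;
    my $h = $SIG{$name};
    return (defined $h && $h ne '' && $h ne 'DEFAULT' && $h ne 'IGNORE') ? 1 : 0;
}

$SIG{USR1} = sub { };
my $os = os_catches_signal(POSIX::SIGUSR1());
printf "USR1: perl=%d os=%s\n", perl_has_handler('USR1'), defined $os ? $os : 'n/a';
```

Calling the two functions for CHLD or HUP just before the fork would show whether the kernel has perl's C-level handler installed while %SIG looks empty.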
-- The Enterprise is captured by a vastly superior alien intelligence which does not put them on trial. -- Things That Never Happen in "Star Trek" #10
It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler.
The bug is not OS dependent. It is reproducible under Mac OS X and under Linux.
Without code that reproduces the issue I don't think there's much we can do - we don't even know whether it's a bug in perl.
I can provide a tarball if anybody is willing to debug the issue. My customer and I have found the workaround described above, so it is not urgent. But I think it is quite likely that it is a bug in perl, given that it happens with completely different OSs.
On Sun, Mar 22, 2020 at 08:23:04AM -0700, Guido Flohr wrote:
It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler.
The bug is not OS dependent. It is reproducible under Mac OS X and under Linux.
I didn't say that it was a bug in the OS. I meant that in normal operation of a perl interpreter, the interpreter ensures that whenever signal handling is updated, perl updates two things in sync:
1) it tells the OS to call, for a particular signal, a signal handler function within the perl interpreter;
2) it updates internal interpreter data, so that when the OS calls that (C-level) function, perl knows how to handle it - e.g. to call out to a perl-level function.
What is happening in your case is that the two are getting out of sync. The OS is calling perl's C-level signal handler function for that signal, but that function then doesn't know how to handle it, so outputs the error message you see instead.
As I said before, the code within the interpreter called whenever %SIG is modified always updates both aspects, so my suspicion is that something outside the normal interpreter such as XS code or a C library is in some fashion updating one but not the other. Which doesn't of course rule out it being a perl bug.
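As a minimal sketch of the normal, in-sync case (an illustration only, not a reproducer for this issue): a single assignment to $SIG{CHLD} both registers perl's C-level handler with the OS and records the Perl-level callback, so the callback runs when a child exits.

```perl
use strict;
use warnings;
use POSIX ();

my $reaped = 0;

# One assignment updates both sides: the OS disposition for SIGCHLD and
# perl's internal record of which Perl code to call.
$SIG{CHLD} = sub {
    local ($!, $?);    # do not clobber the main program's error state
    while ((my $kid = waitpid(-1, POSIX::WNOHANG())) > 0) {
        $reaped++;
    }
};

my $pid = fork();
die "fork failed: $!\n" unless defined $pid;
POSIX::_exit(0) if $pid == 0;    # child: exit immediately

# Parent: sleep in short slices until the handler has run (about 5 s at most).
my $tries = 0;
select(undef, undef, undef, 0.05) while !$reaped && $tries++ < 100;

print "children reaped by handler: $reaped\n";
```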
I can provide a tarball if anybody is willing to debug the issue. My customer and I have found the workaround described above, so it is not urgent. But I think it is quite likely that it is a bug in perl, given that it happens with completely different OSs.
If the tarball contains a standalone reliable reproducer, we'd be more interested.
-- You're only as old as you look.
I didn't say that it was a bug in the OS.
I see. Then I misunderstood it.
If the tarball contains a standalone reliable reproducer, we'd be more interested.
Yes, I can prepare that. I am just not allowed to make it public. But you can always contact me via the email address on my GitHub page @gflohr.
I could see this happening under threads, since IIRC %SIG isn't synchronized between threads: if one thread sets a handler for CHLD and the OS sends the signal to a different thread, this error could occur.
Does your code use threads at all?
Does your code use threads at all?
No, and the interpreters are all compiled without interpreter threads support. I had put a "use threads" into the main script and it dies right away with "This Perl not built to support threads ...", so I can also rule out that I am by accident using a different interpreter.
My workaround is to explicitly assign "DEFAULT" to $SIG{CHLD} resp. $SIG{HUP}, instead of the old value undef, but that is not a satisfying solution. That is very similar to the workaround that was recommended for an almost identical problem in GNU parallel several years ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html
It's unclear to me - are you setting/resetting $SIG{CHLD} continuously over the life of the program, or setting it once?
Net::Server appears to only set it on startup.
The original program itself did not touch $SIG{CHLD} at all. Now, this workaround has been added:
$SIG{CHLD} ||= 'DEFAULT';
$SIG{HUP} ||= 'DEFAULT';
This is now executed before any fork(). At this point, $SIG{CHLD} was undef when using the Net::Server::Single personality. Setting it to "DEFAULT" instead should be a no-op but, as a matter of fact, it makes the error vanish.
When using Net::Server::PreFork, the children died with the same error message, but about SIGHUP. The "workaround" cures this behavior as well.
Which other libraries are being used by the process? For example database drivers.
DBD::SQLite is in use. Otherwise nothing suspicious.
I can see a race if a signal is received and marked pending (safe signals), but the handler is removed before the signal can be delivered. I don't see another possibility right now.
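If such a window is suspected, one way to narrow it is to block the signal at the OS level while the handler is changed. This is a sketch under assumptions, not a confirmed fix for this report: `with_signal_blocked` is a hypothetical helper built on `POSIX::sigprocmask`, and it only prevents new deliveries during the change.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code while signal $signo is blocked, then restore
# the previous signal mask. A signal sent in the meantime stays pending in the
# kernel and is delivered only after the mask is restored.
sub with_signal_blocked {
    my ($signo, $code) = @_;
    my $block = POSIX::SigSet->new($signo);
    my $old   = POSIX::SigSet->new;

    defined POSIX::sigprocmask(POSIX::SIG_BLOCK(), $block, $old)
        or die "sigprocmask(SIG_BLOCK) failed: $!\n";

    my $ok  = eval { $code->(); 1 };
    my $err = $@;

    defined POSIX::sigprocmask(POSIX::SIG_SETMASK(), $old)
        or die "sigprocmask(SIG_SETMASK) failed: $!\n";

    die $err unless $ok;
    return;
}

# Example: change the CHLD disposition without a delivery in between.
with_signal_blocked(POSIX::SIGCHLD(), sub { $SIG{CHLD} = 'DEFAULT' });
```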
Without a reproducible example I don't see a way to debug this.
Module:
Description
In a web server based on Net::Server (version 2.007 and the most recent 2.009) with the Net::Server::Single personality, I get the error "Signal SIGCHLD received, but no signal handler set." and the server terminates. When using the Net::Server::PreFork personality, the signal is SIGHUP instead of SIGCHLD, and the server continues to run because only one of the forked children had terminated.
The error occurs after a handler of the server has forked a background process. In the child, POSIX::setsid() is called, all open file descriptors are closed, and STDIN is redirected to /dev/null. STDOUT and STDERR are not re-opened because the same code is supposed to work in a mod_perl environment. %SIG is not modified. The child process (another Perl script) is executed with exec().
There is a corresponding question on stackoverflow.com: https://stackoverflow.com/questions/60708194/error-message-signal-sigchld-received-but-no-signal-handler-set/60761593#60761593
I have answered it myself with additional findings/information: https://stackoverflow.com/a/60761593/5464233
My workaround is to explicitly assign "DEFAULT" to $SIG{CHLD} resp. $SIG{HUP}, instead of the old value undef, but that is not a satisfying solution. That is very similar to the workaround that was recommended for an almost identical problem in GNU parallel several years ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html
Steps to Reproduce
Not possible. It's a (proprietary) application with 350000+ lines of Perl code. Modifying the code or running the application in the Perl debugger makes the error disappear instantly. If there is interest in trying to debug the problem, I can provide access to a test installation.
I also wasn't able to run the program in gdb because the server did not come up.
Expected behavior
The script should not be able to make the interpreter terminate prematurely.
Perl configuration
I have tested on Mac OS X with these Perl versions using perlbrew:
All versions starting from 5.18.4 have the problem. The older ones do not have it.
The same behavior was reported for 5.26.x to 5.30.x on Linux.