Closed nicowilliams closed 7 years ago
Thanks!
For the record, the above commit was reverted. SIGCHLD is required: https://github.com/famzah/popen-noshell/commit/64c51f5099af9da3c75f53cd3d4a74e4a2117b49
What is it required by? Your existing consumers? I can understand that, though SIGCHLD
is not reliable in general, it might be for your users' use-cases.
It's required so that you can wait for and get the exit status of the child process. Otherwise I can't re-implement pclose().
Can you do waitpid() for a child process when SIGCHLD is never sent? My (very fast) tests show that in such a case the terminated child process is reaped automatically, and it's too late when we call waitpid() in the parent.
You know that the child exited when either the pipe to its stdin
gets EPIPE
or SIGPIPE
or when you get EOF
from its output pipe. At that point you can call waitpid(2)
. And yes, you absolutely can call waitpid(2)
and waitid(2)
with a specific PID without having received SIGCHLD
.
Oh. But maybe Linux reaps it automatically if you didn't get SIGCHLD
? That isn't documented in the clone(2)
man page. I can believe that it's true, but that would be very disappointing! Normally automatic reaping is only supposed to happen if you ignore SIGCHLD
.
Well, I guess I've wasted your time here :(
I haven't spent much time to research it. While running the unit tests, waitpid()
failed with ECHILD (No child processes).
Here is a sample strace
:
famzah@vbox64:~/popen-noshell$ gcc -Wall popen_noshell.c popen_noshell_tests.c -o popen_noshell_tests && strace -f ./popen_noshell_tests
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3b7a03000
clone(strace: Process 4311 attached
<unfinished ...>
[pid 4311] open("/dev/null", O_RDWR) = 5
...
[pid 4311] execve("/", ["/"], [/* 80 vars */]) = -1 EACCES (Permission denied)
# this strange execve() is part of the unit test, so it's expected
...
[pid 4311] exit_group(255) = ?
[pid 4310] <... clone resumed> child_stack=0x7fb3b8203000, flags=CLONE_VM|CLONE_VFORK) = 4311
...
[pid 4310] wait4(4311, 0x7fffeb822d74, 0, NULL) = -1 ECHILD (No child processes)
If I set SIGCHLD
in the clone()
flags, then the strace
looks a bit differently:
...
[pid 4355] exit_group(255) = ?
[pid 4354] <... clone resumed> child_stack=0x7f6cedde3000, flags=CLONE_VM|CLONE_VFORK|SIGCHLD) = 4355
...
[pid 4354] wait4(4355, <unfinished ...>
[pid 4355] +++ exited with 255 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 255}], 0, NULL) = 4355
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4355, si_uid=1000, si_status=255, si_utime=0, si_stime=0} ---
munmap(0x7f6ced5e3000, 8392704) = 0
...
Oh, check this:
1976 else if ((clone_flags & CSIGNAL) != SIGCHLD)
1977 trace = PTRACE_EVENT_CLONE;
It seems you may need to add __WCLONE
to the wait options. Can you try that?
Also, if you make the child sleep for a long time, can you see with ps(1)
or /proc
if it got reaped? I bet it didn't get reaped, it just isn't visible to waitpid(2)
unless you add __WCLONE
to the wait options.
Aha! Just use __WALL
in the waitpid(2)
options. __WCLONE
didn't work in all cases, but __WALL
did:
diff --git a/popen_noshell.c b/popen_noshell.c
index c935cf4..0a7b0ab 100644
--- a/popen_noshell.c
+++ b/popen_noshell.c
@@ -352,7 +352,7 @@ pid_t popen_noshell_vmfork(int (*fn)(void *), void *arg, void **memory_to_free_o
*/
#ifndef POPEN_NOSHELL_VALGRIND_DEBUG
- pid = clone(fn, stack_aligned, CLONE_VM | SIGCHLD | CLONE_VFORK, arg);
+ pid = clone(fn, stack_aligned, CLONE_VM | CLONE_VFORK, arg);
#else
pid = fork(); // Valgrind does not support arbitrary clone() calls, so we use fork for the tests
#endif
@@ -649,7 +649,7 @@ int pclose_noshell(struct popen_noshell_pass_to_pclose *arg) {
return -1;
}
- if (waitpid(arg->pid, &status, 0) != arg->pid) {
+ if (waitpid(arg->pid, &status, __WALL) != arg->pid) {
return -1;
}
$ cc -o t popen_noshell.c popen_noshell_tests.c
$ ./t
Tests passed OK.
$
So, this gets pretty close to Illumos forkx(2)
semantics, as long as all users of __WALL
in the process wait for specific PIDs.
Of course, for an actual patch you'll want to do something like:
#ifdef __WALL
#define sig_request 0
#else
#define __WALL 0
#define sig_request SIGCHLD
#endif
Right. Thanks for the research! I've committed this. Tests pass OK, indeed.
The flag __WALL
is available since Linux 2.4 and therefore there is no need to test for its existence. I assume that everyone has already upgraded their systems at least to Linux 2.6.
Yes, 2.6 is good enough!
For background, Solaris/OpenSolaris(RIP)/Illumos have a
forkx()
andvforkx()
that takes a flags option. One of the flags is to not sendSIGCHLD
to the parent when the child dies. Thepopen()
andsystem()
implementations in those systems do request thatSIGCHLD
not be sent: because if the parent happens to have a handler for that signal, it can reap the child opened bypopen()
(orsystem()
), which then makes it impossible to retrieve its exit status inpclose()
orsystem()
.