cpan-authors / IPC-Run

https://metacpan.org/pod/IPC::Run

NetBSD 10.0 fails test 'Did not need kill_kill' #175

Open nmisch opened 3 months ago

nmisch commented 3 months ago

Example pass w/ NetBSD 9.3: https://www.cpantesters.org/report/5dd40274-f12f-11ee-be45-8b0dec80b09a
Example fail w/ NetBSD 10.0: https://www.cpantesters.org/report/efbcc1aa-f5f1-11ee-a642-ee3ed50263fb

I've reproduced this via GitHub Actions (uses bsd workflow fixes that I need to polish for inclusion): https://github.com/nmisch/IPC-Run/actions/runs/8873497758/job/24359422461

I plan to investigate a fix like this:

-            'sleep while 1',
+            '$SIG{TERM}="DEFAULT";$|=1;print "running\n";sleep while 1',

nmisch commented 3 months ago

That did not succeed. The SIGTERM is sometimes lost if it arrives between the start and end of the child's execve(). kdump excerpt:

2024-05-19T23:31:23.1388650Z   7858   7858 perl     CALL  execve(0x7315a6eec320,0x7315a6eec340,0x7315a8aae000)
2024-05-19T23:31:23.1388795Z   7858   7858 perl     NAMI  "/usr/pkg/bin/perl"
2024-05-19T23:31:23.1388944Z   7858   7858 perl     NAMI  "/usr/libexec/ld.elf_so"
2024-05-19T23:31:23.1389080Z   8000   8000 perl     GIO   fd 4 read 0 bytes
...
2024-05-19T23:31:23.1462538Z   8000   8000 perl     CALL  kill(0x1eb2, SIGTERM)
2024-05-19T23:31:23.1462650Z   8000   8000 perl     RET   kill 0
2024-05-19T23:31:23.1462760Z   7858   7858 perl     EMUL  "netbsd"
2024-05-19T23:31:23.1462893Z   7858   7858 perl     RET   execve JUSTRETURN

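The "fd 4 read 0 bytes" line comes from the following code in the parent (pid 8000), which observes that the child (pid 7858) has completed FD_CLOEXEC processing. The kill() target 0x1eb2 is 7858 in decimal, so the SIGTERM is sent after that point but before the child's execve() returns: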

    ## Wait for kid to get to its exec() and see if it fails.
    _close $self->{SYNC_WRITER_FD};
    my $sync_pulse = _read $sync_reader_fd;
    _close $sync_reader_fd;
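
For anyone who wants to poke at the timing outside IPC::Run, here is a minimal standalone sketch of the same handshake. It is not IPC::Run code: spawn_and_sync and term_and_reap are hypothetical helper names, and the 2-second grace period is an arbitrary assumption. The parent waits for EOF on a close-on-exec pipe, sends SIGTERM right away, and reports which signal ended the child.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG SIGTERM);
use Time::HiRes qw(sleep time);

# Hypothetical helper (not IPC::Run code): fork, exec @cmd in the child, and
# return the pid once the child's close-on-exec pipe end has been closed.
sub spawn_and_sync {
    my @cmd = @_;
    # Perl sets FD_CLOEXEC on new descriptors numbered above $^F (default 2).
    pipe( my $reader, my $writer ) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {
        close $reader;
        # On failure, report the error to the parent through the pipe.
        exec { $cmd[0] } @cmd
            or do { syswrite $writer, "$!"; POSIX::_exit(127) };
    }
    close $writer;
    # A 0-byte read (EOF) means the kernel closed the child's write end
    # during exec; it does not mean the exec call has returned yet.
    my $n = sysread $reader, my $buf, 512;
    close $reader;
    die "sysread: $!" unless defined $n;
    die "exec failed: $buf" if $n > 0;
    return $pid;
}

# Hypothetical helper: send SIGTERM, wait up to $grace seconds, then fall
# back to SIGKILL. Returns the number of the signal that ended the child.
sub term_and_reap {
    my ( $pid, $grace ) = @_;
    kill 'TERM', $pid or die "kill: $!";
    my $deadline = time() + $grace;
    while ( time() < $deadline ) {
        return $? & 127 if waitpid( $pid, WNOHANG ) == $pid;
        sleep 0.05;
    }
    kill 'KILL', $pid;
    waitpid( $pid, 0 );
    return $? & 127;
}

my $pid = spawn_and_sync( $^X, '-e', 'sleep while 1' );
my $sig = term_and_reap( $pid, 2 );
print $sig == SIGTERM
    ? "SIGTERM ended the child\n"
    : "SIGTERM was not enough; child ended by signal $sig\n";
```

Because the race is timing-dependent, a single run says little; looping this many times is more likely to show a lost SIGTERM on an affected kernel.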

Testing in C, I found that NetBSD's rules for pending signals at execve() differ from those of the other kernels I tested. I've reported this at https://gnats.netbsd.org/58268. Since the point of the test isn't to check kernel behavior, I can modify the test case to accept both behaviors.

IPC::Run could partially hide the kernel-specific behavior by retrying SIGTERM. I'm not inclined to do that, since there's no guarantee that an arbitrary child process will treat two copies of SIGTERM the same way as one copy. Other opinions?
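
To make the retry idea concrete for discussion, here is a rough sketch of what resending SIGTERM could look like. It is not a proposed patch: term_with_retry is a hypothetical name, and the 3 attempts and 0.5-second interval are arbitrary assumptions. It also shows the concern above, since a child with its own TERM handler could receive the signal more than once.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);

# Hypothetical helper, not part of IPC::Run: send SIGTERM up to $tries times,
# polling for exit between attempts. Returns the number of SIGTERMs sent
# before the child was reaped, or 0 if it was still alive after the last one.
sub term_with_retry {
    my ( $pid, $tries, $interval ) = @_;
    for my $attempt ( 1 .. $tries ) {
        kill 'TERM', $pid;
        # Poll ten times per interval so a prompt exit is noticed quickly.
        for ( 1 .. 10 ) {
            return $attempt if waitpid( $pid, WNOHANG ) == $pid;
            sleep( $interval / 10 );
        }
    }
    return 0;
}

my $pid = fork();
die "fork: $!" unless defined $pid;
if ( $pid == 0 ) {
    exec { $^X } $^X, '-e', 'sleep while 1' or POSIX::_exit(127);
}

my $sent = term_with_retry( $pid, 3, 0.5 );
if ( !$sent ) {
    # Last resort, comparable in spirit to the KILL step of kill_kill.
    kill 'KILL', $pid;
    waitpid( $pid, 0 );
}
print $sent
    ? "child exited after $sent SIGTERM(s)\n"
    : "child needed SIGKILL\n";
```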