IPC::Run::full_results() Windows behavior does not match non-Windows

nmisch commented 2 years ago

This affects the other *result* methods, too. t/result.t fails like this on Windows:

#   Failed test 'Results of all processes'
#   at t/result.t line 16.
#     Structures begin differing at:
#          $got->[2] = '0'
#     $expected->[2] = '42'

#   Failed test 'First non-zero result'
#   at t/result.t line 17.
#          got: ''
#     expected: '42'

#   Failed test 'Result of the third process'
#   at t/result.t line 20.
#          got: '0'
#     expected: '42'

#   Failed test 'Full results of all processes'
#   at t/result.t line 22.
#     Structures begin differing at:
#          $got->[2] = '42'
#     $expected->[2] = '10752'

#   Failed test 'Full result of the third process'
#   at t/result.t line 26.
#          got: '42'
#     expected: '10752'
# Looks like you failed 5 tests of 10.
[23:28:27] t/result.t ........................... 
Dubious, test returned 5 (wstat 1280, 0x500)
Failed 5/10 subtests

(We don't see this failure in GitHub Actions, since GITHUB_WINDOWS_TESTING makes the test a TODO. t/result.t, being absent from MANIFEST, is not part of the distribution. Hence, we don't see this failure in cpantesters, either. Users building from git see the failure. I plan to change that.)

The mismatch arises because Win32::Process::GetExitCode() codes aren't bit-shifted the way wait() status values are. The documentation doesn't settle the question of what the user should expect. For example, full_results() documentation says "Returns a list of child exit values as returned by wait". We don't use wait() on Windows. I could document the mismatch, or I could change behavior to compensate for the mismatch:

--- a/lib/IPC/Run.pm
+++ b/lib/IPC/Run.pm
@@ -3466,7 +3466,7 @@ sub reap_nb {
                 $? = $kid->{RESULT} = 0x0F;
             }
             else {
-                $? = $kid->{RESULT} << 8;
+                $? = $kid->{RESULT} = $kid->{RESULT} << 8;
             }
         }
         else {

I'd certainly pick the latter in a greenfield, but it is a compatibility break. https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/perl/PostgreSQL/Test/Utils.pm;h=1ca2cc591708be41c79f02de542e7e64feeba9c5;hb=HEAD#l785 is an example of code compensating for the current mismatch; that would need to change. Are there other options to consider? What do you prefer?

That addresses the exit 42 part of t/result.t, but it doesn't address the SIGKILL part. Per https://perldoc.perl.org/perlport#kill, Perl's signal emulation makes the victim call exit(9). I'll just make t/result.t recognize the platform-specific outcome. IPC::Run can't recreate a distinction that Perl signal emulation erased.

toddr commented 2 years ago

I continue to wonder if it's a losing battle to expect perl code to behave exactly the same on a non-POSIX system (windows) vs a POSIX system (almost everything else).

Signals have always been a complete consistency mess, especially if you stretch back to windows XP.

At this point IPC::Run has so many inconsistencies, I'm tempted to suggest that if we can break some compatibility but get to a consistent state in a short period of time, then we declare this and proceed. What do you think?

mohawk2 commented 2 years ago

I think consistency is important enough here to justify some breakage.

nmisch commented 2 years ago

I like accepting a compatibility break here to improve consistency. Given the Windows compatibility break in the last release, the time is relatively good for more breaks. Thanks for the feedback.

cpan-authors / IPC-Run

IPC::Run::full_results() Windows behavior does not match non-Windows #161