cpan-authors / IPC-Run

https://metacpan.org/pod/IPC::Run
Other
21 stars 38 forks source link

binmode is a necessity for win32 usability and must be default #116

Open wchristian opened 6 years ago

wchristian commented 6 years ago

Just leaving this as a note, since i recently ran into it.

Was trying to use a library that calls an external program to generate PDF and another author had decided to use IPC::Run inside the library. Since PDFs are binary data by default, the result was of course mojibake, and due to the entirety of the situation there was no way to resolve this other than to replace IPC::Run entirely.

I cannot offhand think of a situation where enabling binmode by default would cause any serious breakage that couldn't be managed (e.g. tests that don't expect this behavior).

mohawk2 commented 6 years ago

I tend to agree. @toddr ?

toddr commented 6 years ago

Actually we have another ticket asking us to have an option to default to utf-8

toddr commented 6 years ago

It sounds like we need a global for this or if the run() interface supports it, maybe it could be an option passed in.

wchristian commented 6 years ago

Defaulting to UTF8 might be acceptable too, as it disables newline mangling as well. I'd still highly recommend you actually try and see what happens if you make utf8 default in various environments when dealing with 3rd party binary producers. I expect some breakage, for example when dealing with the pure ascii windows cmd shell.

Keep in mind that making binmode doesn't add any data transformation, it disables a dangerous one. Making UTF8 default would indeed add a transformation.

wchristian commented 6 years ago

Btw, which ticket are you talking about?

toddr commented 6 years ago

@wchristian: Pumping UTF-8 data mangles it [rt.cpan.org #58853] #87

toddr commented 6 years ago

87 has an example script. Would the change you're proposing here for binmode also fix it? If someone can test that then I'd be all for just setting the default. Ideally though we should probably still try to provide a way to somehow alter the mode at run time?

wchristian commented 6 years ago

Reading the other one now.

As for the second question:

Yes, binary mode should be the default, but allowing customization is also key, due to it being hard to configure the called code in most cases.

wchristian commented 6 years ago

Not sure if the binmode thing would fix anything for UTF, but i think it might help improve the situation. I've also rewritten the test script there a little to show the newline issue:

#!/usr/bin/perl
use strict;
use warnings;
use B 'perlstring';

my $report_length  = 0;                                      # set this to 1 to switch string dumps to string length
my $mode           = ":raw";
my @lines          = ( "ab", "\n", "\r", "\r\n", "\n\r" );
my $display_length = 8;                                      # set this to something reasonable

sub fm { sprintf "%-${display_length}s", $report_length ? length $_[0] : perlstring $_[0] }

if ( defined $ENV{IPC_SUB_PROCESS} ) {
    binmode STDIN,  $mode;
    binmode STDERR, $mode;
    binmode STDOUT, $mode;
    my $in = do { local $/; <STDIN> };
    my $out = $lines[ $ENV{IPC_SUB_INDEX} ];
    print STDERR fm( $in ) . " | " . fm( $out ) . " out became: ";
    print $out;
    exit;
}

$ENV{IPC_SUB_PROCESS} = 1;
use IPC::Run ();
for my $i ( 0 .. $#lines ) {
    print fm( $lines[$i] ) . " in became: ";
    $ENV{IPC_SUB_INDEX} = $i;
    IPC::Run::run( [ "perl", __FILE__ ], "<", \$lines[$i], ">", \my $out );
    print fm( $out ) . "\n";
}

__END__

D:\>perl ipc_run.pl
"ab"     in became: "ab"     | "ab"     out became: "ab"
"\n"     in became: "\r\n"   | "\n"     out became: "\n"
"\r"     in became: "\r"     | "\r"     out became: "\r"
"\r\n"   in became: "\r\n"   | "\r\n"   out became: "\n"
"\n\r"   in became: "\r\n\r" | "\n\r"   out became: "\n\r"
toddr commented 6 years ago

Debian report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=450663

wchristian commented 6 years ago

Oh god, yeah, implicit charset munging will get different results depending on locale. Even more power to the argument of making binmode default.

toddr commented 6 years ago

Ok. sounds like we need a patch. I'm overloaded this week so won't be able to look at it myself.

mohawk2 commented 6 years ago

@wchristian Do you feel like constructing a little bit of .t that will capture the charset munging? I can sellotape that plus your script above into a test and then have a go at the code to fix it.

wchristian commented 6 years ago

Sent tests in #120.

toddr commented 6 years ago

Tests are in and presumably failing on windows. we need to get the windows smoker working.