Open wchristian opened 6 years ago
I tend to agree. @toddr ?
Actually we have another ticket asking us to have an option to default to utf-8
It sounds like we need a global for this or if the run() interface supports it, maybe it could be an option passed in.
Defaulting to UTF8 might be acceptable too, as it disables newline mangling as well. I'd still highly recommend you actually try and see what happens if you make utf8 default in various environments when dealing with 3rd party binary producers. I expect some breakage, for example when dealing with the pure ascii windows cmd shell.
Keep in mind that making binmode doesn't add any data transformation, it disables a dangerous one. Making UTF8 default would indeed add a transformation.
Btw, which ticket are you talking about?
@wchristian: Pumping UTF-8 data mangles it [rt.cpan.org #58853] #87
Reading the other one now.
As for the second question:
Yes, binary mode should be the default, but allowing customization is also key, due to it being hard to configure the called code in most cases.
Not sure if the binmode thing would fix anything for UTF, but i think it might help improve the situation. I've also rewritten the test script there a little to show the newline issue:
#!/usr/bin/perl
use strict;
use warnings;
use B 'perlstring';
my $report_length = 0; # set this to 1 to switch string dumps to string length
my $mode = ":raw";
my @lines = ( "ab", "\n", "\r", "\r\n", "\n\r" );
my $display_length = 8; # set this to something reasonable
sub fm { sprintf "%-${display_length}s", $report_length ? length $_[0] : perlstring $_[0] }
if ( defined $ENV{IPC_SUB_PROCESS} ) {
binmode STDIN, $mode;
binmode STDERR, $mode;
binmode STDOUT, $mode;
my $in = do { local $/; <STDIN> };
my $out = $lines[ $ENV{IPC_SUB_INDEX} ];
print STDERR fm( $in ) . " | " . fm( $out ) . " out became: ";
print $out;
exit;
}
$ENV{IPC_SUB_PROCESS} = 1;
use IPC::Run ();
for my $i ( 0 .. $#lines ) {
print fm( $lines[$i] ) . " in became: ";
$ENV{IPC_SUB_INDEX} = $i;
IPC::Run::run( [ "perl", __FILE__ ], "<", \$lines[$i], ">", \my $out );
print fm( $out ) . "\n";
}
__END__
D:\>perl ipc_run.pl
"ab" in became: "ab" | "ab" out became: "ab"
"\n" in became: "\r\n" | "\n" out became: "\n"
"\r" in became: "\r" | "\r" out became: "\r"
"\r\n" in became: "\r\n" | "\r\n" out became: "\n"
"\n\r" in became: "\r\n\r" | "\n\r" out became: "\n\r"
Debian report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=450663
Oh god, yeah, implicit charset munging will get different results depending on locale. Even more power to the argument of making binmode default.
Ok. sounds like we need a patch. I'm overloaded this week so won't be able to look at it myself.
@wchristian Do you feel like constructing a little bit of .t
that will capture the charset munging? I can sellotape that plus your script above into a test and then have a go at the code to fix it.
Sent tests in #120.
Tests are in and presumably failing on windows. we need to get the windows smoker working.
Just leaving this as a note, since i recently ran into it.
Was trying to use a library that calls an external program to generate PDF and another author had decided to use IPC::Run inside the library. Since PDFs are binary data by default, the result was of course mojibake, and due to the entirety of the situation there was no way to resolve this other than to replace IPC::Run entirely.
I cannot offhand think of a situation where enabling binmode by default would cause any serious breakage that couldn't be managed (e.g. tests that don't expect this behavior).