klaus03 closed 8 years ago
I'm boggled by the underlying issue. Possibly @leont has some insight.
About the patch itself, I'm concerned that it's overly specific to one particular pattern of layers. For example, what if someone reverses the ":crlf" and ":encoding(utf8)" layers? Or if someone wants to use an actually secure UTF-8 layer like ":encoding(UTF-8)" or ":utf8_strict" (from PerlIO::utf8_strict)?
I'd rather have something that actually deals with the state of the layers and can be smart about whether to filter out the "unix" layer or not.
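One way to read "deals with the state of the layers" is to compare the layers recorded before the capture with the layers the handle has afterwards, and push back only the difference. The following is a minimal sketch under that assumption, not Capture::Tiny's actual code: `layers_to_reapply` is a hypothetical helper, and the `@saved` list is written by hand for illustration. `PerlIO::get_layers` is the core function for inspecting a handle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Capture::Tiny): given the layers that were
# recorded before a capture and the layers the handle has right now, return
# only the saved layers that still need to be pushed back.
sub layers_to_reapply {
    my ($saved, $current) = @_;
    my @todo = @$saved;
    # Skip the leading layers that both lists share, e.g. a base "unix".
    for my $have (@$current) {
        last unless @todo && $todo[0] eq $have;
        shift @todo;
    }
    return @todo;
}

# Inspect the real handle; "output => 1" asks for the output-side layers.
my @current = PerlIO::get_layers(\*STDOUT, output => 1);

# Illustrative, hand-written list in the order discussed in this thread.
my @saved = ('unix', 'encoding(utf8)', 'crlf');

my @todo = layers_to_reapply(\@saved, \@current);
print "current:        @current\n";
print "would re-apply: @todo\n";
```

Because the comparison is driven by what the handle actually reports, the same logic would apply unchanged if the layers were in a different order or used ":encoding(UTF-8)" instead.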
Unfortunately, Capture::Tiny doesn't correctly restore that binmode(STDOUT, ':unix:encoding(utf8):crlf');
It would be helpful if you told us what happens instead. And on what versions of perl you're observing this.
Workaround: Inject a binmode(STDOUT, ':unix:encoding(utf8):crlf') into any perl program.
That sounds like an awful solution for a number of reasons; for starters you have 5 layers and are only using the top three. And it's not quite obvious to me why this would help.
I'm boggled by the underlying issue. Possibly @leont has some insight.
I'm boggled by the proposed solution!
About the patch itself, I'm concerned that it's overly specific to one particular pattern of layers.
Agreed.
Thanks for your replies.
I agree that my proposed patch is overly specific.
I see a strange phenomenon in my Perl programs under Windows 7 (x64) with chcp 65001: characters are mysteriously duplicated. My patch is so specific because I have no idea why this happens; the only thing I can say is that the problem seems to go away when I use a specific binmode (":unix:encoding(utf8):crlf") on STDOUT. I apologise for this.
Output of perl -v: This is perl 5, version 22, subversion 1 (v5.22.1) built for MSWin32-x64-multi-thread
I already discussed this question on stackoverflow: http://stackoverflow.com/questions/25585248/windows-utf-8-printed-with-chcp-65001-characters-are-mysteriously-duplicated
And I had one answer (answered Aug 30 '14 at 19:05 by Borodin):
As I suspected, this has been reported as a failure in Windows software: This is caused by a bug in Windows. When writing to a console set to code page 65001, WriteFile() returns the number of characters written instead of the number of bytes. I wasn't aware of a work-around, but if the :unix:encoding(utf8):crlf PerlIO stack works for you then it seems you have found one.
It seems that :unix:encoding(utf8):crlf works, but I have no idea *why* it works.
I have created a gist which demonstrates the interaction between Capture::Tiny and the impact on the restored layers. https://gist.github.com/klaus03/e1910904104552765e6b
The problem (...characters are mysteriously duplicated...) shows up at markers [02] and [06] where the last two characters (...W'...) are repeated on the next line.
[02] teststr = 'IIIiUVW'
W'
[06] teststr = 'IIIiUVW'
W'
By the way, the problem goes away completely as soon as I write to files.
I hope my explanations are clear, and I am always grateful for alternative solutions.
@Leont, ping. Any ideas?
@klaus03 please try the better relayering branch. It tries harder to preserve exactly what existed before (even if wacky).
I'm going to close this PR and open a new one for that branch.
I found what I think is a problem in Capture::Tiny under Win32. Please kindly consider my patch.
There is one special case under Windows where the sub _relayer() would require improvement.
Basically, I am using binmode(STDOUT, ':unix:encoding(utf8):crlf'); for all my Windows Perl programs. Unfortunately, Capture::Tiny doesn't correctly restore that binmode(STDOUT, ':unix:encoding(utf8):crlf');
This patch resolves this particular problem and correctly restores binmode(STDOUT, ':unix:encoding(utf8):crlf'); for all Windows programs.
Here is the background story:
There is a longstanding bug in Windows that shows up as the last octet being repeated when Perl outputs a UTF-8 encoded string in cmd.exe under chcp 65001.
Two StackOverflow articles with basically the same problem: http://stackoverflow.com/questions/23416075 and http://stackoverflow.com/questions/25585248
When writing to a console set to code page 65001, WriteFile() returns the number of characters written instead of the number of bytes.
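To illustrate how such a miscount could produce repeated output, here is a toy simulation. It is only a sketch under stated assumptions: the "console" is a plain Perl closure (no real Windows API is called), and the caller is assumed to treat the return value as the number of bytes consumed and to retry with the remainder. All sub names are hypothetical.

```perl
use strict;
use warnings;

# Toy stand-in for the console: it accepts every byte it is given, but reports
# the number of characters (ASCII bytes plus UTF-8 lead bytes), not bytes.
sub make_miscounting_console {
    my $received = '';
    my $write = sub {
        my ($bytes) = @_;
        $received .= $bytes;
        my $chars = () = $bytes =~ /[^\x80-\xBF]/g;
        return $chars;
    };
    return ($write, \$received);
}

# Assumed caller behaviour: treat the return value as "bytes consumed" and
# retry with whatever appears to be left over.
sub write_all {
    my ($write, $bytes) = @_;
    while (length $bytes) {
        my $n = $write->($bytes);
        last if $n <= 0;              # avoid spinning on a zero count
        substr($bytes, 0, $n, '');    # drop the part believed to be written
    }
    return;
}

# Encode a character string to UTF-8, push it through the toy console, and
# return both what was sent and what the console ended up receiving.
sub simulate {
    my ($text) = @_;
    my $bytes = $text;
    utf8::encode($bytes);             # character string -> UTF-8 bytes
    my ($write, $received) = make_miscounting_console();
    write_all($write, $bytes);
    return ($bytes, $$received);
}

my ($sent, $got) = simulate("\x{e4}\x{f6}\x{fc}abc");
printf "sent %d bytes, console received %d bytes\n", length($sent), length($got);
```

In this toy run the 9-byte string arrives as 12 bytes, with the trailing `abc` repeated. Pure ASCII input is unaffected, because byte and character counts agree.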
Workaround: Inject a binmode(STDOUT, ':unix:encoding(utf8):crlf') into any perl program.