Open p5pRT opened 10 years ago
Using Strawberry Perl portable under a Simplified Chinese environment (CP936), I found that perl can't read pinyin characters properly from a terminal.
Example:
perl -ne "print"
nǐtàiyánsù
n t iy ns
Chinese characters are OK. Reading from a file using redirection is also OK. Only pinyin typed at the terminal goes wrong.
Can anyone familiar with CP936 reproduce this?
The RT System itself - Status changed from 'new' to 'open'
I'm trying to understand this report. I am not familiar with CP936, but I looked it up, and it is a one- and two-byte encoding. Perl supports internally only single-byte encodings, plus, starting in 5.20, UTF-8. So this encoding shouldn't be expected to work in Perl. What one is supposed to do is to use the Encode module to translate the encoding into Perl's internal form on input, and transform back on output. An example I found is http://www.perlmonks.org/?node_id=537416
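A minimal sketch of the decode-on-input, encode-on-output pattern described above, assuming the terminal really does deliver CP936 bytes (which is exactly what is in doubt in this ticket). The sub name `roundtrip_cp936` is made up for illustration; `decode` and `encode` are the Encode module's own functions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn CP936 bytes into Perl characters, work on
# them as characters, then turn the result back into CP936 bytes.
sub roundtrip_cp936 {
    my ($bytes) = @_;
    my $chars = decode('cp936', $bytes);   # bytes -> characters
    my $count = length $chars;             # counts characters, not bytes
    return ($count, encode('cp936', $chars));
}

# "nǐ" as CP936 bytes: 6E A8 AB (byte values taken from this ticket)
my ($count, $out) = roundtrip_cp936("\x6E\xA8\xAB");
printf "%d characters, bytes %vX\n", $count, $out;
```

For whole filehandles, `binmode(STDIN, ':encoding(cp936)')` and the same on STDOUT applies the same translation to every read and write.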
I'll see what I can find out tonight. Can you please provide the output of the following in the meantime?
chcp & perl -MWin32 -MWin32::Console -E"say for Win32::GetACP(), Win32::GetOEMCP(), Win32::Console->new(STD_INPUT_HANDLE)->InputCP(), Win32::Console->new(STD_OUTPUT_HANDLE)->OutputCP();"
I haven't found anything that helps you. Still waiting on your feedback. Would also like to see the output of perl -ne"printf qq{%v02X\n}, $_" for that same input.
Active code page: 936
936
936
936
936
On Sun Mar 16 00:41:07 2014, ntysdd@gmail.com wrote:
> Using strawberryperl portable under a simplified Chinese env.(CP936) Found perl can't read pinyin chars properly from a terminal.
I wonder if this is related to #13794
Tony
On Mon, Jul 7, 2014 at 5:14 AM, Tony Cook via RT <perlbug-followup@perl.org> wrote:
> I wonder if this is related to https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121783
No. The non-ASCII chars are filtered out on or before input. It's not an output issue.
The program is getting a NUL where the non-ASCII chars are supposed to be (6E.00.74.00.69.79.00.6E.73.00.0A). I have no idea why.
On 07/07/2014 09:25 AM, Eric Brine wrote:
> No. The non-ASCII chars are filtered out on or before input. It's not an output issue.
> The program is getting a NUL where the non-ASCII chars are supposed to be (6E.00.74.00.69.79.00.6E.73.00.0A). I have no idea why.
I'm still having trouble grokking this issue. According to http://msdn.microsoft.com/en-US/goglobal/cc305153 CP936 is ASCII, plus 0x80 for the EURO SIGN. 0xFF is undefined, and 0x81 - 0xFE start two-byte sequences that give various ideographs.
I don't understand what it might mean to input an accented Latin character when it appears to me that the terminal is not set up to understand them.
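The byte layout described above can be sketched as a small classifier. This only illustrates that description and says nothing about which two-byte sequences are actually assigned. The sub name `classify_cp936_byte` is hypothetical, and the sample bytes are the first six bytes of the CP936 encoding quoted later in this ticket.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one byte value according to the layout
# described above (ASCII, 0x80, lead bytes 0x81-0xFE, 0xFF).
sub classify_cp936_byte {
    my ($byte) = @_;
    return 'ascii'     if $byte <= 0x7F;
    return 'euro'      if $byte == 0x80;
    return 'undefined' if $byte == 0xFF;
    return 'lead';     # 0x81 .. 0xFE starts a two-byte sequence
}

# Walk a sample byte string, consuming two bytes whenever a lead byte
# is seen.
my @bytes = unpack 'C*', "\x6E\xA8\xAB\x74\xA8\xA4";
while (@bytes) {
    my $byte = shift @bytes;
    my $kind = classify_cp936_byte($byte);
    if ($kind eq 'lead' && @bytes) {
        printf "%02X %02X  two-byte sequence\n", $byte, shift @bytes;
    }
    else {
        printf "%02X     %s\n", $byte, $kind;
    }
}
```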
On Tue, Jul 8, 2014 at 2:57 PM, Karl Williamson <public@khwilliamson.com> wrote:
> I'm still having trouble grokking this issue.
If I enter "nitàiyánsù" into my cp850 terminal, I expect to get the cp850 encoding of those characters from STDIN, and I do.
perl -MEncode -Mcharnames=:full -nlE"say sprintf '%v02X', $_; say charnames::viacode(ord) for split //, decode('cp850', $_);"
nitàiyánsù
6E.69.74.85.69.79.A0.6E.73.97
LATIN SMALL LETTER N
LATIN SMALL LETTER I
LATIN SMALL LETTER T
LATIN SMALL LETTER A WITH GRAVE
LATIN SMALL LETTER I
LATIN SMALL LETTER Y
LATIN SMALL LETTER A WITH ACUTE
LATIN SMALL LETTER N
LATIN SMALL LETTER S
LATIN SMALL LETTER U WITH GRAVE
^Z
He enters "nǐtàiyánsù" into his cp936 terminal. He expects to get the cp936 encoding of those characters from STDIN. He doesn't.
6E.A8.AB.74.A8.A4.69.79.A8.A2.6E.73.A8.B4 is what he expects to get
6E.00.   74.00.   69.79.00.   6E.73.00    is what he gets
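Comparing the two byte strings above suggests that each non-ASCII character, two bytes in CP936, arrives as a single NUL byte. The sketch below only checks that reading of the data; it does not explain why it happens. The sub name `nul_out_non_ascii` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode CP936 bytes and replace every non-ASCII
# character with a single NUL, mimicking what the reporter observed.
sub nul_out_non_ascii {
    my ($cp936_bytes) = @_;
    my $chars = decode('cp936', $cp936_bytes);
    $chars =~ s/[^\x00-\x7F]/\0/g;   # one NUL per non-ASCII character
    return $chars;
}

# Expected bytes from this ticket (CP936 for "nǐtàiyánsù").
my $expected = pack 'H*', '6EA8AB74A8A46979A8A26E73A8B4';
printf "%v02X\n", nul_out_non_ascii($expected);
```

This prints 6E.00.74.00.69.79.00.6E.73.00, which matches the bytes he gets apart from the trailing newline.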
On 07/08/2014 02:26 PM, Eric Brine wrote:
> He enters "nǐtàiyánsù" into his cp936 terminal. He expects to get the cp936 encoding of those characters from STDIN. He doesn't.
What I'm saying is there is no encoding in cp936 for those characters.
On Tue, Jul 8, 2014 at 4:48 PM, Karl Williamson <public@khwilliamson.com> wrote:
> What I'm saying is there is no encoding in cp936 for those characters.
$ perl -MEncode -E'use utf8; $_="nǐtàiyánsù"; say sprintf "%v02X", encode "cp936", $_;'
6E.A8.AB.74.A8.A4.69.79.A8.A2.6E.73.A8.B4
Encode seems to think so?
On Tue, Jul 8, 2014 at 5:58 PM, Eric Brine <ikegami@adaelis.com> wrote:
> Encode seems to think so?
And so does the page you linked earlier. Lead byte A8: http://msdn.microsoft.com/en-US/goglobal/gg675289
The encoding should be pretty irrelevant for the test program given. If this were Unix I'd ask to compare perl's behaviour against cat for the same input\, using strace to see what the programs actually get. But being Windows\, that kind of debugging isn't available. I think the weird behaviour seen must be specific to Windows; it doesn't look like Perl behaviour at all.
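One rough stand-in for that kind of check, within Perl itself, is to read the handle with `sysread` after `binmode`, which bypasses PerlIO buffering and layers. This is only a sketch. On Windows the read presumably still passes through the C runtime, so it cannot show what the console hands over, only whether PerlIO's buffering or layers play a part. The sub name `raw_hex` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $max bytes from a filehandle with
# sysread (unbuffered, no PerlIO layers) and return them as hex.
sub raw_hex {
    my ($fh, $max) = @_;
    binmode $fh;                          # no CRLF or encoding layers
    my $n = sysread $fh, my $buf, $max;
    die "sysread failed: $!" unless defined $n;
    return sprintf '%v02X', $buf;
}

# Type or paste a line at the terminal, then compare with the output
# of the buffered one-liner used elsewhere in this ticket.
print raw_hex(\*STDIN, 1024), "\n" if -t STDIN;
```

If this prints the same NUL bytes as the `perl -ne` one-liner, the substitution happens below PerlIO.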
-zefram
From @tonycoz:
> I wonder if this is related to #13794
> Tony
Which was just closed.
As previously stated, it's not related to #13794.
#13794 was fixed in Win10.
This problem still happens.
C:\Users\ikegami>chcp 936
Active code page: 936
C:\Users\ikegami>echo nǐtàiyánsù
nǐtàiyánsù
C:\Users\ikegami>echo nǐtàiyánsù | perl -ne"print"
nǐtàiyánsù
C:\Users\ikegami>perl -ne"print"
nǐtàiyánsù <- pasted in
n t iy ns
^Z
C:\Users\ikegami>echo nǐtàiyánsù | perl -ne"printf qq{%v02X\n}, $_"
6E.C7.90.74.C3.A0.69.79.C3.A1.6E.73.C3.B9.20.0A
C:\Users\ikegami>perl -ne"printf qq{%v02X\n}, $_"
nǐtàiyánsù
6E.00.74.00.69.79.00.6E.73.00.0A
^Z
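One detail in the transcript above: the bytes printed in the piped case decode cleanly as UTF-8, and they are not the CP936 sequence quoted earlier in this ticket. A small check, as a sketch; the sub name `code_points_from_utf8` is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: strictly decode bytes as UTF-8 (dies on
# malformed input) and return the code points as U+XXXX strings.
sub code_points_from_utf8 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes, FB_CROAK);
    return map { sprintf('U+%04X', ord $_) } split //, $chars;
}

# Bytes printed by the piped one-liner above.
my $piped = pack 'H*', '6EC79074C3A06979C3A16E73C3B9200A';
print join(' ', code_points_from_utf8($piped)), "\n";
```

The second code point comes out as U+01D0 (ǐ), so the pipe carried the text intact, while the interactive read did not.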
Thanks for this example.
What happens if in your paste example, you instead set a $scalar to it, and Devel::Peek Dump that scalar?
As you would expect based on the printf %vX:
C:\Users\ikegami>perl -MDevel::Peek -wne"Dump($_)"
nǐtàiyánsù <-- pasted in
SV = PV(0x114b8d8) at 0x27bbab0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x27b55a8 "n\0t\0iy\0ns\0\n"\0
  CUR = 11
  LEN = 81
^Z
Migrated from rt.perl.org#121450 (status was 'open')
Searchable as RT121450$