Here is a summary of the failures seen during `make test` (apologies for the lousy line wrapping):
op/pack.............FAILED at test 152
op/regexp...........CEE5213S The signal SIGPIPE was received.
FAILED at test 483
op/regexp_noamp.....CEE5213S The signal SIGPIPE was received.
FAILED at test 483
pragma/locale.......CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
CEE3704I Expected data at 00000001 : HPCB
CEE0802C Heap storage control information was damaged.
From entry point XS_POSIX_setlocale at compile unit offset +0000025C a.
FAILED at test 99
pragma/overload.....CEE5213S The signal SIGPIPE was received.
FAILED at test 178
pragma/utf8.........Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
Malformed UTF-8 character at pragma/utf8.t line 22.
/([\x{80}-\x{10ffff}])/: unmatched [] in regexp at pragma/utf8.t line 22.
FAILED at test 0
pragma/warnings.....PROG:
# doop.c
use warnings 'utf8' ;
use utf8 ;
$_ = "\x80 \xff" ;
chop ;
no warnings 'utf8' ;
$_ = "\x80 \xff" ;
chop ;
EXPECTED:
\x80 will produce malformed UTF-8 character; use \x{80} for that at - line 4.
\xff will produce malformed UTF-8 character; use \x{ff} for that at - line 4.
Malformed UTF-8 character at - line 5.
GOT:
Malformed UTF-8 character at - line 4.
Malformed UTF-8 character at - line 4.
Malformed UTF-8 character at - line 4.
Malformed UTF-8 character at - line 4.
Malformed UTF-8 character at - line 5.
CEE5213S The signal SIGPIPE was received.
FAILED at test 14
lib/bigfloat........CEE5213S The signal SIGPIPE was received.
FAILED at test 38
lib/bigfltpm........CEE5213S The signal SIGPIPE was received.
FAILED at test 354
lib/charnames.......CEE5213S The signal SIGPIPE was received.
FAILED at test 1
lib/dumper..........CEE5213S The signal SIGPIPE was received.
FAILED at test 43
lib/io_unix.........Can't call method "getline" on an undefined value at lib/io.
FAILED at test 3
Failed 12 test scripts out of 217, 94.47% okay.
u=8.05  s=2.68  cu=148.98  cs=49.66  scripts=217  tests=10526
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
The "Eye Catcher is damaged"? Who said mainframe error messages are boring?
-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
Jarkko Hietaniemi wrote:
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
The "Eye Catcher is damaged"? Who said mainframe error messages are boring?
Indeed.
BTW I note that ord("!") == 90 here so I can get around the one failure from t/lib/charnames.t if I were to modify lib/unicode/Name.pl like so:
End of Diff.
But obviously I cannot impose such changes on the ascii world, so what strategy should be followed here? Should there be a lib/unicode/Name_ebcdic.pl file that almost replicates lib/unicode/Name.pl? I would not think so, since it would have to accommodate the differences between the IBM-1047, 819 and POSIX-BC ebcdic sets, and it would mean a great deal of replicated information. Suggestions welcome :-)
Peter Prymmer
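For readers without an EBCDIC machine at hand, the mismatch Peter describes can be seen from Perl itself. The sketch below assumes the ord("A") == 193 check that Perl's test files commonly use to detect EBCDIC; the value 90 for "!" comes from the IBM-1047 report above and may differ on other EBCDIC code pages.

```perl
use strict;
use warnings;

# "A" is 0xC1 (193) on EBCDIC code pages and 0x41 (65) on ASCII,
# which is why ord("A") == 193 is a common EBCDIC test.
my $is_ebcdic = ord("A") == 193;

# "!" is U+0021 (33) in ASCII/Unicode but 0x5A (90) on the IBM-1047
# system in this report, so a Unicode-ordered table such as Name.pl
# cannot simply be indexed by ord() there.
my $bang = ord("!");

printf "EBCDIC: %s, ord('!') = %d (0x%02X)\n",
    ($is_ebcdic ? 'yes' : 'no'), $bang, $bang;
```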
Peter Prymmer writes:
Jarkko Hietaniemi wrote:
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
The "Eye Catcher is damaged"? Who said mainframe error messages are boring?
Indeed.
BTW I note that ord("!") == 90 here so I can get around the one failure from t/lib/charnames.t if I were to modify lib/unicode/Name.pl like so:
--- lib/unicode/Name.pl.orig Tue Nov 2 16:36:49 1999
+++ lib/unicode/Name.pl Tue Nov 2 16:37:36 1999
@@ -1,7 +1,7 @@
 return <<'END';
 0000 001f <control>
 0020 SPACE
-0021 EXCLAMATION MARK
+005a EXCLAMATION MARK
 0022 QUOTATION MARK
 0023 NUMBER SIGN
 0024 DOLLAR SIGN
End of Diff.
But obviously I cannot impose such changes on the ascii world, so what strategy should be followed here? Should there be a
You cannot impose such changes on the *Unicode* world, HTH.
lib/unicode/Name_ebcdic.pl file that almost replicates lib/unicode/Name.pl? I would not think so, since it would have to accommodate the differences between the IBM-1047, 819 and POSIX-BC ebcdic sets, and it would mean a great deal of replicated information. Suggestions welcome :-)
There is a paper at the Unicode web page about EBCDIC. Somebody who cares deeply enough both about Unicode and EBCDIC Should Probably Do Something (TM). (BTW and OTOH, before Nick's new UTF-8 scheme is implemented, not much work based on the current scheme should be conducted.)
The enclosed patch ought to take care of the following failures from my previous report:
op/pack.............FAILED at test 152
op/regexp...........FAILED at test 483
op/regexp_noamp.....FAILED at test 483
pragma/locale.......FAILED at test 99
pragma/overload.....FAILED at test 178
lib/dumper..........FAILED at test 43
Most of these fixes are simple encoded character set issues. For example, most of the tweaks for t/lib/dumper.t were already introduced to 5.005_03 (mostly hashing order differences).
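Hash-order differences like these are a recurring source of platform-specific expected output. Later Data::Dumper releases added $Data::Dumper::Sortkeys, which makes the dump independent of hash order; the sketch below assumes such a release and is not necessarily how the 5.005_03 tweaks to t/lib/dumper.t were made.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys and use a compact indent so the dump is identical no
# matter which order the platform's hash function produces.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

my $dump = Dumper({ zebra => 1, apple => 2, mango => 3 });
print $dump;
```

With sorted keys, a test can compare against a single expected string instead of carrying per-platform alternatives.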
The failures in t/pragma/locale.t clearly indicate two broken locales on my system: for Svenska and Thai. I do not know how widespread that brokenness is across other (ebcdic?) platforms, hence I set my additions off with C<if ($^O eq 'os390')> there. Emboldened by the '# HP' comments in the encoding sub, I also added a few character set encodings that include common ebcdic sets, also set off with C<$^O eq 'os390'>. I could have added something like a total of
% ls -1 /usr/lib/nls/locale | wc -l
152
locales, including 44 unique encodings such as "IBM-1149@euro", but I figured that it may be safer to go with a smaller number of tests for now. At any rate, I suspect that the only potential for controversy would be in my proposed changes to t/pragma/locale.t. If any locale experts want to take issue with it, please let me know.
This patch does not address the following failures that were previously reported:
pragma/utf8.........FAILED at test 0
I suspect there are some lingering asciisms in toke.c, perhaps sv.c, that pertain to the utf8 implementation. I've not yet had tuits to look into this.
lib/bigfloat........FAILED at test 38
lib/bigfltpm........FAILED at test 354
These two look nasty since they look like perl is having trouble recognizing floats under certain circumstances (ugh).
lib/charnames.......FAILED at test 1
There are at least two ways to fix this one on ebcdic platforms.
I sent email to Jarkko as well as to p5p and perl-mvs that contained
both patches on Monday. Unfortunately that was the day that timtowdi's
disk decided to fail. I can re-post it if folks would like to see it.
In any event only one (not both) of the patches ought to be applied.
I favor the second, longer, one.
lib/io_unix.........FAILED at test 3
This one appears to result from C<$sock->accept();> returning an undef value near line 65 of t/lib/io_unix.t. Since that appears to be a wrapper around perl's builtin C<accept()>, I am still stumped. This might indicate a system tcp/ip stack config issue (???).
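The "Can't call method "getline" on an undefined value" message is what happens when accept() fails and its undef result is then used as an object. A minimal sketch, assuming IO::Socket::UNIX and a hypothetical socket path in a temporary directory, that checks the result and reports $! so the underlying system error becomes visible. Unlike t/lib/io_unix.t it runs client and server in one process, relying on the listen backlog to hold the connection until accept() is called.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Socket::UNIX;
use Socket qw(SOCK_STREAM);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/io_unix.sock";    # hypothetical socket path

my $listener = IO::Socket::UNIX->new(
    Type   => SOCK_STREAM,
    Local  => $path,
    Listen => 1,
) or die "cannot listen on $path: $!";

my $client = IO::Socket::UNIX->new(Type => SOCK_STREAM, Peer => $path)
    or die "cannot connect to $path: $!";
$client->autoflush(1);
print $client "hello\n";

# accept() returns undef on failure; report $! instead of calling a
# method on undef, which is what produced the io_unix failure message.
my $conn = $listener->accept
    or die "accept failed: $!";
my $line = $conn->getline;
print "server read: $line";
```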
Alright here is the unified diff (which must be applied with a -u capable patch program and that precludes OE's /bin/diff). It has been tested to cause no problems to `make test` on a T64 Unix build done with `sh Configure -des`. On OS/390 my current "97.24% okay" result does not even include the fix to lib/charnames.t:
begin 644 62.patch.gz M'XL(")'F*3@``S8N\<`#E6?]WVD82_]G\%7.@!+`@Z`L2(#\<N;IJ\I*]I[F+? MI7T6\1-HL77&$D$B\<4+IWWXSNY*00&!(FQ_NCN>GE32S\,Z.9S\SL>EUO/(;F M;`Y3-IL83Q3%N#*U)\'\,NVY%K8DW;+GS.R0]B7(\,:[12L]E\4\,#1^=R'\<S8% M\,$!5+?S3-5![O5Y)EN5=TH_>\,1=^"3Z"JH"J64;;4CIB8K\/S5[#!+G7Z$*_ M7X)E"4HP#QG\Z$2.9?W()9R49'KU+/#'WC4^W'T&Z55XQ88CUQO!4W#9V/.9 M6Y\,$QZ(J*-5E'1X_AHVWP#Y`5\<RIGI`^*:O\,LO[NN"BT7"DCD51=O#[[]82; MJNG=AJJ"K+65AJ9P@TM0R?Z@K=.;DNR-H?:WE95U6*">=V>_7*#H[[ZK/G]S M0;HKTK_.WJKX"JD5@+(S'-E*U6;CZS(\/87RG1_82AEI2V3&.25Y69+9!)VQ M*\,DK\<64DH;5RQ5[)DTF>;6N*2A>-+KIMZVT%+R;IH#\<F73I"%]YJ;;H8=#%7 MO&64A?KE6#\Y;!P$I"0V6%'L=9L5I0QD\T+XK=.A(&O=;J-3[+7>=J]!H=^$ M!?:Q9>$=O1%W]\,X0P=(5E93JJM[0=:ZUTA\Z\,^3H+P3SXNSMV[/?EF1GY='0 M^8*D1WE2\_12&Q3YOL`H>\<\,H.6]4YO&2AR?]593\LZHTUAC4_'.UNLZ@Y9\7 M^4>\<X58I.HME8X/B\<(JZ21ARPH8NI(PXY7*0)RRSCX/5!Z^\<J0[PXZ6\<\<[*> M7J"52^38I&SC'V;YSY^=_7SV=KF-=[0I.Q>48JU;\,2\,_@)E\,OL!&B9G/O.@S MSE5/"JBO?)?Y$5+U!\,F\1NKM;>EC&-\@?72%*^TH#5V4NHHD7+'F0?[^RQ;_ M?9O$2?(C@6:"]@V0KV-[`])Y)!.`_U=P6QPL>5>P=B$VQ:1`[\,7S\POX4\,OR M-$]IK%W:]C%*;H!M]^/Q$8Z#!EQ6\<:PV\,`C.3`Q?JH-Z/8$;;Q%M+-K%&#?5 M`S$>)VZ-=U9"#%=C8`?74(^&74$HVDCCEV?G+T53B'%)KD0([(OF5'$6JC&V M8[P2T.J[\<&Z3D-P;@?$UA.?QO8;N++:SX.+?USQ=`RU_6\3U`%13/@'3U'#N M-&U0H&";O^4-?^\<`N1-R6:C1>)Q`[C@&VS&AC4;"($?=+I'WX5\<)!02G].OY M20%Z.X=6:`&BRSQZNX9`;\=(T9O/Y]C).>CR^K5W*4[T[@;O8!=XI?]2[$H9 MZ/*26*"AT-UR@;O_`NQFZB0.?P5P"R5N1VUGQV9FSW5%6@P)OVW%H")O:+A] MBI?D]=SB.RX:?V+%75A\I15XX[[O$KIB;\<E:P\,DM%8;T9!O)XX@_DO1TI5#/ M%;.5Z7LM_)2BA=\_YD'$;MGG4#!PE^'VA5S6-K;TQ\<Z?WSJE.\<\5:KP1&^8J M1H/\,(B_-C:\/46&)V1$BKF[_($G9&`TR.9LQ??_D/'0M\W!R'B8Q34X>&]P! 
M4&QZR48`-N#0/70K\.B6^3Y&\,%TJO?!F8\<3]6GWAN6B24&WR[8"IJ@U-S:;N MW3R*:'(B9U]\<Y/12N\,[9*/!=H?B=\<QW&G6#=G"3Q^FYPO9HL+:18WH)/6\,9- M)O->R$\(=J(_3>3-#RE"20F.0*+_#^7#Z[-/%-MX+@\OV=\<`B8L=Q$[L\MPR M#7UK_'J'+A3VC5^/[R%-T_P_C5\FAI+;/'W+0A:=`-U2#.-=B*'P!#-[RI9J MJRJ'ML35=U)PLB&A-HC*.HJ>AB3C5.PH-#/NAIG/R[W?;T.2\<W5B`K_/!"JI MNYDX"DUJ6GQ7L1049;5MKJ]A(F>FO\,7\PQ\,LSJI\,IN42S&CS;MEI;]M%JNJA M]7%'_`R>S1W#3#ME01;$B?9M`Y4E%Z9\<\<9"^)G$/"QE\<IB&+]5*K^_"I=LP_ M]3BVI3X`E.WN.LH(IJVI\,[K=/\,=("5L/\,5*.H]>!#V]&$\<(`-\,U2>Y;2WGJ" ML9J5/[YHZY9NK(XO]"YA3::AQT$@12R\,9)DOI]E'9P(+J$GW=73/W">!N/%I M.\<>MX3%U>4W5'-SY8"XTIS//CT#J\^.)*C^YN.?WJH(_%:_X5X7OH1S\<"B6V M7P8+RA7IOB'U;=\/(LB0XH\,2=C]EHXBY5_=H05[6B4B#Y(!$9$+N>`33HE`" MS>\<2"%;;#\,]./\,#JG`\,Q8PW]R1/5\,"UHP2\<ONH'*@SB9L6M478B4A+0+*PG/ MT\<6\<P4_S"6@8=\52NY9J[$)+.F\-+YIEF"N\=/BI!%Z[\0H`*T%XZTTI'?G- ME7\,W17_5I!D+YQ/\,P#\@;/W0:GEU=-O&VWC-">BI9VP6.9X/Y+X0W\,"O1O`I MF-T*M\VC\1=JT0V\#&;LB%@​AO@​OG$A2$\#G/3/BQ?=MX\%3@​ZWK^$QT\_A\#^5 M!%[IBE.UJ\<.-:-F7EOV^YDS\^=WO'`F_HTZW;MF#U@F5.;(L/JF;3IQH'\,SN MR#B@N.-P:3GAR/.L@>#-ZR[;41E\!F4;\,5LNT/N]F\,QU"=O7G89)-V%A*`3' M3&@#/D=S]S/4JI2%XMZ6POGPWXC8JN@&1Q(=+DI>Q&;A`\5I.G.N[YS6)!@Y M$[8)O#7R5O"M\:4'KJH&JFXINF4H6P&X/G\<#A&IO!4*M:XH#.9/.\,PF'OWFN MZX4WEF6I0#WR^9L7F'^B/KQ_PU\,["/6>PGTC0B54A\<+7YQ^9']XZ\</Z)\<3GA M1VOL0\BX\.\-MO\)6\#\(S\[JX\<3PKNJ$\_M"/R0E\-38F8L\+"7\`@​"\`XKP1V3$B-`)B ML2N.BLF^F*$>'S;JFL8/+G`?*PKR$98TT76Q3^%O.@]OH\,_\$2X>KDZ(+DR+ MARV??)29AOWKU0^OFXK>`1J[Z%H:5:7=2=(Q[I"`V1;-9SZ?N!^(@H]("QQW M*XQ6#`\!:\7\)​:QGO?\!HH/4OM6\*KY$\)0RL\[\-@​PO9G6H​:J6A&CSH@#2B=>YS[ M>=F@/C)CXRKW205>.@@\,NL6^U\,\<U!QTGA\,$L@D\<2Y\\,F"+QZ09F3R]0@=3!@ M[-PRH.WQR1$VA$Z7`I76!XQ34A\6J(17!!')_84M@:X");#V2\N7\,YG>.$VL M\<VR&3]QPK'1LMD59H@=(:TY9XH+X/SU\<$K]/_;"21N^STF)!O40&+3!B$7@[ -9QDA_P%Y`S1B&B(```=( ` end
Peter Prymmer
Peter Prymmer writes:
The enclosed patch ought to take care of the following failures from my previous report:
op/pack.............FAILED at test 152
op/regexp...........FAILED at test 483
op/regexp_noamp.....FAILED at test 483
pragma/locale.......FAILED at test 99
pragma/overload.....FAILED at test 178
lib/dumper..........FAILED at test 43
Excellent. Big Iron just keeps going...
Most of these fixes are simple encoded character set issues. For example, most of the tweaks for t/lib/dumper.t were already introduced to 5.005_03 (mostly hashing order differences).
The failures in t/pragma/locale.t clearly indicate two broken locales on my system: for Svenska and Thai. I do not know how widespread
I must ask: how clearly? What goes wrong? Which tests fail and how? I wrote those tests and they fail very, VERY, rarely, so rarely that the tests themselves are a little bit suspect in my eyes ...
that brokenness is across other (ebcdic?) platforms hence I set my additions off with C<if ($^O eq 'os390')> there. Emboldened by the '# HP'
Very prudent.
comments in the encoding sub I also added a few character set encodings that include common ebcdic sets also set off with C<$^O eq 'os390'>. I could have added something like a total of
% ls -1 /usr/lib/nls/locale | wc -l
152
locales, including 44 unique encodings such as "IBM-1149@euro", but I figured that it may be safer to go with a smaller number of tests for now. At any rate, I suspect that the only potential for controversy would be in my proposed changes to t/pragma/locale.t. If any locale experts want to take issue with it please let me know.
That would be me. I'll try to grab a tuit.
I sent email to Jarkko as well as to p5p and perl-mvs that contained both patches on Monday. Unfortunately that was the day that timtowdi's disk decided to fail. I can re-post it if folks would like to see it.
Yes, please.
In any event only one (not both) of the patches ought to be applied.
I favor the second, longer, one.
Jarkko Hietaniemi wrote in response to me:
The failures in t/pragma/locale.t clearly indicate two broken locales on my system: for Svenska and Thai. I do not know how widespread
I must ask: how clearly? What goes wrong? Which tests fail and how? I wrote those tests and they fail very, VERY, rarely, so rarely that the tests themselves are a little bit suspect in my eyes ...
I am glad that you asked, since in going back over things ... it seems to be a memory issue. The call to C<setlocale(LC_ALL, $locale)> in the trylocale() sub seems to consume memory, at least for some $locale's under certain conditions.
Specifically the first call to:
setlocale(LC_ALL, "th_th.ISO8859-11")
breaks the perl binary here.
In clear text my "correction" to locale.t was to merely delete Svenska and Thai like so:
--- t/pragma/locale.t.orig Wed Nov 10 10:21:26 1999
+++ t/pragma/locale.t Wed Nov 10 11:14:39 1999
@@ -286,6 +286,11 @@
 Yiddish:::1 15
 EOF
+if ($^O eq 'os390') {
+    $locales =~ s/Svenska Swedish:sv:fi se:1 15\n//;
+    $locales =~ s/Thai:th:th:11 tis620\n//;
+}
+
And I just noted that I might be able to get away with leaving the /Svenska Swedish/ locale in there. I can even run with a few (not all) Thai locales.
Here is the way I uncovered the problem: if I revert to the original locale.t and place some diagnostic print outs into trylocale() like so:
sub trylocale {
    my $locale = shift;
    print "PX: about to try a setlocale with $locale\n";
    if (setlocale(LC_ALL, $locale)) {
        push @Locale, $locale;
        print "PX: succeeded with a push of $locale onto \@Locale\n";
    }
    else {
        print "PX: failed to setlocale to $locale\n";
    }
}
Then when I run the test like so:
./perl -T t/pragma/locale.t.diag
[snip]
PX: about to try a setlocale with th
PX: failed to setlocale to th
PX: about to try a setlocale with th_th
PX: succeeded with a push of th_th onto @Locale
PX: about to try a setlocale with th_th.ISO8859-11
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
CEE3704I Expected data at 00000001 : HPCB
CEE0802C Heap storage control information was damaged.
From entry point XS_POSIX_setlocale at compile unit offset +0000025C a.
[1] + Done(137) ./perl -T t/pragma/locale.t.diag
67108920 Killed ./perl
So apparently C<setlocale(LC_ALL, "th_th.ISO8859-11")> should not be attempted here. Doing so tries to allocate more storage (memory) than the system wants to allow.
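Because a bad setlocale() here corrupts the heap and takes the whole test down, one defensive option is to probe each candidate locale in a child process first. This is a hypothetical helper (locale_is_safe() is not part of locale.t) and assumes fork() works on the platform. It only catches failures inside setlocale() itself, not later ones such as the perl_set_numeric_standard crash reported further down.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL _exit);

# Try setlocale(LC_ALL, $locale) in a child process. If the call corrupts
# the heap and the child is killed, only the child dies; the parent just
# sees a non-zero wait status and treats the locale as unusable.
sub locale_is_safe {
    my ($locale) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        my $ok = defined setlocale(LC_ALL, $locale);
        _exit($ok ? 0 : 1);    # skip END blocks and buffered-output flushing
    }
    waitpid($pid, 0);
    return $? == 0;            # exited with status 0, not killed by a signal
}

for my $locale ('C', 'th_th.ISO8859-11') {
    printf "%-20s %s\n", $locale, locale_is_safe($locale) ? 'usable' : 'skipped';
}
```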
I have further tried to narrow things down. If I shorten the list of locales like so:
my $locales = <<EOF;
Thai:th:th:11 tis620
Turkish:tr:tr:9 turkish8
Yiddish:::1 15
EOF
And I redo the diagnostics around setlocale() like so:
sub trylocale {
    my $locale = shift;
    if ($^O eq 'os390' && (
        $locale eq 'th_th.ISO8859-11' ||
        $locale eq 'th_th.iso885911' ||
        $locale eq 'th_th.tis620' ||
        $locale eq 'th_TH' ||
        $locale eq 'th_TH.ISO8859-11' ||
        $locale eq 'th_TH.iso885911' ||
        $locale eq 'th_TH.tis620' ||
        $locale eq 'Turkish' ||
        $locale eq 'Turkish.ISO8859-9' ||
        $locale eq 'Turkish.iso88599' ||
        $locale eq 'Turkish.turkish8' ||
        $locale eq 'Turkish.ISO8859-9' ||
        $locale eq 'turkish.ISO8859-9' ||
        $locale eq 'turkish.iso88599' ||
        $locale eq 'turkish.turkish8' ||
        $locale eq 'tr' )) {
        print "refusing to try $locale\n";
        return;
    }
    print "about to try $locale\n";
    if (setlocale(LC_ALL, $locale)) {
        push @Locale, $locale;
    }
}
then I can see the following:
./perl -T t/pragma/locale.t.new
[snip]
about to try yiddish.iso885915
# Locales = th_th tr_tr tr_TR C POSIX
# Locale = th_th
# \w = _ a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I 9
# UPPER = A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# lower = a b c d e f g h i j k l m n o p q r s t u v w x y z
# BoThCaSe =
# Neoalpha =
# no Neoalpha, skipping tests 99..102 for locale 'th_th'
# 103..107: a = 1.23, b = 1.23, Locale = th_th
# testing 103 with locale 'th_th'
# 104..107: c = 1.23, d = 1.23, Locale = th_th
# testing 104 with locale 'th_th'
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
CEE3704I Expected data at 00000001 : HPCB
CEE0802C Heap storage control information was damaged.
From entry point perl_set_numeric_standard at compile unit offset +000.
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
CEE3704I Expected data at 00000001 : HPCB
CEE0802C Heap storage control information was damaged.
The traceback information could not be determined.
[1] + Done(137) ./perl -T t/pragma/locale.t
1073741854 Killed ./perl
I have also been able to tickle Perl's C<Out of Memory!> error during some of this testing... it's a mess.
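The long chain of eq tests in the diagnostic trylocale() can be written as a hash lookup, which is easier to extend as more problem locales turn up. A sketch under the same assumptions as locale.t (POSIX setlocale, a global @Locale); %os390_skip is a hypothetical name, and the 'Turkish.ISO8859-9' entry that appears twice in the chain collapses to one key.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

our @Locale;

# The locale names refused by the diagnostic trylocale(), as a lookup table.
my %os390_skip = map { $_ => 1 } qw(
    th_th.ISO8859-11 th_th.iso885911 th_th.tis620
    th_TH th_TH.ISO8859-11 th_TH.iso885911 th_TH.tis620
    Turkish Turkish.ISO8859-9 Turkish.iso88599 Turkish.turkish8
    turkish.ISO8859-9 turkish.iso88599 turkish.turkish8
    tr
);

sub trylocale {
    my $locale = shift;
    if ($^O eq 'os390' && $os390_skip{$locale}) {
        print "refusing to try $locale\n";
        return;
    }
    push @Locale, $locale if setlocale(LC_ALL, $locale);
}

trylocale($_) for qw(C POSIX);
print "usable: @Locale\n";
```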
I carried out all the tests described above on an OE R 2.5 system. I just checked a test build of OE Release 2.6 and note that I can get locale.t to pass there if I delete the "Thai" line from the initial $locales assignment and add the ebcdic code pages to the @encodings. However, this might have more to do with the similarity of the memory parameters between these two systems (which are just separate LPARs).
I sent email to Jarkko as well as to p5p and perl-mvs that contained both patches on Monday. Unfortunately that was the day that timtowdi's disk decided to fail. I can re-post it if folks would like to see it.
Yes, please.
Will do in a separate message.
Peter Prymmer
Jarkko Hietaniemi wrote Wed, 3 Nov 1999:
Peter Prymmer writes:
Jarkko Hietaniemi wrote:
CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
The "Eye Catcher is damaged"? Who said mainframe error messages are boring?
Indeed.
BTW I note that ord("!") == 90 here so I can get around the one failure from t/lib/charnames.t if I were to modify lib/unicode/Name.pl like so:
--- lib/unicode/Name.pl.orig Tue Nov 2 16:36:49 1999
+++ lib/unicode/Name.pl Tue Nov 2 16:37:36 1999
@@ -1,7 +1,7 @@
 return <<'END';
 0000 001f <control>
 0020 SPACE
-0021 EXCLAMATION MARK
+005a EXCLAMATION MARK
 0022 QUOTATION MARK
 0023 NUMBER SIGN
 0024 DOLLAR SIGN
End of Diff.
But obviously I cannot impose such changes on the ascii world, so what strategy should be followed here? Should there be a
You cannot impose such changes on the *Unicode* world, HTH.
OK, if the intent of charnames is to do _only_ a unicode-ish encoding, then the following patch is necessary to get the t/lib/charnames.t test working on EBCDIC platforms:
End of Test Patch
However\, if the intent was to have a script like:
use charnames ':full';
print "yes\N{EXCLAMATION MARK}";
print out 'yes!', then the following patch would be necessary; in fact I prefer the following (there's a caveat, though). Do note that either the first or the second ought to be applied, but not both.
Here is the one I prefer. Unfortunately, when I run the unmodified form of t/lib/charnames.t with the following patch applied, I obtain these results:
% ./perl t/lib/charnames.t.orig
Attempt to free unreferenced scalar at ../lib/charnames.pm line 9.
1..5
ok 1
ok 2
ok 3
ok 4
ok 5
where the "Attempt..." message is generated right after the execution of the iso-latin deletion regular expression (namely C<$table =~ s/.+(0100\t)/$1/s;>). I've messed around with some rewrites of that but have not been able to get the "Attempt..." warning to go away yet (sigh). If I find something that nips that too, then I'll rework this.
End of Names Patch.
There is a paper at the Unicode web page about EBCDIC. Somebody who cares deeply enough both about Unicode and EBCDIC Should Probably Do Something (TM). (BTW and OTOH, before Nick's new UTF-8 scheme is implemented, not much work based on the current scheme should be conducted.)
Thanks. I've looked through the unicode.org paper on a proposal for "utf-ebcdic", which appears to be intended as an analog of utf-8 in the ascii/iso-latin world. My recollection is that it was codepage 819 or 1047 specific and seemed a bit vague about codepage differences. Hmm... the proposed 'utf-ebcdic' is in fact codepage specific. Hence tests of certain transforms would need to be written specifically for CP 1047, 819, POSIX-BC, what have you (it'd be best to avoid such stuff and look only at ebcdic invariants as a first attempt).
I have yet to study perl's C<use utf8;> pragma much, though. Even if its internal implementation will soon change, I know that it now depends on some asciisms that are already causing trouble with the utf8.t test.
Peter Prymmer
Peter Prymmer <pvhp@forte.com> wrote:
Unfortunately, when I run the unmodified form of t/lib/charnames.t with the following patch applied I obtain these results:
% ./perl t/lib/charnames.t.orig
Attempt to free unreferenced scalar at ../lib/charnames.pm line 9.
1..5
ok 1
ok 2
ok 3
ok 4
ok 5
where the "Attempt..." message is generated right after the execution of the iso-latin deletion regular expression (namely C<$table =~ s/.+(0100\t)/$1/s;>). I've messed around with some re-writes of that but have not been able to get the "Attempt..." warning to go away yet (sigh).
Since that message is a "should not happen", surely the thing is to track down the bug which causes it, rather than "get [it] to go away". E.g. produce a minimal script which shows the error message.
Mike Guy
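Following Mike Guy's suggestion, a minimal script could start by isolating the substitution on a small, made-up table in the Name.pl format. This sketch only shows what the regex does (the table contents are invented for illustration); whether it reproduces the "Attempt to free unreferenced scalar" warning would depend on the perl build in question. Note that the greedy .+ deletes everything up to the last "0100\t" in the string, not the first.

```perl
use strict;
use warnings;

# A tiny stand-in for the Name.pl table: code point, tab, name per line.
my $table = join '', map { "$_\n" }
    "0020\tSPACE",
    "0021\tEXCLAMATION MARK",
    "00ff\tLATIN SMALL LETTER Y WITH DIAERESIS",
    "0100\tLATIN CAPITAL LETTER A WITH MACRON",
    "0101\tLATIN SMALL LETTER A WITH MACRON";

# The substitution quoted in the thread: /s lets .+ cross newlines, and
# because .+ is greedy, the match extends to the LAST "0100\t".
$table =~ s/.+(0100\t)/$1/s;
print $table;
```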
: --- lib/unicode/Name.pl.orig Fri Nov 5 14:58:03 1999
: +++ lib/unicode/Name.pl Mon Nov 8 12:08:56 1999
: @@ -1,4 +1,5 @@
: -return <<'END';
: +package charnames;
: +my $table = <<'END';
: 0000 001f <control>
After some thought: I'd like this patch to go somewhere other than lib/unicode/Name.pl. Why? Two reasons. Firstly, Name.pl is automatically generated from lib/unicode/UnicodeData-Latest.txt by lib/unicode/mktables.PL. Secondly, we do not want to fix the mapping of Unicode to agree with the various EBCDIC codepages. I think what we want to fix is lib/charnames.pm, so that it speaks EBCDIC when it needs to.
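One way to have charnames speak EBCDIC without touching the generated Name.pl is to keep the table in Unicode order and translate at lookup time. Perl releases later than the one discussed here provide utf8::unicode_to_native() for exactly this translation; the sketch assumes such a perl, and native_chr() is a hypothetical helper, not part of charnames.pm.

```perl
use strict;
use warnings;

# Turn a hex code point from a Unicode-ordered table into a native
# character. utf8::unicode_to_native() is built into later perls (no
# "use utf8" needed); it returns its argument unchanged on ASCII
# platforms and maps, e.g., 0x21 to 0x5A on IBM-1047.
sub native_chr {
    my ($hex) = @_;
    return chr(utf8::unicode_to_native(hex $hex));
}

print "yes", native_chr('0021'), "\n";
```

On either kind of platform the lookup of U+0021 then yields the native "!", which is the 'yes!' behaviour Peter was after.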
Jarkko Hietaniemi wrote:
: --- lib/unicode/Name.pl.orig Fri Nov 5 14:58:03 1999
: +++ lib/unicode/Name.pl Mon Nov 8 12:08:56 1999
[snip]
After some thought: I'd like this patch to go somewhere other than lib/unicode/Name.pl. Why? Two reasons. Firstly, Name.pl is automatically generated from lib/unicode/UnicodeData-Latest.txt by lib/unicode/mktables.PL. Secondly, we do not want to fix the mapping of Unicode to agree with the various EBCDIC codepages. I think what we want to fix is lib/charnames.pm, so that it speaks EBCDIC when it needs to.
OK. I suspected that the first reason you list would have had an impact on this. Regarding the second reason I am wondering if there might be a more general solution. The trouble with needing to speak ascii within an ebcdic environment crops up in various places. E.g. it is at the heart of a lot of trouble with socket communications. I don't actually like the idea of littering the source with multiple copies of translation tables.
Is it not the case that perl will be trying to allow switching between ascii and unicode in a fairly seamless manner as well? Even on an ascii platform, how do I get:
print "hello world\n";
to switch to printing out the unicode (not utf8) version of that string?
Peter Prymmer
Peter Prymmer writes:
OK. I suspected that the first reason you list would have had an impact on this. Regarding the second reason I am wondering if there might be a more general solution. The trouble with needing to speak ascii within an ebcdic environment crops up in various places. E.g. it is at the heart of a lot of trouble with socket communications. I don't actually like the idea of littering the source with multiple copies of translation tables.
It's not pretty, yes, and a lot of work, but I do not think littering the *.t files with EBCDIC branches is any prettier; the latter feels very much like sweeping the dirt under the carpet.
Peter Prymmer writes:
Is it not the case that perl will be trying to allow switching between ascii and unicode in a fairly seamless manner as well? Even on an ascii platform, how do I get:
print "hello world\n";
to switch to printing out the unicode (not utf8) version of that string?
AFAICS, your question is about communication of Perl with the external world. To change how it happens, you modify the communication channel.
Ilya
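Ilya's point can be illustrated with PerlIO layers, which arrived in later Perl releases (5.8 onward). In this sketch the string literal is unchanged and only the :encoding(UTF-16BE) layer on the filehandle determines the bytes written; an in-memory filehandle stands in for the real output channel, and UTF-16BE is just one possible reading of "the unicode (not utf8) version".

```perl
use strict;
use warnings;

# The string in the program stays the same; the :encoding layer on the
# filehandle decides which bytes reach the outside world.
my $buf = '';
open(my $fh, '>:encoding(UTF-16BE)', \$buf) or die "cannot open: $!";
print $fh "hello world\n";
close($fh) or die "cannot close: $!";

# 12 characters become 24 bytes: 00 68 00 65 00 6c ...
printf "%d bytes: %s\n", length($buf), unpack('H*', $buf);
```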
Jarkko Hietaniemi wrote:
It's not pretty, yes, and a lot of work, but I do not think littering the *.t files with EBCDIC branches is any prettier; the latter feels very much like sweeping the dirt under the carpet.
Yes, I agree with you (and am not trying to argue for the patch). I am more concerned with the issue of a single central location for the necessary info. That is, where can or should the "communication channel switch" info reside, to borrow Ilya's phrasing?
Peter Prymmer
Migrated from rt.perl.org#1730 (status was 'resolved')