Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Not OK: perl 5.00562 on os390 05.00 (UNINSTALLED) #803

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#1730 (status was 'resolved')

Searchable as RT1730$

p5pRT commented 24 years ago

From pvhp@forte.com

Here is a summary of the failures seen during `make test` (apologies for the lousy line wrapping):

    op/pack.............FAILED at test 152

    op/regexp...........CEE5213S The signal SIGPIPE was received.
    FAILED at test 483
    op/regexp_noamp.....CEE5213S The signal SIGPIPE was received.
    FAILED at test 483

    pragma/locale.......CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
    CEE3704I Expected data at 00000001 : HPCB
    CEE0802C Heap storage control information was damaged.
      From entry point XS_POSIX_setlocale at compile unit offset +0000025C a.
    FAILED at test 99
    pragma/overload.....CEE5213S The signal SIGPIPE was received.
    FAILED at test 178

    pragma/utf8.........Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    Malformed UTF-8 character at pragma/utf8.t line 22.
    /([\x{80}-\x{10ffff}])/: unmatched [] in regexp at pragma/utf8.t line 22.
    FAILED at test 0
    pragma/warnings.....PROG:
    # doop.c
    use warnings 'utf8' ;
    use utf8 ;
    $_ = "\x80 \xff" ;
    chop ;
    no warnings 'utf8' ;
    $_ = "\x80 \xff" ;
    chop ;
    EXPECTED:
    \x80 will produce malformed UTF-8 character; use \x{80} for that at - line 4.
    \xff will produce malformed UTF-8 character; use \x{ff} for that at - line 4.
    Malformed UTF-8 character at - line 5.
    GOT:
    Malformed UTF-8 character at - line 4.
    Malformed UTF-8 character at - line 4.
    Malformed UTF-8 character at - line 4.
    Malformed UTF-8 character at - line 4.
    Malformed UTF-8 character at - line 5.
    CEE5213S The signal SIGPIPE was received.
    FAILED at test 14

    lib/bigfloat........CEE5213S The signal SIGPIPE was received.
    FAILED at test 38
    lib/bigfltpm........CEE5213S The signal SIGPIPE was received.
    FAILED at test 354

    lib/charnames.......CEE5213S The signal SIGPIPE was received.
    FAILED at test 1

    lib/dumper..........CEE5213S The signal SIGPIPE was received.
    FAILED at test 43

    lib/io_unix.........Can't call method "getline" on an undefined value at lib/io.
    FAILED at test 3

    Failed 12 test scripts out of 217, 94.47% okay.
    u=8.05 s=2.68 cu=148.98 cs=49.66 scripts=217 tests=10526

Perl Info
```
Site configuration information for perl 5.00562:

Configured by PVHP at Tue Nov 2 11:33:26 PST 1999.

Summary of my perl5 (revision 5.0 version 5 subversion 62) configuration:
  Platform:
    osname=os390, osvers=05.00, archname=os390
    uname='os390 lpar23 05.00 02 9672 '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
    use64bits=undef usemultiplicity=undef
  Compiler:
    cc='c89', optimize=' ', gccversion=
    cppflags=''
    ccflags ='-DMAXSIG=38 -DOEMVS -D_OE_SOCKETS -D_XOPEN_SOURCE_EXTENDED -D_ALL_SOURCE -DYYDYNAMIC -I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=undef, longlongsize=, d_longdbl=define, longdblsize=16
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lm -lc
    libc=, so=a, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_none.xs, dlext=none, d_dlsymun=undef, ccdlflags=''
    cccdlflags='-W 0,dll,"langlvl(extended)"', lddlflags=''

Locally applied patches:

@INC for perl 5.00562:
    lib
    /usr/local/lib/perl5/5.00562/os390
    /usr/local/lib/perl5/5.00562
    /usr/local/lib/site_perl/5.00562/os390
    /usr/local/lib/site_perl
    .

Environment for perl 5.00562:
    HOME=/usr/forte/pvhp
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:.:/usr/bin
    PERL_BADLANG (unset)
    SHELL=/bin/sh
```

p5pRT commented 24 years ago

From @jhi

CEE3703I In HPCB Control Block, the Eye Catcher is damaged.

The "Eye Catcher is damaged"? Who said mainframe error messages are boring?

-- $jhi++; # http://www.iki.fi/jhi/   # There is this special biologist word we use for 'stable'.   # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi wrote:

CEE3703I In HPCB Control Block, the Eye Catcher is damaged.

The "Eye Catcher is damaged"? Who said mainframe error messages are boring?

Indeed.

BTW I note that ord("!") == 90 here so I can get around the one failure from t/lib/charnames.t if I were to modify lib/unicode/Name.pl like so:

Inline Patch
```diff
--- lib/unicode/Name.pl.orig Tue Nov 2 16:36:49 1999
+++ lib/unicode/Name.pl Tue Nov 2 16:37:36 1999
@@ -1,7 +1,7 @@
 return <<'END';
 0000 001f
 0020 SPACE
-0021 EXCLAMATION MARK
+005a EXCLAMATION MARK
 0022 QUOTATION MARK
 0023 NUMBER SIGN
 0024 DOLLAR SIGN
```

End of Diff.

but obviously I cannot impose such changes on the ASCII world, so what strategy should be followed here? Should there be a lib/unicode/Name_ebcdic.pl file that almost replicates lib/unicode/Name.pl? I would not think so, since it would have to accommodate the differences between the IBM-1047, 819 and POSIX-BC EBCDIC sets and it would mean a great deal of replicated information. Suggestions welcome :-)

Peter Prymmer
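
The `ord("!") == 90` observation above is the crux of the charnames failure: the same character has a different native code on an EBCDIC host than it has in ASCII or Unicode. The following is a minimal Perl sketch of that difference, not code from the perl distribution. The helper name `is_ebcdic` and the `%bang` table are illustrative assumptions; the check `ord('A') == 193` relies on 'A' being 0xC1 in code page 1047.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the native character set looks like EBCDIC.
# 'A' is 0xC1 (193) in code page 1047 and 0x41 (65) in ASCII.
sub is_ebcdic {
    return ord('A') == 193;
}

# Code of "!" in each character set: 0x21 in ASCII/Unicode,
# 0x5a (90) in code page 1047, as reported in the message above.
my %bang = (
    ascii  => 0x21,
    cp1047 => 0x5a,
);

printf "native ord('!') is %d (0x%02x)\n", ord('!'), ord('!');
printf "this platform looks like %s\n", is_ebcdic() ? 'EBCDIC' : 'ASCII';
printf "expected on CP 1047: %d\n", $bang{cp1047};
```

A table such as Name.pl that stores 0021 for EXCLAMATION MARK therefore only matches the native `ord("!")` on ASCII-based platforms.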

p5pRT commented 24 years ago

From @jhi

Peter Prymmer writes:

Jarkko Hietaniemi wrote:

CEE3703I In HPCB Control Block, the Eye Catcher is damaged.

The "Eye Catcher is damaged"? Who said mainframe error messages are boring?

Indeed.

BTW I note that ord("!") == 90 here so I can get around the one failure from t/lib/charnames.t if I were to modify lib/unicode/Name.pl like so:

    --- lib/unicode/Name.pl.orig Tue Nov 2 16:36:49 1999
    +++ lib/unicode/Name.pl Tue Nov 2 16:37:36 1999
    @@ -1,7 +1,7 @@
     return <<'END';
     0000 001f
     0020 SPACE
    -0021 EXCLAMATION MARK
    +005a EXCLAMATION MARK
     0022 QUOTATION MARK
     0023 NUMBER SIGN
     0024 DOLLAR SIGN

End of Diff.

but obviously I cannot impose such changes on the ASCII world, so what strategy should be followed here? Should there be a

You cannot impose such changes on the *Unicode* world, HTH.

lib/unicode/Name_ebcdic.pl file that almost replicates lib/unicode/Name.pl? I would not think so, since it would have to accommodate the differences between the IBM-1047, 819 and POSIX-BC EBCDIC sets and it would mean a great deal of replicated information. Suggestions welcome :-)

There is a paper at the Unicode web page about EBCDIC. Somebody who cares deeply enough both about Unicode and EBCDIC Should Probably Do Something (TM). (BTW and OTH, before Nick's new UTF-8 scheme is implemented, not much work based on the current scheme should be conducted.)


p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

The enclosed patch ought to take care of the following failures from my previous report:

    op/pack.............FAILED at test 152
    op/regexp...........FAILED at test 483
    op/regexp_noamp.....FAILED at test 483
    pragma/locale.......FAILED at test 99
    pragma/overload.....FAILED at test 178
    lib/dumper..........FAILED at test 43

Most of these fixes are simple encoded character set issues. For example, most of the tweaks for t/lib/dumper.t were already introduced to 5.005_03 (mostly hashing order differences).

The failures in t/pragma/locale.t clearly indicate two broken locales on my system: for Svenska and Thai. I do not know how widespread that brokenness is across other (EBCDIC?) platforms, hence I set my additions off with C<if ($^O eq 'os390')> there. Emboldened by the '# HP' comments in the encoding sub, I also added a few character set encodings that include common EBCDIC sets, also set off with C<$^O eq 'os390'>. I could have added something like a total of

    % ls -1 /usr/lib/nls/locale | wc -l
    152

locales, including 44 unique encodings such as "IBM-1149@euro", but I figured that it may be safer to go with a smaller number of tests for now. At any rate, I suspect that the only potential for controversy would be in my proposed changes to t/pragma/locale.t. If any locale experts want to take issue with it please let me know.

This patch does not address the following failures that were previously reported:

    pragma/utf8.........FAILED at test 0

I suspect there are some lingering ASCII-isms in toke.c and perhaps sv.c that pertain to the utf8 implementation. I've not yet had tuits to look into this.

    lib/bigfloat........FAILED at test 38
    lib/bigfltpm........FAILED at test 354

These two look nasty, since it looks like perl is having trouble recognizing floats under certain circumstances (ugh).

    lib/charnames.......FAILED at test 1

There are at least two ways to fix this one on EBCDIC platforms.
I sent email to Jarkko as well as to p5p and perl-mvs that contained both patches on Monday. Unfortunately that was the day that timtowdi's disk decided to fail. I can re-post it if folks would like to see it. In any event only one (not both) of the patches ought to be applied.
I favor the second, longer, one.

    lib/io_unix.........FAILED at test 3

This one appears to result from C<$sock->accept();> returning an undef value near line 65 of t/lib/io_unix.t. Since that appears to be a wrapper around perl's builtin C<accept()> I am still stumped. This might indicate a system tcp/ip stack config issue(???).

Alright, here is the unified diff (which must be applied with a -u capable patch program and that precludes OE's /bin/diff). It has been tested to cause no problems to `make test` on a T64 Unix build done with `sh Configure -des`. On OS/390 my current "97.24% okay" result does not even include the fix to lib/charnames.t:

begin 644 62.patch.gz M'XL(")'F*3@​``S8N\<`#E6?]WVD82_]G\%7.@​!+`@​Z`L2(#\<N;IJ\I*]I[F+? MI7T6\1-HL77&$D$B\<4+IWWXSNY*00&!(FQ_NCN>GE32S\,Z.9S\SL>EUO/(;F M;`Y3-IL83Q3%N#*U)\'\,NVY%K8DW;+GS.R0]B7(\,​:[12L]E\4\,#1^=R'\<S8% M\,$!5+?S3-5![O5Y)EN5=TH_>\,1=^"3Z"JH"J64;;4CIB8K\/S5[#!+G7Z$*_ M7X)E"4HP#QG\Z$2.9?W()9R49'KU+/#'WC4^W'T&Z55XQ88CUQO!4W#9V/.9 M6Y\,$QZ(J*-5E'1X_AHVWP#Y`5\<RIGI`^*​:O\,LO[NN"BT7"DCD51=O#[[]82; MJNG=AJJ"K+65AJ9P@​TM0R?Z@​K=.;DNR-H?​:WE95U6*">=V>_7*#H[[ZK/G]S M0;HKTK_.WJKX"JD5@​+(S'-E*U6;CZS(\/87RG1_82AEI2V3&.25Y69+9!)VQ M*\,DK\<64DH;5RQ5[)DTF>;6N*2A>-+KIMZVT%+R;IH#\<F73I"%]YJ;;H8=#%7 MO&64A?KE6#\Y;!P$I"0V6%'L=9L5I0QD\T+XK=.A(&O=;J-3[+7>=J]!H=^$ M!?​:Q9>$=O1%W]\,X0P=(5E93JJM[0=​:ZUTA\Z\,^3H+P3SXNSMV[/?EF1GY='0 M^8*D1WE2\_12&Q3YOL`H>\<\,H.6]4YO&2AR?]593\LZHTUAC4_'.UNLZ@​Y9\7 M^4>\<X58I.HME8X/B\<(JZ21ARPH8NI(PXY7*0)RRSCX/5!Z^\<J0[PXZ6\<\<[*> M7J"52^38I&SC'V;YSY^=_7SV=KF-=[0I.Q>48JU;\,2\,_@​)E\,OL!&B9G/O.@​S MSE5/"JBO?)?Y$5+U!\,F\1NKM;>EC&-\@​?72%*^TH#5V4NHHD7+'F0?[^RQ;_ M?9O$2?(C@​6​:"]@​V0KV-[`])Y)!.`_U=P6QPL>5>P=B$VQ​:1`[\,7S\POX4\,OR M-$]IK%W​:]C%*;H!M]^/Q$8Z#!EQ6\<​:PV\,`C.3`Q?JH-Z/8$;;Q%M+-K%&#?5 M`S$>)VZ-=U9"#%=C8`?74(^&74$HVDCCEV?G+T53B'%)KD0([(OF5'$6JC&V M8[P2T.J[\<&Z3D-P;@​?$UA.?QO8;N++​:SX.+?USQ=`RU_6\3U`%13/@​'3U'#N M-&U0H&";O^4-?^\<`N1-R6​:C1>)Q`[C@​&VS&AC4;"($?=+I'WX5\<)!02G].OY M20%Z.X=6​:`&BRSQZNX9`;\=(T9O/Y]C).>CR^K5W*4[T[@​;O8!=XI?]2[$H9 MZ/*26*"AT-UR@​;O_`NQFZB0.?P5P"R5N1VUGQV9FSW5%6@​P)OVW%H")O​:+A] MBI?D]=SB.RX​:?V+%75A\I15XX[[O$KIB;\<E​:P\,DM%8;T9!O)XX@​_DO1TI5#/ M%;.5Z7LM_)2BA=\_YD'$;MGG4#!PE^'VA5S6-K;TQ\<Z?WSJE.\<\5​:KP1&^8J M1H/\,(B_-C​:\/46&)V1$BKF[_($G9&`TR.9LQ??_D/'0M\W!R'B8Q34X>&]P! 
M4&QZR48`-N#0/70K\.B6^3Y&\,%TJO?!F8\<3]6GWAN6B24&WR[8"IJ@​U-S​:;N MW3R*​:'(B9U]\<Y/12N\,[9*/!=H?B=\<QW&G6#=G"3Q^FYPO9HL+​:18WH)/6\,9- M)O->R$\(=J(_3>3-#RE"20F.0*+_#^7#Z[-/%-MX+@​\OV=\<`B8L=Q$[L\MPR M#7UK_'J'+A3VC5^/[R%-T_P_C5\FAI+;/'W+0A​:=`-U2#.-=B*'P!#-[RI9J MJRJ'ML35=U)PLB&A-HC*.HJ>AB3C5.PH-#/NAIG/R[W?;T.2\<W5B`K_/!"JI MNYDX"DUJ6GQ7L1049;5MKJ]A(F>FO\,7\PQ\,LSJI\,IN42S&CS;MEI;]M%JNJA M]7%'_`R>S1W#3#ME01;$B?9M`Y4E%Z9\<\<9"^)G$/"QE\<IB&+]5*K^_"I=LP_ M]3BVI3X`E.WN.LH(IJVI\,[K=/\,=("5L/\,5*.H]>!#V]&$\<(`-\,U2>Y;2WGJ" ML9J5/[YHZY9NK(XO]"YA3​::AQT$@​12R\,9)DOI]E'9P(+J$GW=73/W">!N/%I M.\<>MX3%U>4W5'-SY8"XTIS//CT#J\^.)*C^YN.?WJH(_%​:_X5X7OH1S\<"B6V M7P8+RA7IOB'U;=\/(LB0XH\,2=C]EHXBY5_=H05[6B4B#Y(!$9$+N>`33HE`" MS>\<2"%;;#\,]./\,#JG`\,Q8PW]R1/5\,"UHP2\<ONH'*@​SB9L6M478B4A+0+*PG/ MT\<6\<P4_S"6@​8=\52NY9J[$)+.F\-+YIEF"N\=/BI!%Z[\0H`*T%XZTTI'?G- ME7\,W17_5I!D+YQ/\,P#\@​;/W0​:GEU=-O&VWC-">BI9VP6.9X/Y+X0W\,"O1O`I MF-T*M\VC\1=JT0V\#&;LB%@&#8203;AO@&#8203;OG$A2$\#G/3/BQ?=MX\%3@&#8203;ZWK^$QT\_A\#^5 M!%[IBE.UJ\<.-​:-F7EOV^YDS\^=WO'`F_HTZW;MF#U@​F5.;(L/JF;3IQH'\,SN MR#B@​N.-P​:3GAR/.L@​>#-ZR[;41E\!F4;\,5LNT/N]F\,QU"=O7G89)-V%A*`3' M3&@​#/D=S]S/4JI2%XMZ6POGPWXC8JN@​&1Q(=+DI>Q&;A`\5I.G.N[YS6)!@​Y M$[8)O#7R5O"M\​:4'KJH&JFXINF4H6P&X/G\<#A&IO!4*M​:XH#.9/.\,PF'OWFN MZX4WEF6I0#WR^9L7F'^B/KQ_PU\,["/6>PGTC0B54A\<+7YQ^9']XZ\</Z)\<3GA M1VOL0\BX\.\-MO\)6\#\(S\[JX\<3PKNJ$\_M"/R0E\-38F8L\+"7\`@&#8203;"\`XKP1V3$B-`)B ML2N.BLF^F*$>'S;JFL8/+G`?*PKR$98TT76Q3^%O.@​]OH\,_\$2X>KDZ(+DR+ MARV??)29AOWKU0^OFXK>`1J[Z%H​:5​:7=2=(Q[I"`V1;-9SZ?N!^(@​H]("QQW M*XQ6#`\!​:\7\)&#8203;:QGO?\!HH/4OM6\*KY$\)0RL\[\-@&#8203;PO9G6H&#8203;:J6A&CSH@​#2B=>YS[ M>=F@​/C)CXRKW205>.@​@​\,NL6^U\,\<U!QTGA\,$L@​D\<2Y\\,F"+QZ09F3R]0@​=3!@​ M[-PRH.WQR1$VA$Z7`I76!XQ34A\6J(17!!')_84M@​​:X");#V2\N7\,YG>.$VL M\<VR&3]QPK'1LMD59H@​=(​:TY9XH+X/SU\<$K]/_;"21N^STF)!O40&+3!B$7@​[ -9QDA_P%Y`S1B&B(```=( ` end

Peter Prymmer

p5pRT commented 24 years ago

From @jhi

Peter Prymmer writes:

The enclosed patch ought to take care of the following failures from my previous report:

    op/pack.............FAILED at test 152
    op/regexp...........FAILED at test 483
    op/regexp_noamp.....FAILED at test 483
    pragma/locale.......FAILED at test 99
    pragma/overload.....FAILED at test 178
    lib/dumper..........FAILED at test 43

Excellent. Big Iron just keeps going...

Most of these fixes are simple encoded character set issues. For example, most of the tweaks for t/lib/dumper.t were already introduced to 5.005_03 (mostly hashing order differences).

The failures in t/pragma/locale.t clearly indicate two broken locales on my system: for Svenska and Thai. I do not know how widespread

I must ask: how clearly? What goes wrong? Which tests fail and how? I wrote those tests and they fail very, VERY, rarely, so rarely that the tests themselves are a little bit suspect in my eyes ...

that brokenness is across other (ebcdic?) platforms hence I set my additions off with C<if ($^O eq 'os390')> there. Emboldened by the '# HP'

Very prudent.

comments in the encoding sub I also added a few character set encodings that include common ebcdic sets also set off with C<$^O eq 'os390'>. I could have added something like a total of

    % ls -1 /usr/lib/nls/locale | wc -l
    152

locales, including 44 unique encodings such as "IBM-1149@euro", but I figured that it may be safer to go with a smaller number of tests for now. At any rate, I suspect that the only potential for controversy would be in my proposed changes to t/pragma/locale.t. If any locale experts want to take issue with it please let me know.

That would be me. I'll try to grab a tuit.

I sent email to Jarkko as well as to p5p and perl-mvs that contained both patches on Monday. Unfortunately that was the day that timtowdi's disk decided to fail. I can re-post it if folks would like to see it.

Yes, please.

In any event only one (not both) of the patches ought to be applied.
I favor the second, longer, one.


p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi wrote in response to me:

The failures in t/pragma/locale.t clearly indicate two broken locales on my system: for Svenska and Thai. I do not know how widespread

I must ask: how clearly? What goes wrong? Which tests fail and how? I wrote those tests and they fail very, VERY, rarely, so rarely that the tests themselves are a little bit suspect in my eyes ...

I am glad that you asked since in going back over things ... it seems to be a memory issue. The call to C<setlocale(LC_ALL, $locale)> in the trylocale() sub seems to consume memory, at least for some $locale's under certain conditions.

Specifically the first call to:

    setlocale(LC_ALL, "th_th.ISO8859-11")

breaks the perl binary here.

In clear text my "correction" to locale.t was to merely delete Svenska and Thai like so:

    --- t/pragma/locale.t.orig Wed Nov 10 10:21:26 1999
    +++ t/pragma/locale.t Wed Nov 10 11:14:39 1999
    @@ -286,6 +286,11 @@
     Yiddish:::1 15
     EOF

    +if ($^O eq 'os390') {
    +    $locales =~ s/Svenska Swedish:sv:fi se:1 15\n//;
    +    $locales =~ s/Thai:th:th:11 tis620\n//;
    +}
    +

And I just noted that I might be able to get away with leaving the /Svenska Swedish/ locale in there. I can even run with a few (not all) Thai locales.

Here is the way I uncovered the problem: if I revert to the original locale.t and place some diagnostic print outs into trylocale() like so:

    sub trylocale {
        my $locale = shift;
        print "PX: about to try a setlocale with $locale\n";
        if (setlocale(LC_ALL, $locale)) {
            push @Locale, $locale;
            print "PX: succeeded with a push of $locale onto \@Locale\n";
        }
        else {
            print "PX: failed to setlocale to $locale\n";
        }
    }

Then when I run the test like so:

    ./perl -T t/pragma/locale.t.diag
    [snip]
    PX: about to try a setlocale with th
    PX: failed to setlocale to th
    PX: about to try a setlocale with th_th
    PX: succeeded with a push of th_th onto @Locale
    PX: about to try a setlocale with th_th.ISO8859-11
    CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
    CEE3704I Expected data at 00000001 : HPCB
    CEE0802C Heap storage control information was damaged.
      From entry point XS_POSIX_setlocale at compile unit offset +0000025C a.
    [1] + Done(137) ./perl -T t/pragma/locale.t.diag
    67108920 Killed ./perl

So apparently C<setlocale(LC_ALL, "th_th.ISO8859-11")> should not be attempted here. Doing so tries to allocate more storage (memory) than the system wants to allow.

I have further tried to narrow things down. If I shorten the list of locales like so:

    my $locales = <<EOF;
    Thai:th:th:11 tis620
    Turkish:tr:tr:9 turkish8
    Yiddish:::1 15
    EOF

And I redo the diagnostics around setlocale() like so:

    sub trylocale {
        my $locale = shift;
        if ($^O eq 'os390' && (
            $locale eq 'th_th.ISO8859-11' ||
            $locale eq 'th_th.iso885911' ||
            $locale eq 'th_th.tis620' ||
            $locale eq 'th_TH' ||
            $locale eq 'th_TH.ISO8859-11' ||
            $locale eq 'th_TH.iso885911' ||
            $locale eq 'th_TH.tis620' ||
            $locale eq 'Turkish' ||
            $locale eq 'Turkish.ISO8859-9' ||
            $locale eq 'Turkish.iso88599' ||
            $locale eq 'Turkish.turkish8' ||
            $locale eq 'Turkish.ISO8859-9' ||
            $locale eq 'turkish.ISO8859-9' ||
            $locale eq 'turkish.iso88599' ||
            $locale eq 'turkish.turkish8' ||
            $locale eq 'tr'
        )) {
            print "refusing to try $locale\n";
            return;
        }
        print "about to try $locale\n";
        if (setlocale(LC_ALL, $locale)) {
            push @Locale, $locale;
        }
    }

then I can see the following:

    ./perl -T t/pragma/locale.t.new
    [snip]
    about to try yiddish.iso885915
    # Locales = th_th tr_tr tr_TR C POSIX
    # Locale = th_th
    # \w = _ a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I 9
    # UPPER = A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    # lower = a b c d e f g h i j k l m n o p q r s t u v w x y z
    # BoThCaSe =
    # Neoalpha =
    # no Neoalpha, skipping tests 99..102 for locale 'th_th'
    # 103..107: a = 1.23, b = 1.23, Locale = th_th
    # testing 103 with locale 'th_th'
    # 104..107: c = 1.23, d = 1.23, Locale = th_th
    # testing 104 with locale 'th_th'
    CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
    CEE3704I Expected data at 00000001 : HPCB
    CEE0802C Heap storage control information was damaged.
      From entry point perl_set_numeric_standard at compile unit offset +000.
    CEE3703I In HPCB Control Block, the Eye Catcher is damaged.
    CEE3704I Expected data at 00000001 : HPCB
    CEE0802C Heap storage control information was damaged.
      The traceback information could not be determined.
    [1] + Done(137) ./perl -T t/pragma/locale.t
    1073741854 Killed ./perl

I have also been able to tickle Perl's C<Out of Memory!> error during some of this testing....it's a mess.
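
The long chain of `eq` tests in the second trylocale() above could also be written as a hash lookup, which is easier to extend while narrowing down which locales trigger the heap damage. This is only a sketch under that assumption: `%skip_on_os390` is a hypothetical name, it is seeded with just a few of the locale names from the list above, and `setlocale` and `LC_ALL` come from the standard POSIX module.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Hypothetical skip list: locale names that should not be handed to
# setlocale() on OS/390 (a subset of the names listed above).
my %skip_on_os390 = map { $_ => 1 } qw(
    th_th.ISO8859-11
    th_th.tis620
    Turkish.ISO8859-9
    tr
);

my @Locale;

sub trylocale {
    my ($locale) = @_;
    if ($^O eq 'os390' && $skip_on_os390{$locale}) {
        print "refusing to try $locale\n";
        return;
    }
    print "about to try $locale\n";
    # setlocale() returns a false value when the locale cannot be set
    push @Locale, $locale if setlocale(LC_ALL, $locale);
    return;
}

trylocale($_) for qw(C POSIX th_th.ISO8859-11);
print "usable locales: @Locale\n";
```

On platforms other than OS/390 the skip list is never consulted, so every name is simply tried and kept only if setlocale() accepts it.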

I carried out all the tests described above on an OE R 2.5 system. I just checked a test build of OE Release 2.6 and note that I can get locale.t to pass there if I delete the "Thai" line from the initial $locales assignment and add the EBCDIC code pages to the @enc-odings. However, this might have more to do with the similarity of the memory parameters between these two systems (which are just separate LPARs).

I sent email to Jarkko as well as to p5p and perl-mvs that contained both patches on Monday. Unfortunately that was the day that timtowdi's disk decided to fail. I can re-post it if folks would like to see it.

Yes, please.

Will do in a separate message.

Peter Prymmer

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi wrote Wed, 3 Nov 1999:

Peter Prymmer writes:

Jarkko Hietaniemi wrote:

CEE3703I In HPCB Control Block, the Eye Catcher is damaged.

The "Eye Catcher is damaged"? Who said mainframe error messages are boring?

Indeed.

BTW I note that ord("!") == 90 here so I can get around the one failure from t/lib/charnames.t if I were to modify lib/unicode/Name.pl like so:

    --- lib/unicode/Name.pl.orig Tue Nov 2 16:36:49 1999
    +++ lib/unicode/Name.pl Tue Nov 2 16:37:36 1999
    @@ -1,7 +1,7 @@
     return <<'END';
     0000 001f
     0020 SPACE
    -0021 EXCLAMATION MARK
    +005a EXCLAMATION MARK
     0022 QUOTATION MARK
     0023 NUMBER SIGN
     0024 DOLLAR SIGN

End of Diff.

but obviously I cannot impose such changes on the ascii world so what strategy should be followed here? Should there be a

You cannot impose such changes on the *Unicode* world, HTH.

OK, if the intent of charnames is to do _only_ a unicode-ish encoding then the following patch is necessary to get the t/lib/charnames.t test working on EBCDIC platforms:

Inline Patch
```diff
--- t/lib/charnames.t.orig Tue Nov 2 16:08:56 1999
+++ t/lib/charnames.t Mon Nov 8 12:06:05 1999
@@ -12,7 +12,7 @@
 use charnames ':full';
-print "not " unless "Here\N{EXCLAMATION MARK}?" eq 'Here!?';
+print "not " unless "Here\N{EXCLAMATION MARK}?" eq "Here\041?";
 print "ok 1\n";
 print "# \$res=$res \$\@='$@'\nnot "
```

End of Test Patch

However, if the intent was to have a script like:

    use charnames ':full';
    print "yes\N{EXCLAMATION MARK}";

print out 'yes!' then the following patch would be necessary; in fact I prefer the following (there's a caveat though). Do note that either the first or the second ought to be applied but not both.

Here is the one I prefer. Unfortunately, when I run the unmodified form of t/lib/charnames.t with the following patch applied I obtain these results:

    % ./perl t/lib/charnames.t.orig
    Attempt to free unreferenced scalar at ../lib/charnames.pm line 9.
    1..5
    ok 1
    ok 2
    ok 3
    ok 4
    ok 5

where the "Attempt..." message is generated right after the execution of the iso-latin deletion regular expression (namely C<$table =~ s/.+(0100\t)/$1/s;>). I've messed around with some re-writes of that but have not been able to get the "Attempt..." warning to go away yet (sigh). If I find something that nips that too then I will rework this.

Inline Patch
```diff
--- lib/unicode/Name.pl.orig Fri Nov 5 14:58:03 1999
+++ lib/unicode/Name.pl Mon Nov 8 12:08:56 1999
@@ -1,4 +1,5 @@
-return <<'END';
+package charnames;
+my $table = <<'END';
 0000 001f
 0020 SPACE
 0021 EXCLAMATION MARK
@@ -10547,3 +10548,247 @@
 fffc OBJECT REPLACEMENT CHARACTER
 fffd REPLACEMENT CHARACTER
 END
+
+# are we on an EBCDIC platform?
+
+if ($^O eq 'os390' || $^O eq 'vmesa' || $^O eq 'os400' || $^O eq 'posix-bc') {
+
+    # remove iso-latin chrs 0000..00ff
+
+    $table =~ s/.+(0100\t)/$1/s;
+
+    # prepend ebcdic chrs from codepage 1047
+
+    $table = <<"END1047";
+0000 003f
+0040 SPACE
+005a EXCLAMATION MARK
+007f QUOTATION MARK
+007b NUMBER SIGN
+005b DOLLAR SIGN
+006c PERCENT SIGN
+0050 AMPERSAND
+007d APOSTROPHE
+004d LEFT PARENTHESIS
+005d RIGHT PARENTHESIS
+005c ASTERISK
+004e PLUS SIGN
+006b COMMA
+0060 HYPHEN-MINUS
+004b FULL STOP
+0061 SOLIDUS
+00f0 DIGIT ZERO
+00f1 DIGIT ONE
+00f2 DIGIT TWO
+00f3 DIGIT THREE
+00f4 DIGIT FOUR
+00f5 DIGIT FIVE
+00f6 DIGIT SIX
+00f7 DIGIT SEVEN
+00f8 DIGIT EIGHT
+00f9 DIGIT NINE
+007a COLON
+005e SEMICOLON
+004c LESS-THAN SIGN
+007e EQUALS SIGN
+006e GREATER-THAN SIGN
+006f QUESTION MARK
+007c COMMERCIAL AT
+00c1 LATIN CAPITAL LETTER A
+00c2 LATIN CAPITAL LETTER B
+00c3 LATIN CAPITAL LETTER C
+00c4 LATIN CAPITAL LETTER D
+00c5 LATIN CAPITAL LETTER E
+00c6 LATIN CAPITAL LETTER F
+00c7 LATIN CAPITAL LETTER G
+00c8 LATIN CAPITAL LETTER H
+00c9 LATIN CAPITAL LETTER I
+00d1 LATIN CAPITAL LETTER J
+00d2 LATIN CAPITAL LETTER K
+00d3 LATIN CAPITAL LETTER L
+00d4 LATIN CAPITAL LETTER M
+00d5 LATIN CAPITAL LETTER N
+00d6 LATIN CAPITAL LETTER O
+00d7 LATIN CAPITAL LETTER P
+00d8 LATIN CAPITAL LETTER Q
+00d9 LATIN CAPITAL LETTER R
+00e2 LATIN CAPITAL LETTER S
+00e3 LATIN CAPITAL LETTER T
+00e4 LATIN CAPITAL LETTER U
+00e5 LATIN CAPITAL LETTER V
+00e6 LATIN CAPITAL LETTER W
+00e7 LATIN CAPITAL LETTER X
+00e8 LATIN CAPITAL LETTER Y
+00e9 LATIN CAPITAL LETTER Z
+00ad LEFT SQUARE BRACKET
+00e0 REVERSE SOLIDUS
+00bd RIGHT SQUARE BRACKET
+005f CIRCUMFLEX ACCENT
+006d LOW LINE
+0079 GRAVE ACCENT
+0081 LATIN SMALL LETTER A
+0082 LATIN SMALL LETTER B
+0083 LATIN SMALL LETTER C
+0084 LATIN SMALL LETTER D
+0085 LATIN SMALL LETTER E
+0086 LATIN SMALL LETTER F
+0087 LATIN SMALL LETTER G
+0088 LATIN SMALL LETTER H
+0089 LATIN SMALL LETTER I
+0091 LATIN SMALL LETTER J
+0092 LATIN SMALL LETTER K
+0093 LATIN SMALL LETTER L
+0094 LATIN SMALL LETTER M
+0095 LATIN SMALL LETTER N
+0096 LATIN SMALL LETTER O
+0097 LATIN SMALL LETTER P
+0098 LATIN SMALL LETTER Q
+0099 LATIN SMALL LETTER R
+00a2 LATIN SMALL LETTER S
+00a3 LATIN SMALL LETTER T
+00a4 LATIN SMALL LETTER U
+00a5 LATIN SMALL LETTER V
+00a6 LATIN SMALL LETTER W
+00a7 LATIN SMALL LETTER X
+00a8 LATIN SMALL LETTER Y
+00a9 LATIN SMALL LETTER Z
+00c0 LEFT CURLY BRACKET
+004f VERTICAL LINE
+00d0 RIGHT CURLY BRACKET
+00a1 TILDE
+00ff 00ff
+0041 NO-BREAK SPACE
+00aa INVERTED EXCLAMATION MARK
+004a CENT SIGN
+00b1 POUND SIGN
+009f CURRENCY SIGN
+00b2 YEN SIGN
+006a BROKEN BAR
+00b5 SECTION SIGN
+00bb DIAERESIS
+00b4 COPYRIGHT SIGN
+009a FEMININE ORDINAL INDICATOR
+008a LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+00b0 NOT SIGN
+00ca SOFT HYPHEN
+00af REGISTERED SIGN
+00bc MACRON
+0090 DEGREE SIGN
+008f PLUS-MINUS SIGN
+00ea SUPERSCRIPT TWO
+00fa SUPERSCRIPT THREE
+00be ACUTE ACCENT
+00a0 MICRO SIGN
+00b6 PILCROW SIGN
+00b3 MIDDLE DOT
+009d CEDILLA
+00da SUPERSCRIPT ONE
+009b MASCULINE ORDINAL INDICATOR
+008b RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+00b7 VULGAR FRACTION ONE QUARTER
+00b8 VULGAR FRACTION ONE HALF
+00b9 VULGAR FRACTION THREE QUARTERS
+00ab INVERTED QUESTION MARK
+0064 LATIN CAPITAL LETTER A WITH GRAVE
+0065 LATIN CAPITAL LETTER A WITH ACUTE
+0062 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
+0066 LATIN CAPITAL LETTER A WITH TILDE
+0063 LATIN CAPITAL LETTER A WITH DIAERESIS
+0067 LATIN CAPITAL LETTER A WITH RING ABOVE
+009e LATIN CAPITAL LETTER AE
+0068 LATIN CAPITAL LETTER C WITH CEDILLA
+0074 LATIN CAPITAL LETTER E WITH GRAVE
+0071 LATIN CAPITAL LETTER E WITH ACUTE
+0072 LATIN CAPITAL LETTER E WITH CIRCUMFLEX
+0073 LATIN CAPITAL LETTER E WITH DIAERESIS
+0078 LATIN CAPITAL LETTER I WITH GRAVE
+0075 LATIN CAPITAL LETTER I WITH ACUTE
+0076 LATIN CAPITAL LETTER I WITH CIRCUMFLEX
+0077 LATIN CAPITAL LETTER I WITH DIAERESIS
+00ac LATIN CAPITAL LETTER ETH
+0069 LATIN CAPITAL LETTER N WITH TILDE
+00ed LATIN CAPITAL LETTER O WITH GRAVE
+00ee LATIN CAPITAL LETTER O WITH ACUTE
+00eb LATIN CAPITAL LETTER O WITH CIRCUMFLEX
+00ef LATIN CAPITAL LETTER O WITH TILDE
+00ec LATIN CAPITAL LETTER O WITH DIAERESIS
+00bf MULTIPLICATION SIGN
+0080 LATIN CAPITAL LETTER O WITH STROKE
+00fd LATIN CAPITAL LETTER U WITH GRAVE
+00fe LATIN CAPITAL LETTER U WITH ACUTE
+00fb LATIN CAPITAL LETTER U WITH CIRCUMFLEX
+00fc LATIN CAPITAL LETTER U WITH DIAERESIS
+00ba LATIN CAPITAL LETTER Y WITH ACUTE
+00ae LATIN CAPITAL LETTER THORN
+0059 LATIN SMALL LETTER SHARP S
+0044 LATIN SMALL LETTER A WITH GRAVE
+0045 LATIN SMALL LETTER A WITH ACUTE
+0042 LATIN SMALL LETTER A WITH CIRCUMFLEX
+0046 LATIN SMALL LETTER A WITH TILDE
+0043 LATIN SMALL LETTER A WITH DIAERESIS
+0047 LATIN SMALL LETTER A WITH RING ABOVE
+009c LATIN SMALL LETTER AE
+0048 LATIN SMALL LETTER C WITH CEDILLA
+0054 LATIN SMALL LETTER E WITH GRAVE
+0051 LATIN SMALL LETTER E WITH ACUTE
+0052 LATIN SMALL LETTER E WITH CIRCUMFLEX
+0053 LATIN SMALL LETTER E WITH DIAERESIS
+0058 LATIN SMALL LETTER I WITH GRAVE
+0055 LATIN SMALL LETTER I WITH ACUTE
+0056 LATIN SMALL LETTER I WITH CIRCUMFLEX
+0057 LATIN SMALL LETTER I WITH DIAERESIS
+008c LATIN SMALL LETTER ETH
+0049 LATIN SMALL LETTER N WITH TILDE
+00cd LATIN SMALL LETTER O WITH GRAVE
+00ce LATIN SMALL LETTER O WITH ACUTE
+00cb LATIN SMALL LETTER O WITH CIRCUMFLEX
+00cf LATIN SMALL LETTER O WITH TILDE
+00cc LATIN SMALL LETTER O WITH DIAERESIS
+00e1 DIVISION SIGN
+0070 LATIN SMALL LETTER O WITH STROKE
+00dd LATIN SMALL LETTER U WITH GRAVE
+00de LATIN SMALL LETTER U WITH ACUTE
+00db LATIN SMALL LETTER U WITH CIRCUMFLEX
+00dc LATIN SMALL LETTER U WITH DIAERESIS
+008d LATIN SMALL LETTER Y WITH ACUTE
+008e LATIN SMALL LETTER THORN
+00df LATIN SMALL LETTER Y WITH DIAERESIS
+$table
+END1047
+
+    chomp($table);
+
+    if ($^O eq 'os400') {
+        # codepage 1047 -> 819
+        $table =~ s/005f( CIRCUMFLEX ACCENT)/00b0$1/;
+        $table =~ s/00ad( LEFT SQUARE BRACKET)/00ba$1/;
+        $table =~ s/00b0( NOT SIGN)/005f$1/;
+        $table =~ s/00ba( LATIN CAPITAL LETTER Y WITH ACUTE)/00ad$1/;
+        $table =~ s/00bb( DIAERESIS)/00bd$1/;
+        $table =~ s/00bd( RIGHT SQUARE BRACKET)/00bb$1/;
+    }
+
+    if ($^O eq 'posix-bc') {
+        # codepage 1047 -> posix-bc
+        $table =~ s/00ad( LEFT SQUARE BRACKET)/00bb$1/;
+        $table =~ s/00e0( REVERSE SOLIDUS)/00bc$1/;
+        $table =~ s/005f( CIRCUMFLEX ACCENT)/006a$1/;
+        $table =~ s/0079( GRAVE ACCENT)/004a$1/;
+        $table =~ s/00c0( LEFT CURLY BRACKET)/00fb$1/;
+        $table =~ s/00d0( RIGHT CURLY BRACKET)/00fd$1/;
+        $table =~ s/00a1( TILDE)/00ff$1/;
+        $table =~ s/00ff 00ff( )/005f 005f$1/;
+        $table =~ s/004a( CENT SIGN)/00b0$1/;
+        $table =~ s/006a( BROKEN BAR)/00d0$1/;
+        $table =~ s/00bb( DIAERESIS)/0079$1/;
+        $table =~ s/00b0( NOT SIGN)/00ba$1/;
+        $table =~ s/00bc( MACRON)/00a1$1/;
+        $table =~ s/00fd( LATIN CAPITAL LETTER U WITH GRAVE)/00e0$1/;
+        $table =~ s/00fb( LATIN CAPITAL LETTER U WITH CIRCUMFLEX)/00dd$1/;
+        $table =~ s/00ba( LATIN CAPITAL LETTER Y WITH ACUTE)/00ad$1/;
+        $table =~ s/00dd( LATIN SMALL LETTER U WITH GRAVE)/00c0$1/;
+    }
+}
+
+return $table;
+
```

End of Names Patch.

There is a paper at the Unicode web page about EBCDIC. Somebody who cares deeply enough both about Unicode and EBCDIC Should Probably Do Something (TM). (BTW and OTH\, before Nick's new UTF-8 scheme is implemented\, not much work based on the current scheme should be conducted.)

Thanks. I've looked through the unicode.org paper on a proposal for "utf-ebcdic" which appears to be something intended to be an analog of utf-8 in the ascii/iso-latin world. My recollection is that it was codepage 819 or 1047 specific and seemed a bit vague about codepage differences. Hmm... the proposed 'utf-ebcdic' is in fact codepage specific. Hence tests of certain transforms would need to be written specifically for CP 1047\, 819\, POSIX-BC\, what have you (it'd best to avoid such stuff and look only at ebcdic invariants as a first attempt).

I have yet to study perl's C<use utf8;> pragma much though. Even if its internal implementation will soon change, I know that it now depends on some asciisms that are already causing trouble with the utf8.t test.

Peter Prymmer

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Peter Prymmer <pvhp@forte.com> wrote:

Unfortunately, when I run the unmodified form of t/lib/charnames.t with the following patch applied I obtain these results:

    % ./perl t/lib/charnames.t.orig
    Attempt to free unreferenced scalar at ../lib/charnames.pm line 9.
    1..5
    ok 1
    ok 2
    ok 3
    ok 4
    ok 5

where the "Attempt..." message is generated right after the execution of the iso-latin deletion regular expression (namely C<$table =~ s/.+(0100\t)/$1/s;>). I've messed around with some re-writes of that but have not been able to get the "Attempt..." warning to go away yet (sigh).

Since that message is a "should not happen", surely the thing is to track down the bug which causes it, rather than "get [it] to go away". E.g. produce a minimal script which shows the error message.

Mike Guy
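
One possible starting point for such a minimal script is sketched below. It applies the same substitution to a tiny, made-up table (the four entries are illustrative stand-ins, not the real Name.pl data). Whether this alone triggers the "Attempt to free unreferenced scalar" message on os390 is an open question; the sketch only isolates what the regular expression does.

```perl
use strict;
use warnings;

# Hypothetical four-line stand-in for the real name table (tab-separated).
my $table = join '',
    "0041\tLATIN CAPITAL LETTER A\n",
    "00ff\tLATIN SMALL LETTER Y WITH DIAERESIS\n",
    "0100\tLATIN CAPITAL LETTER A WITH MACRON\n",
    "0101\tLATIN SMALL LETTER A WITH MACRON\n";

# The substitution quoted in this thread: with /s the greedy .+ also
# crosses newlines, so everything before the last "0100\t" is removed.
$table =~ s/.+(0100\t)/$1/s;

print $table;
```

If the warning appears with a script this small, the table contents can be ruled out and attention can move to the regex engine itself.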

p5pRT commented 24 years ago

From @jhi

    : --- lib/unicode/Name.pl.orig Fri Nov 5 14:58:03 1999
    : +++ lib/unicode/Name.pl Mon Nov 8 12:08:56 1999
    : @@ -1,4 +1,5 @@
    : -return <<'END';
    : +package charnames;
    : +my $table = <<'END';
    :  0000 001f \
    :  0020 SPACE
    :  0021 EXCLAMATION MARK
    : @@ -10547,3 +10548,247 @@
    :  fffc OBJECT REPLACEMENT CHARACTER
    :  fffd REPLACEMENT CHARACTER
    :  END
    : +
    : +# are we on an EBCDIC platform?

After some thought: I'd like this patch to go somewhere other than lib/unicode/Name.pl. Why? Two reasons. Firstly: Name.pl is automatically generated from lib/unicode/UnicodeData-Latest.txt by lib/unicode/mktables.PL. Secondly, we do not want to fix the mapping of Unicode to agree with the various EBCDIC codepages. I think what we want to fix is lib/charnames.pm to speak EBCDIC when it needs to.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen
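
A rough sketch of what "speaking EBCDIC when it needs to" might look like: the generated Unicode table stays untouched and the translation happens at lookup time. The helper name `native_ord` and the three-entry map are hypothetical; the CP 1047 positions are the ones that appear in the patch earlier in this thread, and a real table would have to cover all 256 positions.

```perl
use strict;
use warnings;

# True on EBCDIC platforms, where 'A' is 0xC1 (193) rather than 0x41.
my $is_ebcdic = ord('A') == 193;

# Illustrative subset only: Unicode code point => CP 1047 code point,
# using the bracket and circumflex positions from the patch in this thread.
my %unicode_to_cp1047 = (
    0x5B => 0xAD,    # LEFT SQUARE BRACKET
    0x5D => 0xBD,    # RIGHT SQUARE BRACKET
    0x5E => 0x5F,    # CIRCUMFLEX ACCENT
);

# Hypothetical helper: map a code point found in the Unicode name table
# to the native one. In this sketch, unmapped values pass through as-is.
sub native_ord {
    my ($unicode, $ebcdic) = @_;
    return $unicode unless $ebcdic;
    return exists $unicode_to_cp1047{$unicode}
        ? $unicode_to_cp1047{$unicode}
        : $unicode;
}

printf "%02x\n", native_ord(0x5B, $is_ebcdic);
```

Keeping the map outside Name.pl means regenerating the table from UnicodeData-Latest.txt would not wipe out the EBCDIC handling.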

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi wrote:

    : --- lib/unicode/Name.pl.orig Fri Nov 5 14:58:03 1999
    : +++ lib/unicode/Name.pl Mon Nov 8 12:08:56 1999

[snip]

After some thought: I'd like this patch to go somewhere other than lib/unicode/Name.pl. Why? Two reasons. Firstly: Name.pl is automatically generated from lib/unicode/UnicodeData-Latest.txt by lib/unicode/mktables.PL. Secondly, we do not want to fix the mapping of Unicode to agree with the various EBCDIC codepages. I think what we want to fix is lib/charnames.pm to speak EBCDIC when it needs to.

OK. I suspected that the first reason you list would have had an impact on this. Regarding the second reason I am wondering if there might be a more general solution. The trouble with needing to speak ascii within an ebcdic environment crops up in various places. E.g. it is at the heart of a lot of trouble with socket communications. I don't actually like the idea of littering the source with multiple copies of translation tables.

Is it not the case that perl will be trying to allow switching between ascii and unicode in a fairly seamless manner as well? Even on an ascii platform, how do I get:

  print "hello world\n";

to switch to printing out the unicode (not utf8) version of that string?

Peter Prymmer

p5pRT commented 24 years ago

From @jhi

Peter Prymmer writes:

OK. I suspected that the first reason you list would have had an impact on this. Regarding the second reason I am wondering if there might be a more general solution. The trouble with needing to speak ascii within an ebcdic environment crops up in various places. E.g. it is at the heart of a lot of trouble with socket communications. I don't actually like the idea of littering the source with multiple copies of translation tables.

It's not pretty, yes, and a lot of work, but I do not think littering the *.t files with EBCDIC branches is any prettier; the latter feels very much like sweeping the dirt under the carpet.

-- $jhi++; # http://www.iki.fi/jhi/ # There is this special biologist word we use for 'stable'. # It is 'dead'. -- Jack Cohen

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Peter Prymmer writes:

Is it not the case that perl will be trying to allow switching between ascii and unicode in a fairly seamless manner as well? Even on an ascii platform, how do I get:

print "hello world\n";

to switch to printing out the unicode (not utf8) version of that string?

AFAICS, your question is about communication of Perl with the external world. To change how it happens, you modify the communication channel.

Ilya
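
For readers on later perls, "modifying the communication channel" can be sketched with an I/O layer on the filehandle. This assumes PerlIO encoding layers, which arrived in perl 5.8 and so postdate this thread, and it takes "unicode (not utf8)" to mean UTF-16BE, which is a guess at the intent. An in-memory filehandle is used here so the result can be inspected; the same layer could be applied to STDOUT with binmode.

```perl
use strict;
use warnings;

# Open an in-memory filehandle with an encoding layer on it; the print
# statement itself is unchanged, only the channel differs.
my $bytes = '';
open(my $out, '>:encoding(UTF-16BE)', \$bytes) or die "open failed: $!";
print {$out} "hello world\n";
close($out) or die "close failed: $!";

# Each character of the string has been written as a two-byte unit.
printf "%d bytes written\n", length $bytes;
```

The point of the design is that the program text stays the same and only the layer on the handle decides the external representation.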

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi wrote:

It's not pretty, yes, and a lot of work, but I do not think littering the *.t files with EBCDIC branches is any prettier; the latter feels very much like sweeping the dirt under the carpet.

Yes, I agree with you (and am not trying to argue for the patch). I am more concerned with the issue of a single central location for the necessary info. That is, where can or should the "communication channel switch" info reside, to borrow Ilya's phrasing?

Peter Prymmer