Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Not OK: perl v5.7.0 +DEVEL8042 on os390 05.00 (UNINSTALLED) #2970

Closed p5pRT closed 20 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#4880 (status was 'resolved')

Searchable as RT4880$

p5pRT commented 23 years ago

From pvhp@forte.com

The good news is that with this kit the resultant miniperl passes Simon's test suite​:

    $ cat simon_mini.sh
    #!/bin/sh
    ./miniperl -e '$a=v4.5; printf("%vd\n",$a)'
    ./miniperl -e '$a=v4.5.300; printf("%vd\n",$a)'
    ./miniperl -e '$a=v4.5.300; chop $a; printf("%vd\n",$a)'
    ./miniperl -le '$a=v4.5.300; print ord while $_ = chop $a'
    ./miniperl -e '$a=v4.5; $b = chr(4).chr(5); die unless $a eq $b'
    ./miniperl -e '$a=v4.5.300; $b = chr(4).chr(5).chr(300); die unless $a eq $b'
    ./miniperl -e '$a=v4.5.300; chop $a; $b = chr(4).chr(5); die unless $a eq $b'
    ./miniperl -Ilib -e '$a = v300.4.5.300; {use bytes; printf "%vd\n",$a}'
    ./miniperl -Ilib -e '$a= v300; {use bytes; $a.=chr(196).chr(172).chr(4).chr(5)}; printf "%vd", $a'

    $ ./simon_mini.sh
    4.5
    4.5.300
    4.5
    300
    5
    4
    196.172.4.5.196.172
    300.300.4.5$

The bad news is that the new strict-ness and $VERSION-ing in the libs renders the kit unbuildable. Here is how the first attempt at `make` now fails​:

    ./miniperl -w -Ilib -MExporter -e '<?>' || make minitest
    make: [extra.pods] Error 1 (ignored)
    rm -f lib/re.pm
    cat ext/re/re.pm > lib/re.pm
    ./miniperl configpm configpm.tmp
    sh mv-if-diff configpm.tmp lib/Config.pm

      AutoSplitting perl library
    ./miniperl -Ilib -e 'use AutoSplit; \
      autosplit_lib_modules(@ARGV)' lib/*.pm lib/*/*.pm
    Global symbol "$VERSION" requires explicit package name at lib/File/Spec.pm lin.
    Global symbol "@ISA" requires explicit package name at lib/File/Spec.pm line 16.
    Compilation failed in require at lib/File/Spec/Functions.pm line 3.
    BEGIN failed--compilation aborted at lib/File/Spec/Functions.pm line 3.
    Compilation failed in require at lib/AutoSplit.pm line 9.
    BEGIN failed--compilation aborted at lib/AutoSplit.pm line 9.
    Compilation failed in require at -e line 1.
    BEGIN failed--compilation aborted at -e line 1.
    make: *** [preplibrary] Error 255

After #-commenting out most or all of the `use strict;` statements in lib/File/Spec.pm, lib/File/Spec/*.pm, and lib/ExtUtils/*.pm, as well as overcoming an apparent problem with the `use vars qw();` statement in lib/ExtUtils/MM_Unix.pm by declaring these variables `our`:

    our $VERSION = '1.12603';
    our $Is_OS2 = $^O eq 'os2';
    our $Is_Mac = $^O eq 'MacOS';
    our $Is_Win32 = $^O eq 'MSWin32';
    our $Is_Dos = $^O eq 'dos';
    our $Is_VMS = $^O eq 'VMS';
    our $Is_PERL_OBJECT = $Config{'ccflags'} =~ /-DPERL_OBJECT/;

the build then gets a bit farther and seems to die (on another use vars problem?)​:

    Making Errno (nonxs)
    Writing Makefile for Errno
    make[1]: Entering directory `/bld3/pvhp/perl/perl/ext/Errno'
    make[1]: Leaving directory `/bld3/pvhp/perl/perl/ext/Errno'
    make[1]: Entering directory `/bld3/pvhp/perl/perl/ext/Errno'
    ../../miniperl -I../../lib -I../../lib -I../../lib -I../../lib Errno_pm.PL Errnm
    Global symbol "$VERSION" requires explicit package name at Errno_pm.PL line 7.
    BEGIN not safe after errors--compilation aborted at Errno_pm.PL line 57.
    make[1]: *** [Errno.pm] Error 255
    make[1]: Leaving directory `/bld3/pvhp/perl/perl/ext/Errno'
    make: *** [ext/Errno/pm_to_blib] Error 2

which could be overcome in this build by changing ext/Errno/Errno_pm.PL to have​:

  our $VERSION = "1.111";

after which most of the rest builds and I was able to run `make nokfile`.

`make test` uncovered these problems (this is not an exhaustive list)​:

    comp/proto...........FAILED at test 111
    comp/redef...........ok
    comp/require.........String found where operator expected at bleah.pm line 2, n"
      (Might be a runaway multi-line "" string starting on line 1)
      (Missing semicolon on previous line?)
    String found where operator expected at bleah.pm line 1, near "BpBrBiBnBt "BoBk"
      (Do you need to predeclare BpBrBiBnBt?)
    String found where operator expected at bleah.pm line 1, near "BpBrBiBnBt "BoBk"
      (Do you need to predeclare BpBrBiBnBt?)
    FAILED at test 13

    op/avhv..............Global symbol "@ISA" requires explicit package name at ../.
    Compilation failed in require at op/avhv.t line 8.
    FAILED at test 0

    op/bop...............FAILED at test 22

    op/concat............FAILED at test 6

    op/hashwarn..........Global symbol "@warnings" requires explicit package name a.
    BEGIN not safe after errors--compilation aborted at op/hashwarn.t line 17.
    FAILED at test 0

    op/length............FAILED at test 10

    op/numconvert........Global symbol "%Config" requires explicit package name at .
    BEGIN not safe after errors--compilation aborted at op/numconvert.t line 127.
    FAILED at test 0

    op/regmesg...........FAILED at test 33

    op/substr............panic: utf8_length: unaligned end at op/substr.t line 297.
    FAILED at test 133

    op/taint.............Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "$VERSION" requires explicit package name at ../lib/IPC/SysV.pm l.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/IPC/SysV.pm.
    Global symbol "$VERSION" requires explicit package name at ../lib/IPC/SysV.pm l.
    Compilation failed in require at op/taint.t line 29.
    BEGIN failed--compilation aborted at op/taint.t line 32.
    FAILED at test 0

    op/undef.............Global symbol "@ISA" requires explicit package name at ../.
    Compilation failed in require at op/undef.t line 76.
    FAILED at test 26

    op/utf8decode........FAILED at test 10

    op/ver...............FAILED at test 6

    pragma/constant......Global symbol "@warnings" requires explicit package name a.
    Global symbol "@warnings" requires explicit package name at pragma/constant.t l.
    Global symbol "@warnings" requires explicit package name at pragma/constant.t l.
    Global symbol "@warnings" requires explicit package name at pragma/constant.t l.
    Global symbol "@warnings" requires explicit package name at pragma/constant.t l.
    Global symbol "@warnings" requires explicit package name at pragma/constant.t l.
    Global symbol "@warnings" requires explicit package name at pragma/constant.t l.
    BEGIN not safe after errors--compilation aborted at pragma/constant.t line 140.
    FAILED at test 1
    pragma/diagnostics...Global symbol "$Test_Num" requires explicit package name a.
    BEGIN not safe after errors--compilation aborted at pragma/diagnostics.t line 1.
    FAILED at test 0
    pragma/locale........Bareword "LC_ALL" not allowed while "strict subs" in use a.
    BEGIN not safe after errors--compilation aborted at pragma/locale.t line 486.
    FAILED at test 0

    pragma/strict........PROG:
    # strict vars - no error
    use strict 'vars' ;
    use vars qw( $freddy) ;
    BEGIN { *freddy = \$joe::shmoe; }
    $freddy = 2 ;
    EXPECTED:

    GOT:
    Variable "$freddy" is not imported at - line 6.
    Global symbol "$freddy" requires explicit package name at - line 6.
    Execution of - aborted due to compilation errors.
    PROG:
    # strict vars - no error
    use strict 'vars' ;
    use vars qw( $freddy) ;
    local $abc::joe ;
    my $fred ;
    my $b = \$fred ;
    $Fred::ABC = 1 ;
    $freddy = 2 ;
    EXPECTED:

    GOT:
    Global symbol "$freddy" requires explicit package name at - line 9.
    Execution of - aborted due to compilation errors.
    CEE5213S The signal SIGPIPE was received.
    FAILED at test 62

    lib/ansicolor........Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "@EXPORT" requires explicit package name at ../lib/Term/ANSIColor.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Term/ANSI.
    Global symbol "$VERSION" requires explicit package name at ../lib/Term/ANSIColo.
    Global symbol "%attributes" requires explicit package name at ../lib/Term/ANSIC.
    Global symbol "$AUTOLOAD" requires explicit package name at ../lib/Term/ANSICol.
    Global symbol "%attributes" requires explicit package name at ../lib/Term/ANSIC.
    Global symbol "$AUTOLOAD" requires explicit package name at ../lib/Term/ANSICol.
    Global symbol "$AUTOLOAD" requires explicit package name at ../lib/Term/ANSICol.
    Global symbol "$AUTOLOAD" requires explicit package name at ../lib/Term/ANSICol.
    Global symbol "%attributes" requires explicit package name at ../lib/Term/ANSIC.
    Global symbol "%attributes" requires explicit package name at ../lib/Term/ANSIC.
    Global symbol "$EACHLINE" requires explicit package name at ../lib/Term/ANSICol.
    Global symbol "$EACHLINE" requires explicit package name at ../lib/Term/ANSICol.
    Global symbol "$EACHLINE" requires explicit package name at ../lib/Term/ANSICol.
    Compilation failed in require at lib/ansicolor.t line 16.
    BEGIN failed--compilation aborted at lib/ansicolor.t line 16.
    FAILED at test 1

    lib/b................CEE5213S The signal SIGPIPE was received.
    FAILED at test 10

    lib/bigfloat.........CEE5213S The signal SIGPIPE was received.
    FAILED at test 351

    lib/bigfltpm.........CEE5213S The signal SIGPIPE was received.
    FAILED at test 358

    lib/cgi-form.........Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/CGI/Util.pm.
    Global symbol "$VERSION" requires explicit package name at ../lib/CGI/Util.pm l.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Compilation failed in require at ../lib/CGI.pm line 26.
    BEGIN failed--compilation aborted at ../lib/CGI.pm line 26.
    Compilation failed in require at lib/cgi-form.t line 14.
    BEGIN failed--compilation aborted at lib/cgi-form.t line 14.
    FAILED at test 1
    lib/cgi-function.....Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/CGI/Util.pm.
    Global symbol "$VERSION" requires explicit package name at ../lib/CGI/Util.pm l.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Compilation failed in require at ../lib/CGI.pm line 26.
    BEGIN failed--compilation aborted at ../lib/CGI.pm line 26.
    Compilation failed in require at lib/cgi-function.t line 15.
    BEGIN failed--compilation aborted at lib/cgi-function.t line 15.
    FAILED at test 1
    lib/cgi-html.........Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/CGI/Util.pm.
    Global symbol "$VERSION" requires explicit package name at ../lib/CGI/Util.pm l.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Compilation failed in require at ../lib/CGI.pm line 26.
    BEGIN failed--compilation aborted at ../lib/CGI.pm line 26.
    Compilation failed in require at lib/cgi-html.t line 14.
    BEGIN failed--compilation aborted at lib/cgi-html.t line 14.
    FAILED at test 1
    lib/cgi-pretty.......Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/CGI/Util.pm.
    Global symbol "$VERSION" requires explicit package name at ../lib/CGI/Util.pm l.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Compilation failed in require at ../lib/CGI.pm line 26.
    BEGIN failed--compilation aborted at ../lib/CGI.pm line 26.
    Compilation failed in require at ../lib/CGI/Pretty.pm line 11.
    BEGIN failed--compilation aborted at ../lib/CGI/Pretty.pm line 11.
    Compilation failed in require at lib/cgi-pretty.t line 14.
    BEGIN failed--compilation aborted at lib/cgi-pretty.t line 14.
    FAILED at test 1
    lib/cgi-request......Global symbol "@ISA" requires explicit package name at ../.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/CGI/Util.pm.
    Global symbol "$VERSION" requires explicit package name at ../lib/CGI/Util.pm l.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Global symbol "$EBCDIC" requires explicit package name at ../lib/CGI/Util.pm li.
    Global symbol "@A2E" requires explicit package name at ../lib/CGI/Util.pm line .
    Compilation failed in require at ../lib/CGI.pm line 26.
    BEGIN failed--compilation aborted at ../lib/CGI.pm line 26.
    Compilation failed in require at lib/cgi-request.t line 14.
    BEGIN failed--compilation aborted at lib/cgi-request.t line 14.
    FAILED at test 1
    lib/charnames........FAILED at test 12

    lib/encode...........Out of memory!
    Callback called exit at ../lib/unicode/Name.pl line 5584.
    FAILED at test 0

    lib/env-array........Global symbol "@ISA" requires explicit package name at ../.
    Compilation failed in require at ../lib/Env.pm line 123.
    BEGIN failed--compilation aborted at ../lib/Env.pm line 123.
    Compilation failed in require at lib/env-array.t line 16.
    BEGIN failed--compilation aborted at lib/env-array.t line 16.
    FAILED at test 0
    lib/env..............Global symbol "@ISA" requires explicit package name at ../.
    Compilation failed in require at ../lib/Env.pm line 123.
    BEGIN failed--compilation aborted at ../lib/Env.pm line 123.
    Compilation failed in require at lib/env.t line 13.
    BEGIN failed--compilation aborted at lib/env.t line 13.
    FAILED at test 0
    lib/errno............Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/Errno.pm line 16.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/Errno.pm li.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Errno.pm .
    BEGIN not safe after errors--compilation aborted at ../lib/Errno.pm line 117.
    Compilation failed in require at lib/errno.t line 10.
    BEGIN failed--compilation aborted at lib/errno.t line 10.
    FAILED at test 0

    lib/fields...........Global symbol "$DEBUG" requires explicit package name at l.
    BEGIN not safe after errors--compilation aborted at lib/fields.t line 96.
    FAILED at test 0

    lib/filespec.........Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/File/Spec/Win32.p.
    Compilation failed in require at lib/filespec.t line 301.
    FAILED at test 0
    lib/filter-util......FAILED at test 2

    lib/ftmp-mktemp......Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/Errno.pm line 16.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/Errno.pm li.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Errno.pm .
    BEGIN not safe after errors--compilation aborted at ../lib/Errno.pm line 117.
    Compilation failed in require at ../lib/File/Temp.pm line 127.
    BEGIN failed--compilation aborted at ../lib/File/Temp.pm line 127.
    Compilation failed in require at lib/ftmp-mktemp.t line 17.
    BEGIN failed--compilation aborted at lib/ftmp-mktemp.t line 17.
    FAILED at test 1
    lib/ftmp-posix.......Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/Errno.pm line 16.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/Errno.pm li.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Errno.pm .
    BEGIN not safe after errors--compilation aborted at ../lib/Errno.pm line 117.
    Compilation failed in require at ../lib/File/Temp.pm line 127.
    BEGIN failed--compilation aborted at ../lib/File/Temp.pm line 127.
    Compilation failed in require at lib/ftmp-posix.t line 13.
    BEGIN failed--compilation aborted at lib/ftmp-posix.t line 13.
    FAILED at test 1
    lib/ftmp-security....Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/Errno.pm line 16.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/Errno.pm li.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Errno.pm .
    BEGIN not safe after errors--compilation aborted at ../lib/Errno.pm line 117.
    Compilation failed in require at ../lib/File/Temp.pm line 127.
    BEGIN failed--compilation aborted at ../lib/File/Temp.pm line 127.
    Compilation failed in require at lib/ftmp-security.t line 24.
    BEGIN failed--compilation aborted at lib/ftmp-security.t line 24.
    FAILED at test 1
    lib/ftmp-tempfile....Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/Errno.pm line 16.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/Errno.pm li.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Errno.pm .
    BEGIN not safe after errors--compilation aborted at ../lib/Errno.pm line 117.
    Compilation failed in require at ../lib/File/Temp.pm line 127.
    BEGIN failed--compilation aborted at ../lib/File/Temp.pm line 127.
    Compilation failed in require at lib/ftmp-tempfile.t line 40.
    BEGIN failed--compilation aborted at lib/ftmp-tempfile.t line 40.
    FAILED at test 1

    lib/getopt...........Global symbol "$VERSION" requires explicit package name at.
    Global symbol "@ISA" requires explicit package name at ../lib/Getopt/Long.pm li.
    Global symbol "@EXPORT" requires explicit package name at ../lib/Getopt/Long.pm.
    Global symbol "%EXPORT_TAGS" requires explicit package name at ../lib/Getopt/Lo.
    Global symbol "@EXPORT_OK" requires explicit package name at ../lib/Getopt/Long.
    BEGIN not safe after errors--compilation aborted at ../lib/Getopt/Long.pm line .
    Compilation failed in require at lib/getopt.t line 53.
    BEGIN failed--compilation aborted at lib/getopt.t line 53.
    FAILED at test 0

...etc. (other failures simply omitted here)

Failed 65 test scripts out of 230, 71.74% okay.

Perl Info

```
Flags:
    category=install
    severity=none

Site configuration information for perl v5.7.0:

Configured by PVHP at Fri Dec 8 10:10:07 PST 2000.

Summary of my perl5 (revision 5.0 version 7 subversion 0) configuration:
  Platform:
    osname=os390, osvers=05.00, archname=os390
    uname='os390 lpar25 05.00 02 9672 '
    config_args='-Dusedevel -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='c89', ccflags ='-DMAXSIG=38 -DOEMVS -D_OE_SOCKETS -D_XOPEN_SOURCE_EXTENDED -D_ALL_SOURCE -DYYDYNAMIC -I/usr/local/include',
    optimize=' ',
    cppflags=''
    ccversion='', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
    d_longlong=undef, longlongsize=, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lm -lc
    perllibs=-lm -lc
    libc=, so=a, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_none.xs, dlext=none, d_dlsymun=undef, ccdlflags=''
    cccdlflags='-W 0,dll,"langlvl(extended)"', lddlflags=''

Locally applied patches:
    DEVEL8042

@INC for perl v5.7.0:
    lib
    /usr/local/lib/perl5/5.7.0/os390
    /usr/local/lib/perl5/5.7.0
    /usr/local/lib/perl5/site_perl/5.7.0/os390
    /usr/local/lib/perl5/site_perl/5.7.0
    /usr/local/lib/perl5/site_perl/5.005/os390
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .

Environment for perl v5.7.0:
    HOME=/home/pvhp
    LANG=C
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:.:/usr/bin:/usr/lpp/java/J1.1/bin
    PERL_BADLANG (unset)
    SHELL=/bin/sh
```
p5pRT commented 23 years ago

From @simoncozens

On Fri, Dec 08, 2000 at 11:54:30AM -0800, Peter Prymmer wrote:

> The good news is that with this kit the resultant miniperl passes Simon's test suite:

Is this with or without my a2e patch? I hope it is, because if so, I think we've pretty much cracked it; (if it's not, then something miraculous must have happened) I now know how to get the versioning working, and a couple of other spots that don't act as they should.

Then we have to patch Nick's brain. :)

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

On Fri, 8 Dec 2000, Simon Cozens wrote:

> On Fri, Dec 08, 2000 at 11:54:30AM -0800, Peter Prymmer wrote:
>
> > The good news is that with this kit the resultant miniperl passes Simon's test suite:
>
> Is this with or without my a2e patch? I hope it is, because if so, I think we've pretty much cracked it; (if it's not, then something miraculous must have happened) I now know how to get the versioning working, and a couple of other spots that don't act as they should.

Alas it does not have a2e nor e2a. But change # 8039 does mention a change of yours:

    [ 8039] By: jhi on 2000/12/08 15:57:11
            Log: Subject: [PATCH] Re: ebcdic <-> ascii tables interjected in uv <-> utf8 considered harmful
                 From: Simon Cozens <simon@cozens.net>
                 Date: Fri, 8 Dec 2000 13:33:31 +0000
                 Message-ID: <20001208133331.A11535@deep-dark-truthful-mirror.perlhacker.org>

                 (The pp_hot part needed a rewrite.)
         Branch: perl
               ! doop.c pp_hot.c utf8.c

> Then we have to patch Nick's brain. :)

As long as we design a self consistent model first ... ;-)

Peter Prymmer

p5pRT commented 23 years ago

From @jhi

A propos: please see the new UTF8_XXX macros in utf8.h. They should come in handy for abstracting some of these things.

p5pRT commented 23 years ago

From @jhi

On Fri, Dec 08, 2000 at 03:54:53PM -0600, Jarkko Hietaniemi wrote:

> A propos: please see the new UTF8_XXX macros in utf8.h. They should come in handy for abstracting some of these things.

It's not only the 0x80 to watch out for, it's also the 0xc0.

Also, looking at utf8.h, the UTF8SKIP() needs to be educated about non-Latin-1 since now it blindly offsets by the character code. (Sorry again, if this has already been suggested, my network sucks smelly socks...)
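To make the 0x80 and 0xc0 remark concrete: in UTF-8 the first byte of a sequence determines how many bytes the character occupies, and those two values are the boundaries between single bytes, continuation bytes, and multi-byte lead bytes. The following is only a minimal C sketch of that idea. `utf8_seq_len` and `utf8_count_chars` are hypothetical helpers, not Perl's `PL_utf8skip` table, and they cover sequences of at most four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b.
 * Hypothetical helper for illustration; not Perl's PL_utf8skip. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;   /* 0x00..0x7F: a complete single byte      */
    if (b < 0xC0) return 1;   /* 0x80..0xBF: continuation byte, skip one */
    if (b < 0xE0) return 2;   /* 0xC0..0xDF: lead byte of a 2-byte form  */
    if (b < 0xF0) return 3;   /* 0xE0..0xEF: lead byte of a 3-byte form  */
    if (b < 0xF8) return 4;   /* 0xF0..0xF7: lead byte of a 4-byte form  */
    return 1;                 /* longer forms are not handled here       */
}

/* Count characters by hopping from one lead byte to the next. */
size_t utf8_count_chars(const unsigned char *s, size_t nbytes)
{
    size_t chars = 0;
    size_t i = 0;

    assert(s != NULL);
    while (i < nbytes) {
        i += utf8_seq_len(s[i]);
        chars++;
    }
    return chars;
}
```

The lookup only makes sense on bytes that really are UTF-8: the byte values 4, 5, 196, 172 count as three characters here because 196 (0xC4) announces a two-byte sequence.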

p5pRT commented 23 years ago

From @simoncozens

On Fri, Dec 08, 2000 at 04:19:48PM -0600, Jarkko Hietaniemi wrote:

> Also, looking at utf8.h, the UTF8SKIP() needs to be educated about non-Latin-1 since now it blindly offsets by the character code.

This educates it, and it puts in the host-independence macros as well. I really think it's worth doing this.

Inline Patch

```diff
--- utf8.h~ Fri Dec 8 22:29:16 2000
+++ utf8.h Fri Dec 8 22:29:33 2000
@@ -58,8 +58,6 @@
 #define UNICODE_IS_BYTE_ORDER_MARK(c) ((c) == UNICODE_BYTER_ORDER_MARK)
 #define UNICODE_IS_ILLEGAL(c) ((c) == UNICODE_ILLEGAL)
 
-#define UTF8SKIP(s) PL_utf8skip[*(U8*)s]
-
 #define UTF8_QUAD_MAX UINT64_C(0x1000000000)
 
 #define UTF8_IS_ASCII(c) ((c) < 0x80)
@@ -109,3 +107,28 @@
 #endif
 #define isIDFIRST_lazy(p) isIDFIRST_lazy_if(p,1)
 #define isALNUM_lazy(p) isALNUM_lazy_if(p,1)
+
+/* Host-character-set-independent UTF8 handling. */
+/* For now, we only handle EBCDIC and Latin-1, and those not EBCDIC are
+ * assumed to be Latin-1. Soon, the HOST2UNI and UNI2HOST macros will be
+ * replaced by a call to a function provided by the Encode module so
+ * that more host character sets can be converted */
+
+#ifdef EBCDIC
+#define HOST2UNI(ch) PL_e2u[(ch)]
+#define UNI2HOST(ch) PL_u2e[(ch)]
+#else
+#define HOST2UNI(ch) (ch)
+#define UNI2HOST(ch) (ch)
+#endif
+
+#define HOST_CHAR_TO_UTF8(ch, s) STMT_START { \
+    s = uv_to_utf8(s, HOST2UNI(ch)); \
+    } STMT_END
+#define CODEPOINT_TO_UTF8(ch, s) STMT_START { \
+    s = uv_to_utf8(s, (ch)); \
+    } STMT_END
+#define UTF8_TO_HOST_CHAR(s, retlen) UNI2HOST(utf8_to_uv_simple((s), (retlen)))
+#define UTF8_TO_CODEPOINT(s, retlen) (utf8_to_uv_simple((s),(retlen)))
+
+#define UTF8SKIP(s) PL_utf8skip[HOST2UNI(*(U8*)s)]
```
p5pRT commented 23 years ago

From @simoncozens

On Fri, Dec 08, 2000 at 10:31:22PM +0000, Simon Cozens wrote:

> This educates it, and it puts in the host-independence macros as well.
>
>     +#define HOST2UNI(ch) PL_e2u[(ch)]
>     +#define UNI2HOST(ch) PL_u2e[(ch)]

Ah, it needs the tables as well, of course.

p5pRT commented 23 years ago

From @jhi

>     +#ifdef EBCDIC
>     +#define HOST2UNI(ch) PL_e2u[(ch)]
>     +#define UNI2HOST(ch) PL_u2e[(ch)]
>     +#else
>     +#define HOST2UNI(ch) (ch)
>     +#define UNI2HOST(ch) (ch)
>     +#endif
>     +
>     +#define HOST_CHAR_TO_UTF8(ch, s) STMT_START { \
>     +    s = uv_to_utf8(s, HOST2UNI(ch)); \
>     +    } STMT_END
>     +#define CODEPOINT_TO_UTF8(ch, s) STMT_START { \
>     +    s = uv_to_utf8(s, (ch)); \
>     +    } STMT_END
>     +#define UTF8_TO_HOST_CHAR(s, retlen) UNI2HOST(utf8_to_uv_simple((s), (retlen)))
>     +#define UTF8_TO_CODEPOINT(s, retlen) (utf8_to_uv_simple((s),(retlen)))
>     +
>     +#define UTF8SKIP(s) PL_utf8skip[HOST2UNI(*(U8*)s)]

HOST2UNI() is funnily named if what it does is to convert the host codepoint to UTF-8 encoded (Unicode, assumedly?), ditto UNI2HOST. So we still have terminology confusion.

\<trawling around the Unicode time, my interpretation follows>

(1) There are /character codes/. Unicode character codes are 16 bits wide. I assume ISO 8859-1 and EBCDIC can be described as character codes 8 bits wide. Yes, the Unicode 16-bitness does include surrogates; a character code does not have to consist of a single code value.

The character codes are abstract. They are numbers, a mapping of a /character repertoire/ to a set of integers. As an example of the repertoire; the Unicode U+0041 is 'A' -- but it does not mean that any /encoding/ *has* to have 'A' at the 41th (zero-based) position.

(2) There are /encodings/ of character codes. /Character encoding forms/ specify the representation of characters as actual data in a computer. Unicode uses two encoding forms, UTF-8 and UTF-16, the latter being in fact three, since UTF-16 as such makes sense only at execution time, as "16-bit values"; for any storage the exact encoding needs to be spelt out as UTF-16BE or UTF-16LE.

My additional notes:

(3) The character codes can, at least partially, be mapped amongst themselves. (Unless they are just permutations of each other, there is guaranteed to be some loss of information.)

(4) The UTF-8 can encode any integer value, not just Unicode. Some decodings are illegal if understood as Unicode, such as 0xFFFF.

(5) The UTF-16 values are almost like just straight 16-bit C ints. What complicates things are the surrogates. (But at this point, let's not worry about surrogates any more than to keep in mind that 16 bits will not be enough in the future.)
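Points (4) and (5) can be illustrated with a short C sketch. It assumes the standard UTF-8 bit layout limited to four bytes and the standard UTF-16 surrogate arithmetic; `utf8_encode` and `utf16_surrogates` are hypothetical names chosen for this illustration, not functions from the Perl source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode the integer v with the UTF-8 bit layout. Returns the number of
 * bytes written (1..4), or 0 if v needs more than 21 bits. No Unicode
 * validity check is made, so 0xFFFF is encoded like any other integer. */
size_t utf8_encode(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    if (v < 0x80) {
        out[0] = (unsigned char)v;
        return 1;
    }
    if (v < 0x800) {
        out[0] = (unsigned char)(0xC0 | (v >> 6));
        out[1] = (unsigned char)(0x80 | (v & 0x3F));
        return 2;
    }
    if (v < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (v >> 12));
        out[1] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (v & 0x3F));
        return 3;
    }
    if (v < 0x200000) {
        out[0] = (unsigned char)(0xF0 | (v >> 18));
        out[1] = (unsigned char)(0x80 | ((v >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (v & 0x3F));
        return 4;
    }
    return 0;
}

/* Split a code point in 0x10000..0x10FFFF into a UTF-16 surrogate pair.
 * Returns 1 on success, 0 if cp fits in a single 16-bit unit or is out
 * of range. */
int utf16_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(hi != NULL && lo != NULL);
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;
    *hi = (uint16_t)(0xD800 | (cp >> 10));   /* top 10 bits    */
    *lo = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* bottom 10 bits */
    return 1;
}
```

Encoding the integer 300 this way gives the two bytes 196 and 172, which are the values that show up in the `use bytes` lines of the miniperl test output at the top of this ticket.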

p5pRT commented 23 years ago

From @jhi

One note more: tying a 'host' to 'a codeset' is folly, too. On many platforms one can switch between different codesets.

Incidentally, let's not use the term "character set"; that has over the years been a source of great confusion. So we have 'character codes' aka 'codesets' and we have 'encodings' of those.

p5pRT commented 23 years ago

From @simoncozens

On Sat, Dec 09, 2000 at 10:43:01AM -0600, Jarkko Hietaniemi wrote:

> HOST2UNI() is funnily named if what it does is to convert the host codepoint to UTF-8 encoded (Unicode, assumedly?), ditto UNI2HOST. So we still have terminology confusion.

It would be, if that was what it does. But it doesn't. It converts host codeset codepoints into Unicode codepoints. Otherwise it would be named HOST2UTF.
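The distinction between the two steps can be sketched in C as follows. This is an illustration only: `host_to_uni` stands in for a table such as `PL_e2u` but maps just four EBCDIC code points, and `uni_to_utf8` and `host_char_to_utf8` are hypothetical helpers rather than the functions used by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: host codeset code point -> Unicode code point.
 * Deliberately tiny, hypothetical mapping; a real table has 256 entries. */
unsigned host_to_uni(unsigned char host)
{
    switch (host) {
    case 0x40: return 0x20;   /* EBCDIC space */
    case 0xC1: return 0x41;   /* EBCDIC 'A'   */
    case 0x81: return 0x61;   /* EBCDIC 'a'   */
    case 0xF0: return 0x30;   /* EBCDIC '0'   */
    default:   return 0xFFFD; /* unmapped in this sketch */
    }
}

/* Step 2: Unicode code point -> UTF-8 bytes (code points below 0x10000
 * only). Returns the number of bytes written, or 0 if out of range. */
size_t uni_to_utf8(unsigned cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;
}

/* Both steps combined: what a "HOST2UTF"-style conversion would do. */
size_t host_char_to_utf8(unsigned char host, unsigned char *out)
{
    return uni_to_utf8(host_to_uni(host), out);
}
```

Routed through the table, 0xC1 (EBCDIC 'A') becomes the single byte 0x41. Treated as a code point in its own right, the same value 0xC1 encodes as the two bytes 0xC3 0x81.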

> The character codes are abstract. They are numbers, a mapping of a /character repertoire/ to a set of integers. As an example of the repertoire; the Unicode U+0041 is 'A' -- but it does not mean that any /encoding/ *has* to have 'A' at the 41th (zero-based) position.

Yep, exactly.

> (2) There are /encodings/ of character codes. /Character encoding forms/ specify the representation of characters as actual data in a computer. Unicode uses two encoding forms

And the rest... :)

> (4) The UTF-8 can encode any integer value, not just Unicode. Some decodings are illegal if understood as Unicode, such as 0xFFFF.

Yes, and in *some cases* that's what we use them for - comparing version strings, for instance. In other cases, we use them as Unicode characters.

The whole point of my patch was to give us a way of distinguishing.

> One note more: tying a 'host' to 'a codeset' is folly, too. In many platforms one can switch between different codesets.

That's why I said that these macros would eventually become a call to Encode. But since Encode just isn't there yet, and we have a problem with EBCDIC, we should at least try and solve that now while maintaining the ability to change over to Encode later on.

*sigh*. Should I rename them CONVERT_CODEPOINT_IN_USERS_CODESET_TO_UNICODE_CODEPOINT and so on? Would that be clearer?

p5pRT commented 23 years ago

From @jhi

On Sat, Dec 09, 2000 at 05:05:59PM +0000, Simon Cozens wrote:

> On Sat, Dec 09, 2000 at 10:43:01AM -0600, Jarkko Hietaniemi wrote:
>
> > HOST2UNI() is funnily named if what it does is to convert the host codepoint to UTF-8 encoded (Unicode, assumedly?), ditto UNI2HOST. So we still have terminology confusion.
>
> It would be, if that was what it does. But it doesn't. It converts host

Ahhh, sorry, missed one pointer deref.

> codeset codepoints into Unicode codepoints. Otherwise it would be named HOST2UTF.
>
> That's why I said that these macros would eventually become a call to Encode. But since Encode just isn't there yet, and we have a problem with EBCDIC, we should at least try and solve that now while maintaining the ability to change over to Encode later on.
>
> *sigh*. Should I rename them CONVERT_CODEPOINT_IN_USERS_CODESET_TO_UNICODE_CODEPOINT and so on? Would that be clearer?

I think HOST for the (currently 'active') host codeset and UNI for the Unicode codeset is fine. Sorry about the added confusion... :-)

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Peter Prymmer <pvhp@forte.com> writes:

> > Then we have to patch Nick's brain. :)
>
> As long as we design a self consistent model first ... ;-)

If you two can describe a self consistent model I'll take care of patching my brain :-)

p5pRT commented 23 years ago

From @simoncozens

On Sat, Dec 09, 2000 at 08:58:57PM +0000, Nick Ing-Simmons wrote:

> If you two can describe a self consistent model I'll take care of patching my brain :-)

Heed the word:

1. Old byte-oriented programs should not spontaneously break on the old byte-oriented data they used to work on.

2. Old byte-oriented programs should magically start working on the new character-oriented data when appropriate.

3. Programs should run just as fast in the new character-oriented mode as in the old byte-oriented mode.

4. Perl should remain one language, rather than forking into a byte-oriented Perl and a character-oriented Perl.

For the longer answer, see pages 401-410. The self-consistent model is all *done*, Nick. (And if it ain't, take it up with the man.) Now we just have to make it work.

p5pRT commented 23 years ago

From @nwc10

On Sat, Dec 09, 2000 at 09:17:01PM +0000, Simon Cozens wrote:

> 1. Old byte-oriented programs should not spontaneously break on the old byte-oriented data they used to work on.
>
> 2. Old byte-oriented programs should magically start working on the new character-oriented data when appropriate.
>
> The self-consistent model is all *done*, Nick. (And if it ain't, take it up with the man.) Now we just have to make it work.

IIRC there was a problem with

    /usr/bin/perl -lwe 'print ~"!"'
    Þ

where ~ can't work utf-wide (rule 2) without breaking old programs (breach rule 1)

and IIRC there was an agreed compromise. It's possibly the only case where use utf8; need not be a no-op (eventually)

Nicholas Clark

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Jarkko Hietaniemi <jhi@iki.fi> writes:

HOST2UNI() is funnily named if what it does is to convert the host codepoint to UTF-8 encoded (Unicode, assumedly?), ditto UNI2HOST. So we still have terminology confusion.

The poor lad was just trying to avoid my 'but ASCII is only 0..0x7F' hot-button ;-)

The output is supposed (I think) to be Unicode code point which _may_ then be UTF-8'ed ...

HOST2ISO8859_1 is a bit of a mouthful.

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Simon Cozens <simon@cozens.net> writes:

On Sat, Dec 09, 2000 at 08:58:57PM +0000, Nick Ing-Simmons wrote:

If you two can describe a self consistent model I'll take care of patching my brain :-)

Heed the word:

1. Old byte-oriented programs should not spontaneously break on the old byte oriented data they used to work on.

I can see that (for a sufficiently inclusive definition of "old byte-oriented program") that is the snag with the "Perl strings are sequences of UNICODE characters - hence ord('A') == 0x41" model I proposed.

2. Old byte-oriented programs should magically start working on the new character-oriented data when appropriate.

3. Programs should run just as fast in the new character-oriented mode as in the old byte-oriented mode.

tee hee ;-)

4. Perl should remain one language, rather than forking into a byte-oriented Perl and a character-oriented Perl.

I would not disagree with any of that - even if it was allowed ;-)

For the longer answer, see pages 401-410.

In the morning.

The self-consistent model is all *done*, Nick. (And if it ain't, take it up with the man.) Now we just have to make it work.

I still want to see a perl-programmer's view description of how things work on EBCDIC platforms, but I can wait till you have something that mostly-works if you want to do it bottom-up.

p5pRT commented 23 years ago

From @jhi

On Sat, Dec 09, 2000 at 10:57:36PM +0000, Nicholas Clark wrote:

On Sat, Dec 09, 2000 at 09:17:01PM +0000, Simon Cozens wrote:

1. Old byte-oriented programs should not spontaneously break on the old byte oriented data they used to work on.

2. Old byte-oriented programs should magically start working on the new character-oriented data when appropriate.

The self-consistent model is all *done*, Nick. (And if it ain't, take it up with the man.) Now we just have to make it work.

IIRC with the problem about /usr/bin/perl -lwe 'print ~"!"' Þ

where ~ can't work utf-wide (rule 2) without breaking old programs (breach rule 1)

and IIRC there was an agreed compromise. it's possibly the only case where

The behaviour of chr($x) for $x (128..255) *may* be another.

use utf8; need not be a no-op (eventually)

Nicholas Clark

p5pRT commented 23 years ago

From @jhi

On Fri, Dec 08, 2000 at 10:31:22PM +0000, Simon Cozens wrote:

On Fri, Dec 08, 2000 at 04:19:48PM -0600, Jarkko Hietaniemi wrote:

Also, looking at utf8.h, the UTF8SKIP() needs to be educated about non-Latin-1 since now it blindly offsets by the character code.

This educates it, and it puts in the host-independence macros as well. I really think it's worth doing this.

--- utf8.h~ Fri Dec 8 22:29:16 2000
+++ utf8.h Fri Dec 8 22:29:33 2000

I will not put this in as-is before Peter gets a chance to play with it -- and as Simon later pointed out, we need also to prepare some PL_{e2u,u2e} tables.
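The body of the patch is not reproduced in this ticket, so the following is only a sketch of what a "skip" computation does, assuming standard UTF-8 lead bytes. It is not the code from utf8.h, and the function name `utf8_skip_sketch` is hypothetical. The point is that the length of a sequence is derived from the lead byte of the encoded form, so on an EBCDIC host that byte has to be treated as an encoding byte and not as a host character code.

```c
typedef unsigned char U8;

/* Number of bytes in a standard UTF-8 sequence, judged from its lead byte.
   Continuation bytes (0x80..0xBF) and invalid lead bytes return 1 so that a
   caller stepping through a buffer always makes progress. */
static int utf8_skip_sketch(U8 lead)
{
    if (lead < 0x80)
        return 1;               /* 0xxxxxxx: single byte */
    if (lead >= 0xC0 && lead <= 0xDF)
        return 2;               /* 110xxxxx */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;               /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF7)
        return 4;               /* 11110xxx */
    return 1;                   /* continuation or invalid lead byte */
}
```

For instance, the bytes 196 and 172 (0xC4 0xAC) used in the miniperl tests at the top of this ticket form one two-byte sequence encoding codepoint 300.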

p5pRT commented 23 years ago

From @simoncozens

On Sun, Dec 10, 2000 at 11:34:23AM -0600, Jarkko Hietaniemi wrote:

I will not put this in as-is before Peter gets a chance to play with it -- and as Simon later pointed out, we need also to prepare some PL_{e2u,u2e} tables.

Fair enough. I've stealthily sent Peter another patch which distinguishes between numeric-use and character-use of the UTF8 functions. We'll see how that gets on.

Incidentally, Jarkko, is your mailer doing something weird to headers? I have to reformat replies to your mails.

p5pRT commented 23 years ago

From @jhi

On Sun, Dec 10, 2000 at 07:11:31PM +0000, Simon Cozens wrote:

On Sun, Dec 10, 2000 at 11:34:23AM -0600, Jarkko Hietaniemi wrote:

I will not put this in as-is before Peter gets a chance to play with it -- and as Simon later pointed out, we need also to prepare some PL_{e2u,u2e} tables.

Fair enough. I've stealthily sent Peter another patch which distinguishes between numeric-use and character-use of the UTF8 functions. We'll see how that gets on.

Incidentally, Jarkko, is your mailer doing something weird to headers? I have to reformat replies to your mails.

Not that I know of. Elaborate on the nature of weird.

-- "Saving four times is just paranoia. Unless you're using an Exabyte 5gig/8mm tapedrive." - Graham Reed, ASR

p5pRT commented 23 years ago

From @simoncozens

On Sun, Dec 10, 2000 at 03:06:47PM -0600, Jarkko Hietaniemi wrote:

Not that I know of. Elaborate on the nature of weird.

It's breaking Mail-Followup-To, thusly:

Mail-Followup-To: Jarkko Hietaniemi <jhi@iki.fi>, Simon Cozens   <simon@cozens.net>, Peter Prymmer <pvhp@forte.com>, perl5-porters@perl.org   References: <20001208155453.H3657@chaos.wustl.edu>   <20001208161948.J3657@chaos.wustl.edu>   <20001208223122.A1879@deep-dark-truthful-mirror.perlhacker.org>

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

On Sat, 9 Dec 2000, Nicholas Clark wrote:

On Sat, Dec 09, 2000 at 09:17:01PM +0000, Simon Cozens wrote:

1. Old byte-oriented programs should not spontaneously break on the old byte oriented data they used to work on.

2. Old byte-oriented programs should magically start working on the new character-oriented data when appropriate.

The self-consistent model is all *done*, Nick. (And if it ain't, take it up with the man.) Now we just have to make it work.

IIRC with the problem about /usr/bin/perl -lwe 'print ~"!"' ???

where ~ can't work utf-wide (rule 2) without breaking old programs (breach rule 1)

and IIRC there was an agreed compromise. it's possibly the only case where use utf8; need not be a no-op (eventually)

Here is what perl 5.005_03 returns on OS/390 V2R5​:

$ perl -lwe 'print ~"!"' | od -c
0000000000   v  \n
0000000002

that is, the letter "v".
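The two results in this subthread (Þ on an ASCII/Latin-1 host, "v" on OS/390) are the same one-byte complement applied to different encodings of "!". A small sketch of the arithmetic, assuming EBCDIC code page 1047 or 037, where "!" is 0x5A and "v" is 0xA5; the helper name `complement_byte` is hypothetical and this is not how perl implements `~` on strings.

```c
typedef unsigned char U8;

/* Bitwise complement restricted to a single byte, as a byte-oriented
   string ~ would apply to each byte of its operand. */
static U8 complement_byte(U8 b)
{
    return (U8)~b;
}

/* ASCII/Latin-1: '!' is 0x21, and ~0x21 is 0xDE, which Latin-1 shows as
   the letter thorn.
   EBCDIC 1047:   '!' is 0x5A, and ~0x5A is 0xA5, which EBCDIC shows as 'v'. */
```

If `~` instead complemented a wide character value, neither result would come out as a single byte, which is the conflict between rule 1 and rule 2 raised earlier in the thread.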

BTW This message is something of a test. I've had my email gone since Friday evening or Saturday morning and I may or may not be able to recover (I really want to see if this makes it out to p5p).

Peter Prymmer