Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

perl5.8.0 bug in \U..\E processing #6131

Closed: p5pRT closed this issue 21 years ago

p5pRT commented 21 years ago

Migrated from rt.perl.org#18931 (status was 'resolved')

Searchable as RT18931$

p5pRT commented 21 years ago

From bhavesh@avaya.com

This is a bug report for perl from bhavesh@avaya.com, generated with the help of perlbug 1.34 running under perl v5.8.0.

This bug report is being submitted as critical because it is preventing the compilation of the Linux 2.4.20 kernel.org kernel.

The following test program fails to process the \U..\E expression correctly with perl v5.8.0:

--------------test.pl-----------
#!/opt/bin/perl -w

$object = "BAR";
# The input operator was lost when the ticket was migrated; <STDIN> is assumed here.
while (<STDIN>) {
    chop;
    if (/^\s*MOVE\s+(.*)/i) {
        $rest = $1;
        if ($rest =~ /^($object).*$/i) {
            $obj = "\U$1\E";
            print STDERR "Object = $obj\n";
        }
    }
}
--------end test.pl------------

With perl v5.8.0:
INPUT: MOVE BAR
OUTPUT: Object = B0B

With perl v5.005_003:
INPUT: MOVE BAR
OUTPUT: Object = BAR



Flags:
  category=core
  severity=critical


Site configuration information for perl v5.8.0:

Configured by bhavesh at Fri Dec 6 12:32:47 MST 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.18-18.8.0, archname=i686-linux
    uname='linux cof110earth.dr.avaya.com 2.4.18-18.8.0 #1 thu nov 14 00:10:29 est 2002 i686 i686 i386 gnulinux '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O3',
    cppflags='-fno-strict-aliasing -I/usr/include/gdbm'
    ccversion='', gccversion='3.2 20020903 (Red Hat Linux 8.0 3.2-7)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lc -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil
    libc=/lib/libc-2.2.93.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.2.93'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.0:
  /opt/lib/perl5/5.8.0/i686-linux
  /opt/lib/perl5/5.8.0
  /opt/lib/perl5/site_perl/5.8.0/i686-linux
  /opt/lib/perl5/site_perl/5.8.0
  /opt/lib/perl5/site_perl
  .

Environment for perl v5.8.0:
  HOME=/home/bhavesh
  LANG=en_US.UTF-8
  LANGUAGE (unset)
  LD_LIBRARY_PATH=/opt/lib:
  LOGDIR (unset)
  PATH=/opt/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/bhavesh/bin:/usr/add-on/definity/binlinux
  PERL_BADLANG (unset)
  SHELL=/bin/bash

p5pRT commented 21 years ago

From @nwc10

On Sat, Dec 07, 2002 at 12:11:19AM -0000, Bhavesh Davda wrote:

> This bug report is being submitted as critical because it is preventing the compilation of the Linux 2.4.20 kernel.org kernel

> ccversion='', gccversion='3.2 20020903 (Red Hat Linux 8.0 3.2-7)', gccosandvers=''

> Environment for perl v5.8.0:
> HOME=/home/bhavesh
> LANG=en_US.UTF-8

It seems to be a UTF8 regexp bug. As a work around unset LANG - I believe the script should work then.

(For p5p - I'm not sure if it's fixed in the development version of perl - I'm compiling a current development version to find out. The cheapest UTF8 locale emulator I have: $_ .= chr 256; chop; )

Nicholas Clark
--
INTERCAL better than perl? http://www.perl.org/advocacy/spoofathon/
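The "cheapest UTF8 locale emulator" above works because appending a character above 255 forces perl to store the string as UTF-8 internally, and chop does not convert it back. A minimal sketch of that effect, assuming a perl where utf8::is_utf8 is available (5.8.1 or later); the variable names are illustrative only:

```perl
use strict;
use warnings;

my $plain   = 'abcdefgh';
my $flagged = 'abcdefgh';

$flagged .= chr 256;   # a code point above 255 upgrades the string to UTF-8 internally
chop $flagged;         # removing it again leaves the internal UTF-8 flag set

printf "plain:   utf8 flag %s\n", utf8::is_utf8($plain)   ? 'on' : 'off';
printf "flagged: utf8 flag %s\n", utf8::is_utf8($flagged) ? 'on' : 'off';
printf "same characters: %s\n",   $plain eq $flagged      ? 'yes' : 'no';
```

The two strings compare equal, yet only the second one is routed through the UTF-8 code paths, which is the same situation a UTF-8 locale such as LANG=en_US.UTF-8 created for input read under perl 5.8.0.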

p5pRT commented 21 years ago

From @nwc10

On Sat, Dec 07, 2002 at 02:34:12PM +0000, Nicholas Clark wrote:

> It seems to be a UTF8 regexp bug. As a work around unset LANG - I believe the script should work then.
>
> (For p5p - I'm not sure if it's fixed in the development version of perl - I'm compiling a current development version to find out. The cheapest UTF8 locale emulator I have: $_ .= chr 256; chop; )

$ cat kernelutf8
#!/usr/bin/perl -w

for my $a (0,1) {
  $_ = 'abcdefgh';
  if ($ARGV[0]) {
    $_ .= chr 256;
    chop;
  }

  /(.*)/;
  print uc ($1), "\n";
}
__END__
$ ./perl -I lib kernelutf8
ABCDEFGH
ABCDEFGH
$ ./perl -I lib kernelutf8 1
A0B5GH
ABCDEFGH
$ cat .patch
18251

It's unsolved :-(

Nicholas Clark
--
z-code better than perl? http://www.perl.org/advocacy/spoofathon/
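The reduced case calls uc() on the capture, while the original report interpolates it with \U..\E; both request the same upper-casing, which is why the two symptoms are treated as one bug. The following is a rough sketch of a check that exercises both spellings on a UTF-8-flagged string. The helper names force_utf8_flag and upcase_both_ways are hypothetical and not part of perl or its test suite:

```perl
use strict;
use warnings;

# Return a copy of $text with perl's internal UTF-8 flag forced on,
# using the chr 256 / chop trick from the thread.
sub force_utf8_flag {
    my ($text) = @_;
    $text .= chr 256;
    chop $text;
    return $text;
}

# Upper-case the first capture with uc() and with \U..\E,
# over two passes, mirroring the two-pass loop in the reduced case.
sub upcase_both_ways {
    my ($text) = @_;
    my @seen;
    for my $pass (0, 1) {
        if ($text =~ /(.*)/) {
            push @seen, uc($1), "\U$1\E";
        }
    }
    return @seen;
}

my @results = upcase_both_ways(force_utf8_flag('abcdefgh'));
my @wrong   = grep { $_ ne 'ABCDEFGH' } @results;
print @wrong ? "mangled: @wrong\n" : "ok: all four results are ABCDEFGH\n";
```

On a perl that contains the fix, all four results should be ABCDEFGH; an affected build would be expected to report mangled values such as the A0B5GH shown above.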

p5pRT commented 21 years ago

From abe@ztreet.demon.nl

On a beautiful autumn day (Saturday 07 December 2002 20:05), Nicholas Clark wrote:

> On Sat, Dec 07, 2002 at 02:34:12PM +0000, Nicholas Clark wrote:
>
> > It seems to be a UTF8 regexp bug. As a work around unset LANG - I believe the script should work then.
> >
> > (For p5p - I'm not sure if it's fixed in the development version of perl - I'm compiling a current development version to find out. The cheapest UTF8 locale emulator I have: $_ .= chr 256; chop; )
>
> $ cat kernelutf8
> #!/usr/bin/perl -w
>
> for my $a (0,1) {
>   $_ = 'abcdefgh';
>   if ($ARGV[0]) {
>     $_ .= chr 256;
>     chop;
>   }
>
>   /(.*)/;
>   print uc ($1), "\n";
> }
> __END__
> $ ./perl -I lib kernelutf8
> ABCDEFGH
> ABCDEFGH
> $ ./perl -I lib kernelutf8 1
> A0B5GH
> ABCDEFGH
> $ cat .patch
> 18251

Something must have changed since 18076 (or is it configuration?):

/usr/local/src/bleadperl/perl$ ./perl -Ilib ../klad/kerlnelutf8 1
ABCDEFGH
A0B5GH

/usr/local/src/bleadperl/perl$ ./perl -v
This is perl, v5.9.0 built for i686-linux-thread-multi-64all-ld

Good luck,

Abe
--
"Crashes Perl (or Used To)" is not a really useful classifying criterion, it's about as useful as "the number of characters in the test is divisible by 73".
  -- Jarkko Hietaniemi on p5p @ 2001-10-30

p5pRT commented 21 years ago

From @jhi

This bug has been fixed (change #18266), and the fix will be in Perl 5.8.1, whenever that happens. I'm therefore marking the problem ticket as resolved.

p5pRT commented 21 years ago

@jhi - Status changed from 'new' to 'resolved'

p5pRT commented 21 years ago

From ams@wiw.org

At 2002-12-07 19:05:39 +0000, nick@unfortu.net wrote:

> $ ./perl -I lib kernelutf8 1
> A0B5GH
> ABCDEFGH
> $ cat .patch
> 18251
>
> It's unsolved :-(

A0B, B0B, what's the difference? #18266 fixes it.

It's funny how the same bug, unnoticed for so long, suddenly crops up in multiple independent places all at once.

-- ams

p5pRT commented 21 years ago

From @nwc10

On Thu, Dec 12, 2002 at 09:56:45AM +0530, Abhijit Menon-Sen wrote:

> At 2002-12-07 19:05:39 +0000, nick@unfortu.net wrote:
>
> > $ ./perl -I lib kernelutf8 1
> > A0B5GH
> > ABCDEFGH
> > $ cat .patch
> > 18251
> >
> > It's unsolved :-(
>
> A0B, B0B, what's the difference? #18266 fixes it.
>
> It's funny how the same bug, unnoticed for so long, suddenly crops up in multiple independent places all at once.

I suspect that RedHat 8, which ships with UTF8 locales by default, may have something to do with multiple independent reports of the same bug. All of a sudden lots of trivial perl scripts are getting to meet the utf8 regexp code, particularly the swash code. I suspect that perl's regression tests aren't that brutal to the swash code, because it would mean starting a new interpreter each time.

Now that Storable is in core, would it be worthwhile (speed or reliability wise) replacing parts (or all) of the swash loader code with calls to Storable to directly load the swash structures? Current utf8_heavy.pl does seem to be doing non-trivial work during loading, despite tables being built at compile time. I admit that this paragraph roughly summarises my entire knowledge of what it's doing.

Nicholas Clark
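To make the Storable idea above concrete, here is a minimal sketch of serialising a precomputed mapping table once and loading it back with a single call. The table layout, the temporary file, and the map_code_point helper are all hypothetical stand-ins for illustration; this is not perl's real swash format, nor how utf8_heavy.pl works:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical precomputed table: each range is [first, last, mapped first].
# This is NOT the real swash structure, only an illustration.
my $table = {
    name   => 'ToUpper',
    ranges => [ [ 0x61, 0x7A, 0x41 ] ],   # a-z maps onto A-Z
};

# "Build time": serialise the structure once, in network byte order.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
nstore($table, $path);

# "Load time": one call restores the structure, with no text parsing.
my $loaded = retrieve($path);

# Look a code point up in the loaded table; unmapped code points pass through.
sub map_code_point {
    my ($tbl, $cp) = @_;
    for my $range (@{ $tbl->{ranges} }) {
        my ($first, $last, $target) = @$range;
        return $target + ($cp - $first) if $cp >= $first && $cp <= $last;
    }
    return $cp;
}

printf "%s -> %s\n", 'b', chr(map_code_point($loaded, ord('b')));
```

Whether this would actually be faster or more reliable than the existing loader is exactly the open question raised in the message; the sketch only shows the shape of the approach.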

p5pRT commented 21 years ago

@rspier - Status changed from 'open' to 'resolved'