Heisenbug: use locale; ... (substr(...) cmp substr(...)) #3982

Closed p5pRT closed 20 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#6985 (status was 'resolved')

Searchable as RT6985$

p5pRT commented 23 years ago

From sburke@spinn.net

The bug seems unaffected by the presence or absence of "use utf8".

I've observed this bug under 5.6.0 under MSWin (ME and 98) /and/ Linux (three different machines incidentally), and under 5.00503 under AIX; those are all the systems I've tested this under.

Below are perl -V outputs on the three platforms, and then a program reproducing the bug in many iterations.

I note that the bug seems to appear /only/ when there is an OS locale active /and/ use locale is on /and/ we try to evaluate substr(...) cmp substr(...) in scope of the "use locale". Very odd indeed! Good luck finding a fix for this.
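To check the "OS locale active" condition on a given machine, something like the following may help. It is only a sketch: it assumes the standard POSIX module is available, and it merely reports the current settings rather than testing for the bug itself.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Calling setlocale() with only a category queries the current setting
# without changing it.
my $collate = setlocale(LC_COLLATE);
my $lang    = defined $ENV{LANG} ? $ENV{LANG} : '(unset)';

print "LANG=$lang\n";
print "LC_COLLATE=", (defined $collate ? $collate : '(unknown)'), "\n";
```

A collation locale of "C" or "POSIX" would presumably correspond to the "no OS locale active" case described above.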

    Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
      Platform:
        osname=aix, osvers=4.3.1.0, archname=aix
        uname='aix pegasus 3 4 000165185700 '
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=undef useperlio=undef d_sfio=undef
      Compiler:
        cc='cc', optimize='-O', gccversion=
        cppflags='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=8192 -I/usr/local/include -I/usr/local/gnu/include'
        ccflags ='-D_ALL_SOURCE -D_ANSI_C_SOURCE -D_POSIX_SOURCE -qmaxmem=8192 -I/usr/local/include -I/usr/local/gnu/include'
        stdchar='unsigned char', d_stdstdio=define, usevfork=false
        intsize=4, longsize=4, ptrsize=4, doublesize=8
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
        alignbytes=8, usemymalloc=n, prototype=define
      Linker and Libraries:
        ld='ld', ldflags ='-L/usr/local/lib -L/usr/local/gnu/lib -L/usr/ccs/lib -L/lib'
        libpth=/usr/local/lib /lib /usr/lib /usr/ccs/lib /usr/local/gnu/lib
        libs=-lnsl -lgdbm -ldbm -ldl -lld -lm -lc -lcrypt -lbsd -lPW
        libc=, so=a, useshrplib=false, libperl=libperl.a
      Dynamic Linking:
        dlsrc=dl_aix.xs, dlext=so, d_dlsymun=undef, ccdlflags='-bE:perl.exp'
        cccdlflags=' ', lddlflags='-bhalt:4 -bM:SRE -bI:$(PERL_INC)/perl.exp -bE:$(BASEEXT).exp -b noentry -lc -L/usr/local/lib -L/usr/local/gnu/lib -L/usr/ccs/lib -L/lib'

    Characteristics of this binary (from libperl):
      Built under aix
      Compiled at Aug 12 1999 09:10:29
      %ENV:
        PERLLIB="/nfs/user/l/lachler/.bin/perl/aix:/nfs/user/l/lachler/.bin/perl"
        [and, not shown, LANG="en_US"]
      @INC:
        /nfs/user/l/lachler/.bin/perl/aix
        /nfs/user/l/lachler/.bin/perl
        /usr/local/gnu/lib/perl5/5.00503/aix
        /usr/local/gnu/lib/perl5/5.00503
        /usr/local/gnu/lib/perl5/site_perl/5.005/aix
        /usr/local/gnu/lib/perl5/site_perl/5.005
        .

    Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
      Platform:
        osname=linux, osvers=2.2.16-21, archname=i686-linux
        uname='linux spinntwo 2.2.16-21 #1 wed aug 9 10:38:45 edt 2000 i686 unknown '
        config_args=''
        hint=recommended, useposix=true, d_sigaction=define
        usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
        useperlio=undef d_sfio=undef uselargefiles=define
        use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
      Compiler:
        cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
        cppflags='-fno-strict-aliasing -I/usr/local/include'
        ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
        stdchar='char', d_stdstdio=define, usevfork=false
        intsize=4, longsize=4, ptrsize=4, doublesize=8
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=4, usemymalloc=n, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -L/usr/local/lib'
        libpth=/usr/local/lib /lib /usr/lib
        libs=-lnsl -lndbm -ldb -ldl -lm -lc -lposix -lcrypt
        libc=/lib/libc-2.1.3.so, so=so, useshrplib=false, libperl=libperl.a
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
        cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

    Characteristics of this binary (from libperl):
      Compile-time options: USE_LARGE_FILES
      Built under linux
      Compiled at Oct 27 2000 18:12:05
      %ENV:
        PERLLIB="/home/users/s/sburke/.bin/perl/"
        [and, not shown, LANG="en_US"]
      @INC:
        /home/users/s/sburke/.bin/perl/
        /usr/lib/perl5/5.6.0/i686-linux
        /usr/lib/perl5/5.6.0
        /usr/lib/perl5/site_perl/5.6.0/i686-linux
        /usr/lib/perl5/site_perl/5.6.0
        /usr/lib/perl5/site_perl/5.005
        /usr/lib/perl5/site_perl
        .

    Summary of my perl5 (revision 5 version 6 subversion 0) configuration:
      Platform:
        osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
        uname=''
        config_args='undef'
        hint=recommended, useposix=true, d_sigaction=undef
        usethreads=undef use5005threads=undef useithreads=define usemultiplicity=define
        useperlio=undef d_sfio=undef uselargefiles=undef
        use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
      Compiler:
        cc='cl', optimize='-O1 -MD -DNDEBUG', gccversion=
        cppflags='-DWIN32'
        ccflags ='-O1 -MD -DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DPERL_MSVCRT_READFIX'
        stdchar='char', d_stdstdio=define, usevfork=false
        intsize=4, longsize=4, ptrsize=4, doublesize=8
        d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
        ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
        alignbytes=8, usemymalloc=n, prototype=define
      Linker and Libraries:
        ld='link', ldflags ='-nologo -nodefaultlib -release -libpath:"C:\Perl\lib\CORE" -machine:x86'
        libpth="C:\Perl\lib\CORE"
        libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib wsock32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
        libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl56.lib
      Dynamic Linking:
        dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
        cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -release -libpath:"C:\Perl\lib\CORE" -machine:x86'

    Characteristics of this binary (from libperl):
      Compile-time options: MULTIPLICITY USE_ITHREADS PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
      Locally applied patches:
        ActivePerl Build 623
      Built under MSWin32
      Compiled at Dec 15 2000 16:27:07
      %ENV:
        PERLDOC_PAGER="towrite"
      @INC:
        C:/Perl/lib
        C:/Perl/site/lib
        .

[and a Control Panel "Regional Settings" value of "English (United States)" -- altho changing to different locales (Russian, French, Italian...) doesn't seem to make the bug go away.]

And now the program exhibiting the bug:

    # Under 5.6.0 and 5.00503, this screams like a baby under LANG=en_US,
    # /or/ under MSWin!
    # Commenting out 'use locale' makes the bug go away, as does (under
    # UNIX) unsetenv LANG.
    # Time-stamp: "2001-05-10 01:40:16 MDT"

    print "Perl version $]\n";
    use strict;
    use locale;
    my $r = .75;
    my $bad = 0;
    my $all = 0;
    print "R: $r\n";
    for(1 .. 1000) {
      my $x = "foobar";
      my $y = "fooa";
      ($x,$y) = ($y,$x) if rand > $r;

      my $cmp1 = substr($x,0,4) cmp substr($y,0,4);
      my $i = substr($x,0,4);
      my $j = substr($y,0,4);
      my $cmp2 = $i cmp $j;

      # The following two lines are never run:
      print "****SCREAM1\n" unless $i eq substr($x,0,4);
      print "****SCREAM2\n" unless $j eq substr($y,0,4);

      if($cmp1 != $cmp2) {
        ++$bad;
        print "SCREAM! ";
      } else {
        print "okay ";
      }
      ++$all;

      printf "%2s : %2s <%s><%s> from <%s> <%s>\n",
       $cmp1,$cmp2, $i,$j, $x,$y;

    }
    printf "\n\nPct bad: %g%% (%g of %g); Cf. r=%g\n",
     100 * $bad / $all, $bad, $all, $r;
    __END__

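Output:

    Perl version 5.006
    R: 0.75
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    [...lots more lines...]
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    okay -1 : -1 <fooa><foob> from <fooa> <foobar>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>
    SCREAM! -1 :  1 <foob><fooa> from <foobar> <fooa>


    Pct bad: 72.4% (724 of 1000); Cf. r=0.75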

You'll note that in the dump above, you get a scream IFF "foob" is first. Usually when I run the exact same program, however, I get a scream IFF "foob" ISN'T first.

As I run this more, the pct-bad seems to vary directly with R /or/ 1-R. With an R of .75, 75% of the time pct-bad is 25%, and 25% of the time it's 75%. This relationship (N% of runs report percent-bads of approx 100-N%, and vice versa) seems to hold between .5 and 1, but seems to flip below .5 (i.e., N% of runs report percent-bads of approx N%, and 100-N% of runs report percent-bads of approx 100-N%).
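In every line of the dump above, $cmp2 (computed from copies of the substrings) tracks the actual order of the strings, while $cmp1 does not. That suggests a possible workaround on affected versions: copy each substr() result into its own variable before comparing. A minimal sketch, where `prefix_cmp` is a hypothetical helper name and not anything from the report:

```perl
use strict;
use warnings;
use locale;

# Hypothetical helper: compare the first $len characters of two strings.
# Each substr() result is copied into its own lexical before cmp, the
# same way the reproducer computes $cmp2.
sub prefix_cmp {
    my ($x, $y, $len) = @_;
    my $i = substr($x, 0, $len);
    my $j = substr($y, 0, $len);
    return $i cmp $j;
}

for my $pair (["foobar", "fooa"], ["fooa", "foobar"]) {
    printf "%2d for <%s> <%s>\n", prefix_cmp(@$pair, 4), @$pair;
}
```

Whether this sidesteps the bug in every case is an assumption based only on the output shown here.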

p5pRT commented 23 years ago

From @jhi

Fix for 20010514.037; substr() didn't invalidate the locale collation magic.

Affected files ...

... //depot/perl/pp.c#278 edit

Differences ...

==== //depot/perl/pp.c#278 (text) ====
Index: perl/pp.c

```diff
--- perl/pp.c.~1~	Tue May 15 05:42:26 2001
+++ perl/pp.c	Tue May 15 05:42:26 2001
@@ -2808,6 +2808,9 @@
                 sv_pos_u2b(sv, &pos, &rem);
             tmps += pos;
             sv_setpvn(TARG, tmps, rem);
+#ifdef USE_LOCALE_COLLATE
+            sv_unmagic(TARG, 'o');
+#endif
             if (utf8_curlen)
                 SvUTF8_on(TARG);
             if (repl) {
```

End of Patch.
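One reading of this fix that fits the reported numbers: if the collation data attached to substr()'s result was never invalidated, every later `substr(...) cmp substr(...)` would keep giving the answer from the first iteration. The sketch below is a toy model of that reading only. It does not touch perl's internals, `simulate_run` is a hypothetical name, and the "stale" value merely stands in for the cached collation data.

```perl
use strict;
use warnings;

# Toy model: the result of the first comparison is frozen and reused,
# while the correct result is recomputed on every iteration.
sub simulate_run {
    my ($r, $iterations) = @_;
    my $bad = 0;
    my $stale;    # stands in for the never-invalidated collation data
    for (1 .. $iterations) {
        my ($x, $y) = ("foobar", "fooa");
        ($x, $y) = ($y, $x) if rand() > $r;
        my $correct = substr($x, 0, 4) cmp substr($y, 0, 4);
        $stale = $correct unless defined $stale;
        ++$bad if $stale != $correct;
    }
    return 100 * $bad / $iterations;
}

for my $run (1 .. 8) {
    printf "run %d: %.1f%% bad\n", $run, simulate_run(0.75, 1000);
}
```

Under this model each run with R = .75 lands near 25% or near 75% bad, depending on whether the first iteration swapped the strings, which is the split described in the original report. It also never disagrees on the first iteration, matching the leading "okay" line in the dump.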