Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

[Bug in maint-perl-latest] utf8+locale substitution causes the Perl interpreter to hang. #9554

Closed p5pRT closed 13 years ago

p5pRT commented 16 years ago

Migrated from rt.perl.org#60326 (status was 'resolved')

Searchable as RT60326$

p5pRT commented 16 years ago

From @shlomif

Hi all!

A correspondent who has chosen to remain anonymous sent me this, as he has been bitten by this bug.

When executing the following program (with a UTF-8 character in it) in maintperl, the program hangs at the s/// command:

<<<<<<<<<<<<<<<<
#!/usr/bin/perl

use strict;
use warnings;

use utf8;
use locale;

my $text = " ”\001 ";

my $EOS = "\001";
$text =~ s/(\s\w\.\s+)$EOS/$1/sg;

binmode STDOUT, ":utf8";
print "$text\n";

I can confirm the hang on perl-5.8.8 and on perl-5.8.x-latest (maint-perl-5.8.x); it does not occur in my Mandriva Cooker's perl-5.10.0 (perl-5.10.0-21mdv2009.0).

Regards,

  Shlomi Fish


Flags:
  category=
  severity=

Site configuration information for perl v5.8.8:

Configured by shlomi at Mon Nov 3 21:40:33 IST 2008.

Summary of my perl5 (revision 5 version 8 subversion 8 patch 34701) configuration:
  Platform:
    osname=linux, osvers=2.6.27.4, archname=i686-linux
    uname='linux telaviv1.shlomifish.org 2.6.27.4 #1 smp preempt mon oct 27 15:10:25 ist 2008 i686 intel(r) pentium(r) 4 cpu 2.40ghz gnulinux '
    config_args='-de -Dprefix=/home/shlomi/apps/perl/perl-5.8.x-latest -Doptimize=-g'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-g',
    cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.3.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/lib64
    libs=-lnsl -lndbm -lgdbm -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.8.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.8'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -g -L/usr/local/lib'

Locally applied patches:
  MAINT34689

@INC for perl v5.8.8:
  /home/shlomi/apps/perl/modules/lib/perl5/site_perl/5.10.0
  /home/shlomi/apps/perl/modules/lib/perl5/site_perl/5.8.8
  /home/shlomi/apps/perl/modules/lib/site_perl/5.10.0
  /home/shlomi/apps/perl/modules/lib/site_perl/5.8.8/i686-linux
  /home/shlomi/apps/perl/modules/lib/site_perl/5.8.8
  /home/shlomi/apps/perl/modules/lib/perl5/5.10.0
  /home/shlomi/apps/perl/modules/lib/perl5/5.8.8
  /home/shlomi/apps/perl/perl-5.8.x-latest/lib/5.8.8/i686-linux
  /home/shlomi/apps/perl/perl-5.8.x-latest/lib/5.8.8
  /home/shlomi/apps/perl/perl-5.8.x-latest/lib/site_perl/5.8.8/i686-linux
  /home/shlomi/apps/perl/perl-5.8.x-latest/lib/site_perl/5.8.8
  .

Environment for perl v5.8.8:
  HOME=/home/shlomi
  LANG=en_GB.UTF-8
  LANGUAGE=en_GB:en
  LC_ADDRESS=en_US.UTF-8
  LC_COLLATE=en_US.UTF-8
  LC_CTYPE=en_US.UTF-8
  LC_IDENTIFICATION=en_GB.UTF-8
  LC_MEASUREMENT=en_GB.UTF-8
  LC_MESSAGES=en_US.UTF-8
  LC_MONETARY=en_US.UTF-8
  LC_NAME=en_GB.UTF-8
  LC_NUMERIC=en_GB.UTF-8
  LC_PAPER=en_US.UTF-8
  LC_SOURCED=1
  LC_TELEPHONE=en_US.UTF-8
  LC_TIME=en_GB.UTF-8
  LD_LIBRARY_PATH=/home/shlomi/Download/unpack/gui/X/nouveau/mesa/mesa/lib
  LOGDIR (unset)
  PATH=/opt/kde3/bin:/usr/lib/jvm/java-1.6.0-sun-1.6.0.06//bin:/home/shlomi/Download/unpack/graphics/fop/fop-0.93:/home/shlomi/apps/perl/modules/local/bin:/home/shlomi/apps/latemp/bin:/home/shlomi/apps/file/gringotts/bin:/home/shlomi/apps/gimageview/bin:/home/shlomi/apps/test/quadpres/bin:/home/shlomi/apps/docbook-builder/local/bin:/home/shlomi/bin:/usr/local/bin:/bin:/usr/bin:/usr/games:/usr/lib/qt4/bin:/usr/bin:/opt/kde3/bin:/usr/lib/ssh
  PERL5LIB=/home/shlomi/apps/perl/modules/lib/perl5/site_perl/5.10.0:/home/shlomi/apps/perl/modules/lib/perl5/site_perl/5.8.8:/home/shlomi/apps/perl/modules/lib/site_perl/5.10.0:/home/shlomi/apps/perl/modules/lib/site_perl/5.8.8:/home/shlomi/apps/perl/modules/lib/perl5/5.10.0:/home/shlomi/apps/perl/modules/lib/perl5/5.8.8
  PERL_BADLANG (unset)
  SHELL=/bin/bash


Shlomi Fish http://www.shlomifish.org/
Interview with Ben Collins-Sussman - http://xrl.us/bjn8s

Shlomi, so what are you working on? Working on a new wiki about unit testing fortunes in freecell?
-- Ran Eilam

p5pRT commented 16 years ago

From @andk

On Mon, 03 Nov 2008 22:00:07 +0200, Shlomi Fish <shlomif@iglu.org.il> said:

  > When executing the following program (with a UTF-8 character in it) in
  > maintperl, the program hangs at the s/// command:

  > <<<<<<<<<<<<<<<<
  > #!/usr/bin/perl

  > use strict;
  > use warnings;

  > use utf8;
  > use locale;

  > my $text = " ”\001 ";

  > my $EOS = "\001";
  > $text =~ s/(\s\w\.\s+)$EOS/$1/sg;

  > binmode STDOUT, ":utf8";
  > print "$text\n";

Binary search reveals that this was fixed by patch 29360.

-- andreas

p5pRT commented 16 years ago

From @demerphq

2008/11/4 Andreas J. Koenig <andreas.koenig.7os6VVqR@franz.ak.mind.de>:

> On Mon, 03 Nov 2008 22:00:07 +0200, Shlomi Fish <shlomif@iglu.org.il> said:
>
> > When executing the following program (with a UTF-8 character in it) in maintperl, the program hangs at the s/// command:
> >
> > <<<<<<<<<<<<<<<<
> > #!/usr/bin/perl
> >
> > use strict;
> > use warnings;
> >
> > use utf8;
> > use locale;
> >
> > my $text = " ”\001 ";
> >
> > my $EOS = "\001";
> > $text =~ s/(\s\w\.\s+)$EOS/$1/sg;
> >
> > binmode STDOUT, ":utf8";
> > print "$text\n";
>
> Binary search reveals that this was fixed by patch 29360.

commit 302830f77389162abcf9d3689d1a55bbab62a739
Author: Yves Orton <demerphq@gmail.com>
Date: Thu Nov 23 13:36:24 2006 +0100

  Cleanup regexp flags and structure
  Message-ID: <9b18b3110611230336p3ce3b16du47cd5398dea8d873@mail.gmail.com>

  p4raw-id: //depot/perl@29360

Which is strange; I wouldn't have expected THAT patch to fix anything at all. :-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 16 years ago

From rick@bort.ca

On Nov 04 2008, demerphq wrote:

> 2008/11/4 Andreas J. Koenig <andreas.koenig.7os6VVqR@franz.ak.mind.de>:
>
> > On Mon, 03 Nov 2008 22:00:07 +0200, Shlomi Fish <shlomif@iglu.org.il> said:
> >
> > > When executing the following program (with a UTF-8 character in it) in maintperl, the program hangs at the s/// command:
> > >
> > > <<<<<<<<<<<<<<<<
> > > #!/usr/bin/perl
> > >
> > > use strict;
> > > use warnings;
> > >
> > > use utf8;
> > > use locale;
> > >
> > > my $text = " ”\001 ";
> > >
> > > my $EOS = "\001";
> > > $text =~ s/(\s\w\.\s+)$EOS/$1/sg;
> > >
> > > binmode STDOUT, ":utf8";
> > > print "$text\n";
> >
> > Binary search reveals that this was fixed by patch 29360.
>
> Which is strange; I wouldn't have expected THAT patch to fix anything at all. :-)

Could it have something to do with this question I posed 4 years ago?

> I also noticed this in regexp.h while trying to track this bug down:
>
> #define ROPT_CANY_SEEN 0x00800
> #define ROPT_SANY_SEEN ROPT_CANY_SEEN /* src bckwrd cmpt */
>
> /* 0xf800 of reganch is used by PMf_COMPILETIME */
>
> It looks like PMf_COMPILETIME overlaps with ROPT_CANY_SEEN, specifically the PMf_LOCALE portion. I don't know what ROPT_CANY_SEEN is; is this ok? It doesn't look it.

It looks like, when you renumbered the flags, PMf_LOCALE (now RXf_PMf_LOCALE) and ROPT_CANY_SEEN (now RXf_CANY_SEEN) stopped sharing the same value. I still don't know what CANY_SEEN is, but I'd guess that it comes into play in the above regexp and was formerly screwed up by the "use locale".
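The collision described above can be sketched in C. This is an illustration only, not code from the Perl source. `ROPT_CANY_SEEN` and the `0xf800` mask are taken from the quoted `regexp.h` excerpt; `OLD_PMF_LOCALE` is an assumed value inferred from the remark that the two flags used to be the same; the `DEMO_RXF_*` values and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the regexp.h excerpt quoted in this thread. */
#define ROPT_CANY_SEEN      0x00800u
/* "0xf800 of reganch is used by PMf_COMPILETIME" */
#define PMF_COMPILETIME     0x0f800u
/* ASSUMPTION: the locale flag shared the 0x800 bit under the old numbering,
 * as "no longer the same" implies. */
#define OLD_PMF_LOCALE      0x00800u

/* HYPOTHETICAL post-renumbering values; the real RXf_* constants may differ. */
#define DEMO_RXF_PMF_LOCALE 0x00004u
#define DEMO_RXF_CANY_SEEN  0x10000u

/* True if the two masks share at least one bit. */
static bool flags_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}

/* Flags word of a pattern whose only set flag is the locale bit,
 * as if it had been compiled under "use locale". */
static uint32_t flags_for_locale_pattern(uint32_t locale_bit)
{
    return locale_bit;
}

/* What code testing the flags word concludes about "CANY was seen". */
static bool cany_seen(uint32_t flags, uint32_t cany_bit)
{
    return (flags & cany_bit) != 0;
}

/* Defensive check: abort if the demo renumbered flags ever collide. */
static void check_demo_layout(void)
{
    assert(!flags_overlap(DEMO_RXF_PMF_LOCALE, DEMO_RXF_CANY_SEEN));
}
```

With the old values, `cany_seen(flags_for_locale_pattern(OLD_PMF_LOCALE), ROPT_CANY_SEEN)` is true even though nothing set the CANY flag on purpose. With the distinct demo values, the same test is false. Whether this false positive is what actually made the substitution hang is the guess made in this thread, not something the sketch demonstrates.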

-- Rick Delaney rick@bort.ca

p5pRT commented 16 years ago

From @demerphq

2008/11/4 Rick Delaney <rick@bort.ca>:

> On Nov 04 2008, demerphq wrote:
>
> > 2008/11/4 Andreas J. Koenig <andreas.koenig.7os6VVqR@franz.ak.mind.de>:
> >
> > > On Mon, 03 Nov 2008 22:00:07 +0200, Shlomi Fish <shlomif@iglu.org.il> said:
> > >
> > > > When executing the following program (with a UTF-8 character in it) in maintperl, the program hangs at the s/// command:
> > > >
> > > > <<<<<<<<<<<<<<<<
> > > > #!/usr/bin/perl
> > > >
> > > > use strict;
> > > > use warnings;
> > > >
> > > > use utf8;
> > > > use locale;
> > > >
> > > > my $text = " ”\001 ";
> > > >
> > > > my $EOS = "\001";
> > > > $text =~ s/(\s\w\.\s+)$EOS/$1/sg;
> > > >
> > > > binmode STDOUT, ":utf8";
> > > > print "$text\n";
> > >
> > > Binary search reveals that this was fixed by patch 29360.
> >
> > Which is strange; I wouldn't have expected THAT patch to fix anything at all. :-)
>
> Could it have something to do with this question I posed 4 years ago?
>
> > I also noticed this in regexp.h while trying to track this bug down:
> >
> > #define ROPT_CANY_SEEN 0x00800
> > #define ROPT_SANY_SEEN ROPT_CANY_SEEN /* src bckwrd cmpt */
> >
> > /* 0xf800 of reganch is used by PMf_COMPILETIME */
> >
> > It looks like PMf_COMPILETIME overlaps with ROPT_CANY_SEEN, specifically the PMf_LOCALE portion. I don't know what ROPT_CANY_SEEN is; is this ok? It doesn't look it.
>
> It looks like, when you renumbered the flags, PMf_LOCALE (now RXf_PMf_LOCALE) and ROPT_CANY_SEEN (now RXf_CANY_SEEN) stopped sharing the same value. I still don't know what CANY_SEEN is, but I'd guess that it comes into play in the above regexp and was formerly screwed up by the "use locale".

Yep, that makes sense. Nice detective work/memory.

:-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 13 years ago

From @cpansprout

Fixed by bbe252da68db.

p5pRT commented 13 years ago

@cpansprout - Status changed from 'new' to 'resolved'