Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Regular Expression Matching Bug #1189

Closed p5pRT closed 20 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#2158 (status was 'resolved')

Searchable as RT2158$

p5pRT commented 24 years ago

From ishisone@sra.co.jp

This is a bug report for perl from ishisone@sra.co.jp, generated with the help of perlbug 1.27 running under perl v5.5.650.

    "\n\n" =~ /\n  $ \n/x and print "OK1\n";
    "\n\n" =~ /\n* $ \n/x and print "OK2\n";
    "\n\n" =~ /\n+ $ \n/x and print "OK3\n";

All of the above 3 matches should be successful, but with the current perl only the first one succeeds.

The regexec.c:regmatch() routine has a small optimization (avoiding unnecessary backtracking) for patterns such as 'a+$', but the code forgot the fact that '$' can match before and after newline.

Here's the patch to v5.5.650. This fixes the above bug, and also makes \z (EOS) use this optimization.

    *** regexec.c.org	Mon Feb  7 04:33:00 2000
    --- regexec.c	Tue Feb 15 18:25:00 2000
    ***************
    *** 3039,3046 ****
      		    n = regrepeat(scan, n);
      		locinput = PL_reginput;
      		if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
    ! 		    (!PL_multiline || OP(next) == SEOL))
      		    ln = n;			/* why back off? */
      		REGCP_SET;
      		if (paren) {
      		    while (n >= ln) {
    --- 3039,3052 ----
      		    n = regrepeat(scan, n);
      		locinput = PL_reginput;
      		if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
    ! 		    (!PL_multiline || OP(next) == SEOL || OP(next) == EOS)) {
      		    ln = n;			/* why back off? */
    + 		    /* ...because $ and \Z can match before *and* after
    + 		       newline at the end.  Consider "\n\n" =~ /\n+\Z\n/.
    + 		       We should back off by one in this case. */
    + 		    if (UCHARAT(PL_reginput - 1) == '\n' && OP(next) != EOS)
    + 			ln--;
    + 		}
      		REGCP_SET;
      		if (paren) {
      		    while (n >= ln) {
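To make the effect of the lower bound `ln` easier to see, here is a minimal toy model in C. It is not the real regexec.c code: the function names `eol_matches` and `match_nl_plus_eol_nl` and the `allow_backoff` flag are hypothetical, and it only handles the single pattern /\n+$\n/ in non-multiline mode. It assumes the usual meaning of `$`: a match at the very end of the string or just before one trailing newline.

```c
#include <assert.h>
#include <stddef.h>

/* Non-multiline `$`: matches at the end of the string, or just
   before a single trailing newline. */
static int eol_matches(const char *s, size_t len, size_t pos)
{
    return pos == len || (pos + 1 == len && s[pos] == '\n');
}

/* Toy model of /\n+$\n/ tried at offset `start` (hypothetical helper).
   allow_backoff == 0 mimics the unpatched optimization (only the longest
   run of newlines is tried before `$`); allow_backoff == 1 mimics the
   patched behaviour (one repetition fewer is tried as well). */
static int match_nl_plus_eol_nl(const char *s, size_t len, size_t start,
                                int allow_backoff)
{
    assert(s != NULL && start <= len);

    /* Greedy \n+: n is the length of the longest run of newlines. */
    size_t n = 0;
    while (start + n < len && s[start + n] == '\n')
        n++;
    if (n == 0)
        return 0;                   /* \n+ needs at least one newline */

    /* ln is the smallest repetition count that will be tried. */
    size_t ln = 1;                  /* minimum imposed by `+` */
    if (ln < n) {
        ln = n;                     /* optimization: do not back off */
        /* In this toy the run always ends in '\n'; the real code must
           check, because the repeated item can be something else. */
        if (allow_backoff && s[start + n - 1] == '\n')
            ln--;                   /* `$` may match before that newline */
    }

    for (size_t count = n; count >= ln; count--) {
        size_t pos = start + count;
        /* `$` must hold at pos, then a literal \n must follow. */
        if (eol_matches(s, len, pos) && pos < len && s[pos] == '\n')
            return 1;
    }
    return 0;
}
```

With the string "\n\n" (length 2) and `start` 0, the call with `allow_backoff` set to 0 tries only two repetitions, where `$` holds at the end of the string but no newline is left for the final \n, so the match fails. With `allow_backoff` set to 1 it also tries one repetition, where `$` holds before the last newline and the final \n consumes it.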


Site configuration information for perl v5.5.650:

Configured by ishisone at Tue Feb 15 09:29:56 JST 2000.

Summary of my perl5 (revision 5.0 version 5 subversion 650) configuration:
  Platform:
    osname=freebsd, osvers=2.2.8-release, archname=i386-freebsd
    uname='freebsd srapc459.sra.co.jp 2.2.8-release freebsd 2.2.8-release #23: fri oct 22 18:15:23 jst 1999 ishisone@srapc459.sra.co.jp:usrsrcsyscompilesrapc459.v6 i386 '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef
    usesocks=undef useperlio=undef d_sfio=undef
    use64bits=undef uselargefiles=define usemultiplicity=undef
  Compiler:
    cc='cc', optimize='-O', gccversion=2.7.2.1
    cppflags='-I/usr/local/include'
    ccflags ='-I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=true
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lm -lc -lcrypt
    libc=/usr/lib/libc.so.3.1, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fpic', lddlflags='-Bshareable -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.5.650:
    /amd/a/srapc451/mnt3/home/mgr/ishisone/lib/perl5
    /usr/local/lib/perl5/5.5.650/i386-freebsd
    /usr/local/lib/perl5/5.5.650
    /usr/local/lib/perl5/site_perl/5.5.650/i386-freebsd
    /usr/local/lib/perl5/site_perl/5.5.650
    /usr/local/lib/perl5/site_perl/5.005/i386-freebsd
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .

Environment for perl v5.5.650:
    HOME=/amd/a/srapc451/mnt3/home/mgr/ishisone
    LANG=ja_JP.EUC
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_TIME=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/amd/a/srapc451/mnt3/home/mgr/ishisone/bin:/amd/a/srapc451/mnt3/home/mgr/ishisone/bin/i386-freebsd2:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/sra/bin:/usr/local/tdoc/bin:/usr/local/emacs/bin:/usr/new/mh:/usr/local/bin/mh:/usr/local/v6/bin:/usr/local/v6/sbin:/usr/ucb:/usr/bin:/usr/new:/bin:/etc:/usr/etc:/usr/sbin:/sbin:/amd/a/srapc451/mnt3/home/mgr/ishisone/bin/lastresort:
    PERL5LIB=/amd/a/srapc451/mnt3/home/mgr/ishisone/lib/perl5
    PERL_BADLANG=0
    SHELL=/usr/local/bin/bash

p5pRT commented 24 years ago

From [Unknown Contact. See original ticket]

Makoto Ishisone writes:

    "\n\n" =~ /\n  $ \n/x and print "OK1\n";
    "\n\n" =~ /\n* $ \n/x and print "OK2\n";
    "\n\n" =~ /\n+ $ \n/x and print "OK3\n";

All of the above 3 matches should be successful, but with the current perl only the first one succeeds.

The regexec.c:regmatch() routine has a small optimization (avoiding unnecessary backtracking) for patterns such as 'a+$', but the code forgot the fact that '$' can match before and after newline.

Here's the patch to v5.5.650. This fixes the above bug, and also makes \z (EOS) use this optimization.

    *** regexec.c.org	Mon Feb  7 04:33:00 2000
    --- regexec.c	Tue Feb 15 18:25:00 2000
    ***************
    --- 3039,3052 ----
      		    n = regrepeat(scan, n);
      		locinput = PL_reginput;
      		if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
    - 		    (!PL_multiline || OP(next) == SEOL))
    + 		    (!PL_multiline || OP(next) == SEOL || OP(next) == EOS)) {
      		    ln = n;			/* why back off? */
    + 		    /* ...because $ and \Z can match before *and* after
    + 		       newline at the end.  Consider "\n\n" =~ /\n+\Z\n/.
    + 		       We should back off by one in this case. */
    + 		    if (UCHARAT(PL_reginput - 1) == '\n' && OP(next) != EOS)
    + 			ln--;
    + 		}
      		REGCP_SET;

This looks OK, but aren't there other similar places?

Ilya