This is a bug report for perl from ishisone@sra.co.jp, generated with the help of perlbug 1.27 running under perl v5.5.650.

    "\n\n" =~ /\n $ \n/x and print "OK1\n";
    "\n\n" =~ /\n* $ \n/x and print "OK2\n";
    "\n\n" =~ /\n+ $ \n/x and print "OK3\n";
All three matches above should succeed, but with the current perl only the first one does.
The regexec.c:regmatch() routine has a small optimization (avoiding unnecessary backtracking) for patterns such as 'a+$', but the code overlooks the fact that '$' can match both before and after a trailing newline.
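The effect can be sketched outside the regex engine. The following is a toy model, not the code in regexec.c: `dollar_matches_at` and `match_nl_plus_dollar_nl` are hypothetical helpers written for illustration, the pattern /\n+$\n/ is tried at offset 0 only, and `min_count` stands in for the smallest repeat count the matcher is allowed to back off to.

```c
#include <assert.h>
#include <stddef.h>

/* Non-/m `$`: matches at the very end of the string, or just
 * before a newline that is the last character. */
static int dollar_matches_at(const char *s, size_t len, size_t pos)
{
    assert(s != NULL && pos <= len);
    if (pos == len)
        return 1;
    return pos + 1 == len && s[pos] == '\n';
}

/* Toy model of /\n+$\n/ tried at offset 0 only (an assumption made
 * to keep the sketch short). min_count is the lowest number of
 * newlines the greedy \n+ may give back down to. */
static int match_nl_plus_dollar_nl(const char *s, size_t len, size_t min_count)
{
    size_t n = 0;

    while (n < len && s[n] == '\n')     /* greedy: take every newline */
        n++;
    if (min_count < 1)                  /* `+` needs at least one */
        min_count = 1;

    for (size_t k = n; k >= min_count; k--) {
        /* after k newlines: `$` must hold, then one more "\n" must follow */
        if (dollar_matches_at(s, len, k) && k < len && s[k] == '\n')
            return 1;
    }
    return 0;
}
```

With the string "\n\n", passing `min_count = 2` (never back off) makes the model fail, while `min_count = 1` lets it give back one newline and succeed, because `$` also matches at offset 1, just before the final newline.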
Here's a patch against v5.5.650. It fixes the bug above and also lets \z (EOS) use this optimization.
*** regexec.c.org Mon Feb 7 04:33:00 2000
--- regexec.c Tue Feb 15 18:25:00 2000
***************
*** 3039,3046 ****
              n = regrepeat(scan, n);
              locinput = PL_reginput;
              if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
!                 (!PL_multiline || OP(next) == SEOL))
                  ln = n; /* why back off? */
              REGCP_SET;
              if (paren) {
                  while (n >= ln) {
--- 3039,3052 ----
              n = regrepeat(scan, n);
              locinput = PL_reginput;
              if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
!                 (!PL_multiline || OP(next) == SEOL || OP(next) == EOS)) {
                  ln = n; /* why back off? */
+                 /* ...because $ and \Z can match before *and* after
+                    newline at the end. Consider "\n\n" =~ /\n+\Z\n/.
+                    We should back off by one in this case. */
+                 if (UCHARAT(PL_reginput - 1) == '\n' && OP(next) != EOS)
+                     ln--;
+             }
              REGCP_SET;
              if (paren) {
                  while (n >= ln) {
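The lower-bound adjustment made by the patch can be isolated as a small standalone function. This is a sketch under stated assumptions, not code from perl: `patched_min_count` is a hypothetical name, `last_char` stands in for `UCHARAT(PL_reginput - 1)`, `next_is_eos` stands in for `OP(next) == EOS`, and the outer condition (the next node is of kind EOL and the pattern is either not multiline or uses SEOL/EOS) is assumed to hold already.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the patched computation of `ln`, the smallest repeat
 * count the matcher will try after a greedy match of n items.
 * Assumes the caller has already checked that the next node is an
 * end-of-line style assertion eligible for the optimization. */
static size_t patched_min_count(size_t ln, size_t n,
                                unsigned char last_char, int next_is_eos)
{
    assert(ln <= n);
    if (ln < n) {
        ln = n;                         /* normally: no need to back off */
        /* `$` and \Z also match before a final newline, so allow
         * giving back exactly one item when the repeat ended on "\n".
         * \z (EOS) matches only at the very end, so it is exempt. */
        if (last_char == '\n' && !next_is_eos)
            ln--;
    }
    return ln;
}
```

For "\n\n" =~ /\n+$\n/ the greedy repeat takes n = 2 with a minimum of 1 and ends on a newline, so the model returns 1 and the shorter repeat is still tried. With \z as the next node, or with a repeat that ends on any other character, it returns n and no backtracking happens.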
Site configuration information for perl v5.5.650:
Configured by ishisone at Tue Feb 15 09:29:56 JST 2000.
Summary of my perl5 (revision 5.0 version 5 subversion 650) configuration:
  Platform:
    osname=freebsd, osvers=2.2.8-release, archname=i386-freebsd
    uname='freebsd srapc459.sra.co.jp 2.2.8-release freebsd 2.2.8-release #23: fri oct 22 18:15:23 jst 1999 ishisone@srapc459.sra.co.jp:usrsrcsyscompilesrapc459.v6 i386 '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef
    usesocks=undef useperlio=undef d_sfio=undef
    use64bits=undef uselargefiles=define usemultiplicity=undef
  Compiler:
    cc='cc', optimize='-O', gccversion=2.7.2.1
    cppflags='-I/usr/local/include'
    ccflags ='-I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=true
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='ld', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lm -lc -lcrypt
    libc=/usr/lib/libc.so.3.1, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fpic', lddlflags='-Bshareable -L/usr/local/lib'
Locally applied patches:
@INC for perl v5.5.650:
    /amd/a/srapc451/mnt3/home/mgr/ishisone/lib/perl5
    /usr/local/lib/perl5/5.5.650/i386-freebsd
    /usr/local/lib/perl5/5.5.650
    /usr/local/lib/perl5/site_perl/5.5.650/i386-freebsd
    /usr/local/lib/perl5/site_perl/5.5.650
    /usr/local/lib/perl5/site_perl/5.005/i386-freebsd
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .
Environment for perl v5.5.650:
    HOME=/amd/a/srapc451/mnt3/home/mgr/ishisone
    LANG=ja_JP.EUC
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_TIME=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/amd/a/srapc451/mnt3/home/mgr/ishisone/bin:/amd/a/srapc451/mnt3/home/mgr/ishisone/bin/i386-freebsd2:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/sra/bin:/usr/local/tdoc/bin:/usr/local/emacs/bin:/usr/new/mh:/usr/local/bin/mh:/usr/local/v6/bin:/usr/local/v6/sbin:/usr/ucb:/usr/bin:/usr/new:/bin:/etc:/usr/etc:/usr/sbin:/sbin:/amd/a/srapc451/mnt3/home/mgr/ishisone/bin/lastresort:
    PERL5LIB=/amd/a/srapc451/mnt3/home/mgr/ishisone/lib/perl5
    PERL_BADLANG=0
    SHELL=/usr/local/bin/bash
Makoto Ishisone writes:

> "\n\n" =~ /\n $ \n/x and print "OK1\n";
> "\n\n" =~ /\n* $ \n/x and print "OK2\n";
> "\n\n" =~ /\n+ $ \n/x and print "OK3\n";
>
> All of the above 3 matches should be successful, but with the current perl only the first one succeeds.
>
> The regexec.c:regmatch() routine has a small optimization (avoiding unnecessary backtracking) for patterns such as 'a+$', but the code forgot the fact that '$' can match before and after newline.
>
> Here's the patch to v5.5.650. This fixes the above bug, and also makes \z (EOS) use this optimization.
>
> *** regexec.c.org Mon Feb 7 04:33:00 2000
> --- regexec.c Tue Feb 15 18:25:00 2000
> ***************
> --- 3039,3052 ----
>               n = regrepeat(scan, n);
>               locinput = PL_reginput;
>               if (ln < n && PL_regkind[(U8)OP(next)] == EOL &&
> -                 (!PL_multiline || OP(next) == SEOL))
> +                 (!PL_multiline || OP(next) == SEOL || OP(next) == EOS)) {
>                   ln = n; /* why back off? */
> +                 /* ...because $ and \Z can match before *and* after
> +                    newline at the end. Consider "\n\n" =~ /\n+\Z\n/.
> +                    We should back off by one in this case. */
> +                 if (UCHARAT(PL_reginput - 1) == '\n' && OP(next) != EOS)
> +                     ln--;
> +             }
>               REGCP_SET;
This looks OK, but aren't there other similar places?
Ilya
Migrated from rt.perl.org#2158 (status was 'resolved')
Searchable as RT2158$