Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Regex inconsistency with multiple PRUNE/SKIP/COMMIT #12853

Open p5pRT opened 11 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#117179 (status was 'open')

Searchable as RT117179$

p5pRT commented 11 years ago

From ph10@hermes.cam.ac.uk

Created by ph10@hermes.cam.ac.uk

I have been testing combinations of (*SKIP), (*PRUNE), etc. to determine if there is any precedence when several of them are present. It seems that the rule is that whichever one is backtracked onto first does its thing, and earlier ones are ignored. But there is one exception. Consider these two patterns when matched against "aaaaaac" (6 'a' characters plus 'c'):

/aaaaa(*SKIP)b|a+c/
/aaaaa(*PRUNE)b|a+c/

The first matches "ac" and the second "aaaac", entirely as expected. However, if the patterns are changed to

/aaaaa(*COMMIT)(*SKIP)b|a+c/
/aaaaa(*COMMIT)(*PRUNE)b|a+c/

the first still matches "ac", but the second now gives "no match". In all other cases I have tried, such as putting (*PRUNE) in front of (*SKIP) or vice versa, and including tests with (*THEN), the insertion of the first verb makes no difference, which is what I would expect if the rule is "first backtracked onto is activated".

Here is some evidence of these effects:

    $ perl -e 'print (("aaaaaac" =~ /aaaaa(*SKIP)b|a+c/)? "$&\n":"no match\n");'
    ac
    $ perl -e 'print (("aaaaaac" =~ /aaaaa(*PRUNE)b|a+c/)? "$&\n":"no match\n");'
    aaaac
    $ perl -e 'print (("aaaaaac" =~ /aaaaa(*COMMIT)(*SKIP)b|a+c/)? "$&\n":"no match\n");'
    ac
    $ perl -e 'print (("aaaaaac" =~ /aaaaa(*COMMIT)(*PRUNE)b|a+c/)? "$&\n":"no match\n");'
    no match
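The one-liners above cover four patterns. The other combinations mentioned in the report (for example (*PRUNE) in front of (*SKIP), or pairs involving (*THEN)) can be checked with a small harness along the following lines. This is only a sketch: the helper name `match_result` and the table layout are illustrative assumptions, not part of the original report, and the output depends on the Perl version used. Each two-verb pattern is compared against the pattern containing only the second verb, which is the one backtracked onto first.

```perl
use strict;
use warnings;

# Subject string from the report: six 'a' characters followed by 'c'.
my $subject = 'aaaaaac';

# Hypothetical helper: compile a pattern given as a string and return the
# matched text, or 'no match' if the match fails.
sub match_result {
    my ($source) = @_;
    my $re = qr/$source/;
    return $subject =~ $re ? $& : 'no match';
}

my @verbs = qw(SKIP PRUNE COMMIT THEN);

for my $second (@verbs) {
    # Baseline: the verb that is backtracked onto first, on its own.
    my $baseline = match_result("aaaaa(*$second)b|a+c");
    printf "%-28s %s\n", "(*$second) alone", $baseline;

    for my $first (@verbs) {
        next if $first eq $second;
        # Same pattern with another verb inserted in front.
        my $result = match_result("aaaaa(*$first)(*$second)b|a+c");
        my $note   = $result eq $baseline ? '' : '  <-- differs from baseline';
        printf "%-28s %s%s\n", "(*$first)(*$second)", $result, $note;
    }
}
```

If the rule "first backtracked onto is activated" holds, no row should be flagged. On the perl 5.16.2 described in this report, the (*COMMIT)(*PRUNE) row would be expected to be flagged.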

Perl Info

```
Flags:
    category=core
    severity=low

Site configuration information for perl 5.16.2:

Configured by nobody at Sat Feb 2 15:24:52 CET 2013.

Summary of my perl5 (revision 5 version 16 subversion 2) configuration:

  Platform:
    osname=linux, osvers=3.7.4-1-arch, archname=i686-linux-thread-multi
    uname='linux flo32 3.7.4-1-arch #1 smp preempt mon jan 21 23:05:29 cet 2013 i686 gnulinux '
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=i686 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-march=i686 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.7.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc -lgdbm_compat
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.17.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/core_perl/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro -L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.16.2:
    /usr/lib/perl5/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/core_perl
    /usr/share/perl5/core_perl
    .

Environment for perl 5.16.2:
    HOME=/home/ph10
    LANG=en_GB
    LANGUAGE (unset)
    LC_ALL=C
    LC_COLLATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ph10/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/bin/vendor_perl:/usr/bin/core_perl:.
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 11 years ago

From @ikegami

On Fri, Mar 15, 2013 at 7:18 AM, Philip Hazel <perlbug-followup@perl.org> wrote:

# New Ticket Created by Philip Hazel
# Please include the string: [perl #117179]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=117179 >

Cc: nobody
Subject: Regex: COMMIT+SKIP differs from COMMIT+PRUNE
Message-Id: <5.16.2_19279_1363344286@quercite>
Reply-To: ph10@hermes.cam.ac.uk
To: perlbug@perl.org
From: ph10@hermes.cam.ac.uk

This is a bug report for perl from ph10@hermes.cam.ac.uk, generated with the help of perlbug 1.39 running under perl 5.16.2.

-----------------------------------------------------------------
[Please describe your issue here]

(*COMMIT)'s or (*PRUNE)'s behaviour differs from their documentation. According to (*COMMIT)'s documentation, it should do its thing "when backtracked into", but that can't happen when (*PRUNE) comes right after it, because "no further backtracking will take place" after backtracking into (*PRUNE).
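To make the two documented behaviours being contrasted here concrete, the following minimal sketch runs each verb on its own against the subject string from the report. It is an illustration added for reference, not something from the ticket, and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Subject string from the report: six 'a' characters followed by 'c'.
my $subject = 'aaaaaac';

# (*PRUNE) alone: when backtracked into, the attempt at the current start
# position fails, and matching resumes at the next start position.
my $prune = $subject =~ /aaaaa(*PRUNE)b|a+c/ ? $& : 'no match';

# (*COMMIT) alone: when backtracked into, the whole match fails and no
# further start positions are tried.
my $commit = $subject =~ /aaaaa(*COMMIT)b|a+c/ ? $& : 'no match';

print "PRUNE alone:  $prune\n";
print "COMMIT alone: $commit\n";
```

Going by the perlre descriptions of the two verbs, this should print "aaaac" for (*PRUNE) (as the report also shows) and "no match" for (*COMMIT). The open question in this ticket is which of the two should apply when both appear back to back.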

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'