Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

*COMMIT etc in subroutines #16645

Open p5pRT opened 6 years ago

p5pRT commented 6 years ago

Migrated from rt.perl.org#133405 (status was 'open')

Searchable as RT133405$

p5pRT commented 6 years ago

From ph10@hermes.cam.ac.uk

Created by ph10@cam.ac.uk

The Perl documentation says these things:

(1) When discussing subroutine calls such as (?1): "Treat the contents of a given capture buffer in the current pattern as an independent subpattern and attempt to match it at the current position in the string."

(2) When discussing (*ACCEPT): "When inside of a nested pattern, such as recursion, or in a subpattern dynamically generated via "(??{})", only the innermost pattern is ended immediately." In other words, the effect of (*ACCEPT) is confined to the subroutine/recursion. This is indeed how it works:

$ perl -e 'if (ab =~ /(?1)b(?(DEFINE)(a(*ACCEPT)z))/) { print "yes >$&<\n"; } else { print "no \n"; }'
yes >ab<

In this example (?1) is successful, so it goes on to match "b"; it does not terminate the whole match after (?1) matches "a". So far, so good.
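As a rough sketch of the same point in script form, the following contrasts (*ACCEPT) at the top level of a pattern with (*ACCEPT) reached through a (?1) call. The helper name `matched_text` is made up for this illustration, and the result noted for the call case is the one reported above for perl 5.26.2.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text ($&), or undef on failure.
sub matched_text {
    my ($subject, $regex) = @_;
    return $subject =~ $regex ? $& : undef;
}

# (*ACCEPT) at the top level ends the whole match at once,
# so only "a" is consumed and the trailing "b" is never tried.
my $top_level = matched_text('ab', qr/a(*ACCEPT)b/);

# (*ACCEPT) inside group 1, reached through the (?1) call:
# only the call ends, and the outer pattern goes on to match "b".
my $in_call = matched_text('ab', qr/(?1)b(?(DEFINE)(a(*ACCEPT)z))/);

printf "top level: >%s<\n", $top_level // 'no match';
printf "in call:   >%s<\n", $in_call   // 'no match';
```

If the verb were not confined to the call, the second match would stop after "a" just as the first one does.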

I couldn't find any statement about what happens when (*COMMIT), (*PRUNE), (*SKIP), or (*THEN) are triggered inside a recursion or subroutine call and backtrack to the outer level. Experiments seem to indicate that these verbs are *not* confined to within a subroutine call:

$ perl -e 'if (ac =~ /(?1)(a(*COMMIT)b)|ac/) { print "yes >$&<\n"; } else { print "no \n"; }'
no

If (*COMMIT) had just caused (?1) to fail, there should have been a backtrack that would enable the "ac" branch to match. It appears that (*COMMIT) has caused the entire pattern match to fail. This seems wrong to me, and IMHO it contradicts statement (1) above. And it seems inconsistent that (*ACCEPT) is treated differently.
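To isolate the effect of the verb, here is a minimal sketch that runs the same pattern with and without (*COMMIT). The helper `matched_text` and the variable names are invented for this illustration; the "no match" result for the (*COMMIT) case is the behaviour reported above for perl 5.26.2 and may differ on other versions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the matched text ($&), or undef on failure.
sub matched_text {
    my ($subject, $regex) = @_;
    return $subject =~ $regex ? $& : undef;
}

# Without a verb: the (?1) call fails on "ac", the engine backtracks
# at the outer level, and the second alternative matches.
my $without_verb = matched_text('ac', qr/(?1)(ab)|ac/);

# With (*COMMIT) inside the called group: per this report, backtracking
# onto the verb fails the whole match instead of just the call.
my $with_commit = matched_text('ac', qr/(?1)(a(*COMMIT)b)|ac/);

printf "without verb: %s\n", $without_verb // 'no match';
printf "with COMMIT:  %s\n", $with_commit  // 'no match';
```

The only difference between the two patterns is the verb, so any difference in the result comes from how (*COMMIT) behaves when it is backtracked to inside the call.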

FYI: PCRE does restrict (*COMMIT), (*PRUNE), (*SKIP), and (*THEN) to act only within a subroutine call (as well as (*ACCEPT)). This was originally because subroutine calls were atomic in PCRE. From release 10.30 of PCRE, however, subroutine (or recursive) calls are no longer atomic, but I kept the restriction on the backtracking verbs for backwards compatibility. The Perl documentation mentions that PCRE and Python have atomic subroutine calls; that now needs updating for PCRE.

If the current behaviour of (*COMMIT) etc. is what is intended, it would be useful for it to be documented.

I hope that is all clear. Thanks for your attention.

Regards, Philip Hazel

Perl Info

```
Flags:
    category=core
    severity=low

Site configuration information for perl 5.26.2:

Configured by builduser at Thu Jun 28 12:11:25 CEST 2018.

Summary of my perl5 (revision 5 version 26 subversion 2) configuration:

  Platform:
    osname=linux
    osvers=4.17.3-1-arch
    archname=x86_64-linux-thread-multi
    uname='linux flo-64s 4.17.3-1-arch #1 smp preempt tue jun 26 04:42:36 utc 2018 x86_64 gnulinux '
    config_args='-des -Dusethreads -Duseshrplib -Doptimize=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -Dprefix=/usr -Dvendorprefix=/usr -Dprivlib=/usr/share/perl5/core_perl -Darchlib=/usr/lib/perl5/5.26/core_perl -Dsitelib=/usr/share/perl5/site_perl -Dsitearch=/usr/lib/perl5/5.26/site_perl -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib/perl5/5.26/vendor_perl -Dscriptdir=/usr/bin/core_perl -Dsitescript=/usr/bin/site_perl -Dvendorscript=/usr/bin/vendor_perl -Dinc_version_list=none -Dman1ext=1perl -Dman3ext=3perl -Dcccdlflags='-fPIC' -Dlddlflags=-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -Dldflags=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='8.1.1 20180531'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags ='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-pc-linux-gnu/8.1.1/include-fixed /usr/lib /lib/../lib /usr/lib/../lib /lib /lib64 /usr/lib64
    libs=-lpthread -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.27.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.27'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.26/core_perl/CORE'
    cccdlflags='-fPIC'
    lddlflags='-shared -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -L/usr/local/lib -fstack-protector-strong'

@INC for perl 5.26.2:
    /usr/lib/perl5/5.26/site_perl
    /usr/share/perl5/site_perl
    /usr/lib/perl5/5.26/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib/perl5/5.26/core_perl
    /usr/share/perl5/core_perl

Environment for perl 5.26.2:
    HOME=/home/ph10
    LANG=en_GB.utf8
    LANGUAGE=en_GB.utf8
    LC_ALL=C
    LC_COLLATE=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/ph10/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/sbin:.:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl
    PERL_BADLANG (unset)
    SHELL=/bin/bash
```

p5pRT commented 6 years ago

From @jkeenan

On Tue, 24 Jul 2018 15:43:57 GMT, ph10@hermes.cam.ac.uk wrote:

FYI: PCRE does restrict (*COMMIT), (*PRUNE), (*SKIP), and (*THEN) to act only within a subroutine call (as well as (*ACCEPT)). This was originally because subroutine calls were atomic in PCRE. From release 10.30 of PCRE, however, subroutine (or recursive) calls are no longer atomic, but I kept the restriction on the backtracking verbs for backwards compatibility. The Perl documentation mentions that PCRE and Python have atomic subroutine calls; that now needs updating for PCRE.

To focus in on just this documentation piece:

Could you provide a patch for the Perl documentation?

Also, could you provide a link to the PCRE documentation and, perhaps, an illustration of how this works in PCRE?


Thank you very much.

-- James E Keenan (jkeenan@cpan.org)

p5pRT commented 6 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 6 years ago

From ph10@hermes.cam.ac.uk

On Mon, 3 Sep 2018, James E Keenan via RT wrote:

To focus in on just this documentation piece:

Could you provide a patch for the Perl documentation?

Also, could you provide a link to the PCRE documentation and, perhaps, an illustration of how this works in PCRE?

I don't know what form would be best for sending patches to Perl documentation. Currently the perlre man page says this:

  Note that this pattern does not behave the same way as the equivalent PCRE or Python construct of the same form. In Perl you can backtrack into a recursed group, in PCRE and Python the recursed into group is treated as atomic.

It would now be more accurate to say this:

  Note that this pattern does not behave the same way as the equivalent Python construct of the same form. In Perl you can backtrack into a recursed group, but in Python the recursed group is treated as atomic. This is also true of earlier PCRE releases, but from PCRE 10.30 onwards backtracking into recursed groups is implemented as it is for Perl.
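For readers unsure what "backtrack into a recursed group" means in practice, here is a small sketch. The group name `as` and the variable names are invented for this example. The second pattern wraps the call in (?>...), which is used here only as a stand-in for the atomic-call behaviour described for Python and older PCRE releases; it is not how those engines implement it.

```perl
use strict;
use warnings;

# The called group "as" is greedy: on "aaa" it first takes all three
# characters, leaving nothing for the literal "a" that follows the call.

# Default Perl behaviour: the engine backtracks INTO the call, lets the
# group give back one "a", and the overall match then succeeds.
my $perl_default = 'aaa' =~ /^(?&as)a\z(?(DEFINE)(?<as>a+))/ ? 1 : 0;

# Call wrapped in an atomic group: once the call has matched "aaa" it
# cannot be re-entered, so the trailing "a" can never match.
my $forced_atomic = 'aaa' =~ /^(?>(?&as))a\z(?(DEFINE)(?<as>a+))/ ? 1 : 0;

print "backtracking into the call: ", ($perl_default  ? "match" : "no match"), "\n";
print "call treated as atomic:     ", ($forced_atomic ? "match" : "no match"), "\n";
```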

The documentation for the current PCRE release is here:

http://www.pcre.org/current/doc/html/

I have done a bit of editing on the documentation for the next release, to try to make it as clear as I can. This is how the relevant section now reads:


"Backtracking verbs in subroutines"

These behaviours occur whether or not the subpattern is called recursively.

(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to succeed without any further processing. Matching then continues after the subroutine call. Perl documents this behaviour. Perl's treatment of the other verbs in subroutines is different in some cases.

(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces an immediate backtrack.

(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when triggered by being backtracked to in a subpattern called as a subroutine. There is then a backtrack at the outer level.

(*THEN), when triggered, skips to the next alternative in the innermost enclosing group within the subpattern that has alternatives (its normal behaviour). However, if there is no such group within the subroutine subpattern, the subroutine match fails and there is a backtrack at the outer level.
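One way to compare a given perl against the PCRE rules quoted above is to run the same test pattern once per verb and see what comes back. The sketch below is only a probe: the `%result` hash and the pattern template are choices made for this illustration. Under the PCRE rules each of these patterns would fall back to the outer alternative and match "ac"; for Perl, the only result stated in this ticket is that the (*COMMIT) case does not match on 5.26.2, so the other three are left for the reader to observe.

```perl
use strict;
use warnings;

# Probe each backtracking verb inside a group that is reached via (?1).
# In every pattern the call must fail on "ac" after passing the verb.
my %result;
for my $verb (qw(COMMIT PRUNE SKIP THEN)) {
    my $source = sprintf '(?1)(a(*%s)b)|ac', $verb;   # e.g. (?1)(a(*COMMIT)b)|ac
    my $regex  = qr/$source/;
    $result{$verb} = 'ac' =~ $regex ? "matched >$&<" : 'no match';
}

# Print one line per verb so the local behaviour can be compared
# with the PCRE description.
printf "%-6s %s\n", $_, $result{$_} for sort keys %result;
```

A verb whose effect is confined to the call would show "matched >ac<"; a verb whose effect escapes the call would show "no match".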


I hope this helps.

Regards, Philip

-- Philip Hazel

p5pRT commented 6 years ago

From @demerphq

This is on my to-do list to review, but I am very busy.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"