Open p5pRT opened 6 years ago
The Perl documentation says these things:
(1) When discussing subroutine calls such as (?1): "Treat the contents of a given capture buffer in the current pattern as an independent subpattern and attempt to match it at the current position in the string."
(2) When discussing (*ACCEPT): "When inside of a nested pattern, such as recursion, or in a subpattern dynamically generated via "(??{})", only the innermost pattern is ended immediately." In other words, the effect of (*ACCEPT) is confined to the subroutine/recursion. This is indeed how it works:
$ perl -e 'if (ab =~ /(?1)b(?(DEFINE)(a(*ACCEPT)z))/) { print "yes >$&<\n"; } else { print "no \n"; }'
yes >ab<
In this example (?1) is successful, so it goes on to match "b"; it does not terminate the whole match after (?1) matches "a". So far, so good.
I couldn't find any statement about what happens when (*COMMIT), (*PRUNE), (*SKIP), or (*THEN) are triggered inside a recursion or subroutine call and backtrack to the outer level. Experiments seem to indicate that these verbs are *not* confined to within a subroutine call:
$ perl -e 'if (ac =~ /(?1)(a(*COMMIT)b)|ac/) { print "yes >$&<\n"; } else { print "no \n"; }'
no
If (*COMMIT) had just caused (?1) to fail, there should have been a backtrack that would enable the "ac" branch to match. It appears that (*COMMIT) has caused the entire pattern match to fail. This seems wrong to me, and IMHO it contradicts statement (1) above. And it seems inconsistent that (*ACCEPT) is treated differently.
FYI: PCRE does restrict (*COMMIT), (*PRUNE), (*SKIP), and (*THEN) to act only within a subroutine call (as well as (*ACCEPT)). This was originally because subroutine calls were atomic in PCRE. From release 10.30 of PCRE, however, subroutine (or recursive) calls are no longer atomic, but I kept the restriction on the backtracking verbs for backwards compatibility. The Perl documentation mentions that PCRE and Python have atomic subroutine calls; that now needs updating for PCRE.
If the current behaviour of (*COMMIT) etc. is what is intended, it would be useful for it to be documented.
I hope that is all clear. Thanks for your attention.
Regards, Philip Hazel
On Tue, 24 Jul 2018 15:43:57 GMT, ph10@hermes.cam.ac.uk wrote:
Subject: *COMMIT etc in subroutines
Message-Id: 5.26.2_8846_1532445050@quercite
To: perlbug@perl.org
From: ph10@cam.ac.uk
Reply-To: ph10@cam.ac.uk
Cc: builduser
This is a bug report for perl from ph10@cam.ac.uk, generated with the help of perlbug 1.40 running under perl 5.26.2.
To focus in on just this documentation piece:
Could you provide a patch for the Perl documentation?
Also, could you provide a link to the PCRE documentation and, perhaps, an illustration of how this works in PCRE?
Thank you very much. -- James E Keenan (jkeenan@cpan.org)
The RT System itself - Status changed from 'new' to 'open'
On Mon, 3 Sep 2018, James E Keenan via RT wrote:
To focus in on just this documentation piece:
Could you provide a patch for the Perl documentation?
Also, could you provide a link to the PCRE documentation and, perhaps, an illustration of how this works in PCRE?
I don't know what form would be best for sending patches to Perl documentation. Currently the perlre man page says this:
Note that this pattern does not behave the same way as the equivalent PCRE or Python construct of the same form. In Perl you can backtrack into a recursed group, in PCRE and Python the recursed into group is treated as atomic.
It would now be more accurate to say this:
Note that this pattern does not behave the same way as the equivalent Python construct of the same form. In Perl you can backtrack into a recursed group, but in Python the recursed group is treated as atomic. This is also true of earlier PCRE releases, but from PCRE 10.30 onwards backtracking into recursed groups is implemented as it is for Perl.
The documentation for the current PCRE release is here:
http://www.pcre.org/current/doc/html/
I have done a bit of editing on the documentation for the next release, to try to make it as clear as I can. This is how the relevant section now reads:
"Backtracking verbs in subroutines"
These behaviours occur whether or not the subpattern is called recursively.
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to succeed without any further processing. Matching then continues after the subroutine call. Perl documents this behaviour. Perl's treatment of the other verbs in subroutines is different in some cases.
(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces an immediate backtrack.
(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when triggered by being backtracked to in a subpattern called as a subroutine. There is then a backtrack at the outer level.
(*THEN), when triggered, skips to the next alternative in the innermost enclosing group within the subpattern that has alternatives (its normal behaviour). However, if there is no such group within the subroutine subpattern, the subroutine match fails and there is a backtrack at the outer level.
I hope this helps.
Regards, Philip
-- Philip Hazel
This is on my todo list to review, but I am very busy.
Yves
On Tue, 4 Sep 2018 at 19:56, ph10@hermes.cam.ac.uk wrote:
-- perl -Mre=debug -e "/just|another|perl|hacker/"
Migrated from rt.perl.org#133405 (status was 'open')
Searchable as RT133405$