Closed p5pRT closed 20 years ago
crypt% perl -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad babadab bababdab bababdad /' bababdad: -dad-d- crypt%
That's with 5.6.0; with bleadperl\, none of the strings match. I believe it should match 'babadad'.
More: crypt% ./perl -Dr -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /' 2>&1 | grep restoring restoring \1 to 0(0)..3 restoring \2 to 0(1882390424)..1 crypt%
That odd number turns out to occur because PL_reg_start_tmp[paren] is \x78455220 = " REx".
Hugo
crypt% perl -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad babadab bababdab bababdad /' bababdad: -dad-d- crypt%
That's with 5.6.0; with bleadperl\, none of the strings match. I believe it should match 'babadad'.
It appears that there is a problem with your regexp.
/^((.)?a\2)+$/
The "^" is saying start at the beginning of the string\, and none of the strings start with "dad". Also\, the enclosed match must be three characters long. Finally\, the "+$" means can appear one or more times before the end. So\, this regexp is looking for a three character string that is of the form XaX (X is any character) that appears one or more times to the complete end of the string. That said\, the following one-liner demonstrates your regexp...
steve@kirk sandbox $ perl -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ dad dadad daddad babadad babadab bababdab bababdad /' dad: -dad-d- daddad: -dad-d-
"dad" matches exactly as does "daddad". "dadad" does not match since the "dad" is not repeated completely throughout the string from beginning to end.
This problem does not appear to be a bug with Perl\, but with the regular expression.
@smpeters - Status changed from 'open' to 'rejected'
"Steve Peters via RT" \perlbug\-followup@​perl\.org wrote: :> crypt% perl -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ :> babadad babadab bababdab bababdad /' :> bababdad: -dad-d- :> crypt% :> :> That's with 5.6.0; with bleadperl\, none of the strings match. :> I believe it should match 'babadad'. :> : :It appears that there is a problem with your regexp. : :/^((.)?a\2)+$/ : :The "^" is saying start at the beginning of the string\, and none of the :strings start with "dad". Also\, the enclosed match must be three :characters long.
That depends on your definitions\, and I think the problem here is that the regexp explores behaviour that has never been explicitly defined.
If the '?' is moved inside the parens: /^((.?)a\2)+$/ then expected behaviour is clear - match any string comprised entirely of one or more consecutive occurrences of 'a' and/or 'XaX'.
As written though\, it isn't clear what the following '\2' should match when the (.)? does not capture - it could match an empty string (as it does when the '?' is inside)\, or it could match the value it captured the last time round (which is what it did in 5.6.0\, hence the match on "bababdad")\, or it could refuse to match anything at all.
Behaviour from 5.6.1 onwards appears to take the last option: when trying a path in which the (.)? does not match\, a subsequent attempt to match \2 fails. I'm not sure whether this is explicitly documented anywhere\, nor whether it is tested. It feels like reasonable behaviour though\, since the other sensible behaviour is catered for by (.?)\, and I'd class the original behaviour of 5.6.0 (retain the previous value of $2) as a clear bug.
It is however slightly unfortunate\, in that the alternative definition makes it possible to match against captures that would only be defined later: perl -wle 'print "triangular" if ("x" x shift(@ARGV)) =~ /^(\1x)*$/' 10
Such forms can always be modified to give the desire behaviour though\, and usually with only a small change; in the above case\, for example\, you can use /^((^|\1)x)*$/.
Hugo
@smpeters - Status changed from 'rejected' to 'open'
On Sat\, Sep 18\, 2004 at 12:16:05AM +0100\, hv@crypt.org wrote:
As written though\, it isn't clear what the following '\2' should match when the (.)? does not capture - it could match an empty string (as it does when the '?' is inside)\, or it could match the value it captured the last time round (which is what it did in 5.6.0\, hence the match on "bababdad")\, or it could refuse to match anything at all.
Behaviour from 5.6.1 onwards appears to take the last option: when trying a path in which the (.)? does not match\, a subsequent attempt to match \2 fails. I'm not sure whether this is explicitly documented anywhere\, nor whether it is tested. It feels like reasonable behaviour though\, since the other sensible behaviour is catered for by (.?)\, and I'd class the original behaviour of 5.6.0 (retain the previous value of $2) as a clear bug.
I second the reasonableness of this behaviour.
On Sunday 19 September 2004 02:18 pm\, Yitzchak Scott-Thoennes wrote:
On Sat\, Sep 18\, 2004 at 12:16:05AM +0100\, hv@crypt.org wrote:
As written though\, it isn't clear what the following '\2' should match when the (.)? does not capture - it could match an empty string (as it does when the '?' is inside)\, or it could match the value it captured the last time round (which is what it did in 5.6.0\, hence the match on "bababdad")\, or it could refuse to match anything at all.
Behaviour from 5.6.1 onwards appears to take the last option: when trying a path in which the (.)? does not match\, a subsequent attempt to match \2 fails. I'm not sure whether this is explicitly documented anywhere\, nor whether it is tested. It feels like reasonable behaviour though\, since the other sensible behaviour is catered for by (.?)\, and I'd class the original behaviour of 5.6.0 (retain the previous value of $2) as a clear bug.
I second the reasonableness of this behaviour.
With all this being said\, is it agreed that the current behavior is correct\, but that we need test conditions and documentation to insure the continuation of this behavior going forward?
Steve Peters steve@fisharerojo.org
Steve Peters \steve@​fisharerojo\.org wrote: :With all this being said\, is it agreed that the current behavior is correct\, :but that we need test conditions and documentation to insure the continuation :of this behavior going forward?
I think so\, but I haven't actually checked the documentation and existing tests to verify that there is currently a gap.
(I've also only just noticed that the original bug report came from me\, but luckily my previous response is no different than it would have been with that knowledge. :)
I note that my subsequent comment is still true with latest bleadperl:
zen% ./perl -Ilib -Dr -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /' 2>&1 | grep restoring restoring \1 to 0(0)..3 restoring \2 to 0(1882132400)..1 zen%
.. and I assume my analysis from then is also still true:
:That odd number turns out to occur because PL_reg_start_tmp[paren] :is \x78455220 = " REx".
Hugo
Trying to close old RT tickets...
Attached patch adds the missing test for this new behaviour.
t/op/re_tests: test new behaviour for bug #3589 to close it. Todo: documentation. See http://rt.perl.org/rt3/Public/Bug/Display.html?id=3589 -- Reini Urban
On 02/07/07\, Reini Urban via RT \perlbug\-followup@​perl\.org wrote:
Trying to close old RT tickets...
Attached patch adds the missing test for this new behaviour.
Thanks\, applied as #31530.
On Mi. 04. Jul. 2007\, 06:39:37\, rafael wrote:
On 02/07/07\, Reini Urban via RT \perlbug\-followup@​perl\.org wrote:
Trying to close old RT tickets...
Attached patch adds the missing test for this new behaviour.
Thanks\, applied as #31530.
Attached is a patch for perlre.pod. If this patch is ok\, this ticket can be closed...
Quoting reneeb via RT \perlbug\-followup@​perl\.org:
On Mi. 04. Jul. 2007\, 06:39:37\, rafael wrote:
On 02/07/07\, Reini Urban via RT \perlbug\-followup@​perl\.org wrote:
Trying to close old RT tickets...
Attached patch adds the missing test for this new behaviour.
Thanks\, applied as #31530.
Attached is a patch for perlre.pod. If this patch is ok\, this ticket can be closed...
I feel that the example should be clearer or it needs more explenation...
From the example I can't immediatly tell what is happening and/or
what it special about it that it deserves a \
reading the bug report will clarify this but that shouldn't be a
requirement)
Kind regards\,
Bram
The remaining part of this ticket is clarifying in the documentation the behavior of back-references to optional captures. For example:
./perl -Ilib -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /'
does not match\, because the "\2" always fails (rather than allowing an empty match) if the "." didn't match. However:
./perl -Ilib -wle '/^((.?)a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /'
does match\, because an empty result is valid in the capture.
-- George Greer
On Sun\, 25 Jul 2010 19:21:33 GMT\, greerga wrote:
The remaining part of this ticket is clarifying in the documentation the behavior of back-references to optional captures. For example:
./perl -Ilib -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /'
does not match\, because the "\2" always fails (rather than allowing an empty match) if the "." didn't match. However:
./perl -Ilib -wle '/^((.?)a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /'
does match\, because an empty result is valid in the capture.
I believe this ticket is closable because the additional documentation needed was\, in my reading\, added by Karl Williamson in commit d8b950dc in June 2010. 'pod/perlre.pod' has a section on Capture Groups which contains this:
##### The bracketing construct "( ... )" creates capture groups (also referred to as capture buffers). To refer to the current contents of a group later on\, within the same pattern\, use "\g1" (or "\g{1}") for the first\, "\g2" (or "\g{2}") for the second\, and so on. This is called a *backreference*. There is no limit to the number of captured substrings that you may use. Groups are numbered with the leftmost open parenthesis being number 1\, etc. If a group did not match\, the associated backreference won't match either. (This can happen if the group is optional\, or in a different branch of an alternation.) #####
I'll put this ticket out of its misery within 7 days unless someone wants to carry on the discussion.
Thank you very much.
-- James E Keenan (jkeenan@cpan.org)
On 28 December 2016 at 03:15\, James E Keenan via RT \perlbug\-followup@​perl\.org wrote:
On Sun\, 25 Jul 2010 19:21:33 GMT\, greerga wrote:
The remaining part of this ticket is clarifying in the documentation the behavior of back-references to optional captures. For example:
./perl -Ilib -wle '/^((.)?a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /'
does not match\, because the "\2" always fails (rather than allowing an empty match) if the "." didn't match. However:
./perl -Ilib -wle '/^((.?)a\2)+$/ and print "$_: -$1-$2-" for qw/ babadad /'
does match\, because an empty result is valid in the capture.
I believe this ticket is closable because the additional documentation needed was\, in my reading\, added by Karl Williamson in commit d8b950dc in June 2010. 'pod/perlre.pod' has a section on Capture Groups which contains this:
##### The bracketing construct "( ... )" creates capture groups (also referred to as capture buffers). To refer to the current contents of a group later on\, within the same pattern\, use "\g1" (or "\g{1}") for the first\, "\g2" (or "\g{2}") for the second\, and so on. This is called a *backreference*. There is no limit to the number of captured substrings that you may use. Groups are numbered with the leftmost open parenthesis being number 1\, etc. If a group did not match\, the associated backreference won't match either. (This can happen if the group is optional\, or in a different branch of an alternation.) #####
I think that should be beefed up a bit to say something like
Note that a quantified group which matches zero things counts as a "non match"\, so none of the following will match:
"aba"=~/a(x)*b\1a/ "aba"=~/a(x)?b\1a/
However when the quantifier is on the inside of the group it will:
"aba"=~/a(x*)b\1a/ "aba"=~/a(x?)b\1a/
Put another way\, perl's regex engine distinguishes between "captured the empty string" and "an optional capture did not match".
You can see this in the following code:
$ perl -le'if ("aba"=~/a(x*)b/) { print "matched: "\, defined($1) ? "\<$1>" : "undef" } else { print "rejected" }' matched: \<>
$ perl -le'if ("aba"=~/a(x)*b/) { print "matched: "\, defined($1) ? "\<$1>" : "undef" } else { print "rejected" }' matched: undef
Both patterns match\, but leave the contents of the $1 capture group in a different state. Back references only match when the capture group they reference contained a defined value.
Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Migrated from rt.perl.org#3589 (status was 'open')
Searchable as RT3589$