Open p5pRT opened 18 years ago
edi@bird:~ > perl -e 'use Data::Dumper; "a" =~ /((a)*)*/; print Dumper $1, $2'
$VAR1 = '';
$VAR2 = undef;
edi@bird:~ > perl -e 'use Data::Dumper; "a" =~ /(((a))*)*/; print Dumper $1, $2'
$VAR1 = '';
$VAR2 = 'a';
Obviously, $2 should be either undef or 'a' in _both_ cases. I think this is caused by a faulty optimization; I have posted a more detailed analysis to comp.lang.perl.misc:
<http://groups.google.com/groups?selm=87zns15gal.fsf%40bird.agharta.de&rnum=7>
On 27 Nov 2002 09:18:54 -0000, "edi@agharta.de (via RT)" <perlbug@perl.org> said:
> # New Ticket Created by edi@agharta.de
> # Please include the string: [perl #18708]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=18708 >
> This is a bug report for perl from edi@agharta.de,
> generated with the help of perlbug 1.34 running under perl v5.8.0.
> -----------------------------------------------------------------
> [Please enter your report here]
> edi@bird:~ > perl -e 'use Data::Dumper; "a" =~ /((a)*)*/; print Dumper $1, $2'
> $VAR1 = '';
> $VAR2 = undef;
Archaeological findings about this bug...
It was introduced to the trunk with patch 6373.
The bug was also integrated into 5.6.1 with patch 7772. (Note: 7772 only compiles if 7799 is also integrated.)
Simply undoing the regexec.c part of that patch fixes the bug but also breaks test 860 in the test suite:
not ok 860 () ^(a(b)?)+$:aba:y:-$1-$2-:-a-- => `-a-b-', match=1
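The failing entry quoted above can be replayed outside the test suite. The following is only a sketch: the reading of the colon-separated fields (pattern, subject, result template, expected string) is an interpretation of the quoted line, and `render_captures` is a hypothetical helper, not part of the Perl test harness.

```perl
use strict;
use warnings;

# Fields of the re_tests-style entry quoted above (an interpretation,
# not taken from the harness itself): pattern, subject, expected string.
my $pattern  = qr/^(a(b)?)+$/;
my $subject  = 'aba';
my $expected = '-a--';

# Render the captures the way the "-$1-$2-" template would, treating an
# undefined capture as the empty string. Returns undef on no match.
sub render_captures {
    my ($str, $re) = @_;
    my @caps = $str =~ $re or return;
    return '-' . join('-', map { defined $_ ? $_ : '' } @caps) . '-';
}

my $got = render_captures($subject, $pattern);
print $got eq $expected ? "ok" : "not ok", " $got\n";
```

A perl that still sets $2 on the final iteration would print the `-a-b-` form reported in the failure.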
The patch I tried was:
#### DO NOT APPLY ####
Hope that helps somebody else to find a solution, -- andreas
andreas.koenig@anima.de (Andreas J. Koenig) wrote:
:>>>>> On 27 Nov 2002 09:18:54 -0000, "edi@agharta.de (via RT)" <perlbug@perl.org> said:
: > edi@bird:~ > perl -e 'use Data::Dumper; "a" =~ /((a)*)*/; print Dumper $1, $2'
: > $VAR1 = '';
: > $VAR2 = undef;
:
:Archaeological findings about this bug...
:
:It was introduced to the trunk with patch 6373.
:
:The bug was also integrated into 5.6.1 with patch 7772. (Note: 7772
:only compiles if 7799 is also integrated.)

Digging a bit further, the actual patch was submitted (by me) in the discussion on bug #20000701.002: http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2000-07/msg00514.html

The difference between the two test cases is that for /((a)*)*/, the inner paren gets optimised to CURLYN; for /(((a))*)*/ it stays as CURLYX. I suspect that there is something lacking from that patch for the CURLYN branch, but I haven't yet got a fix.
Hugo
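One way to see how each pattern was compiled is the `re` pragma's debug mode, which writes the compiled regex program to STDERR. This is a minimal sketch, assuming a perl recent enough that `use re 'debug'` is lexically scoped; the variable names are illustrative, and the node names that actually appear (CURLYN, CURLYX, or others) depend on the perl version.

```perl
use strict;
use warnings;

my ($re_single, $re_double);
{
    # Patterns compiled inside this block have their compiled program
    # dumped to STDERR; compare the node types listed for each one.
    use re 'debug';
    $re_single = qr/((a)*)*/;     # inner group is a bare (a)
    $re_double = qr/(((a))*)*/;   # inner group wraps a second capture
}
```

Running the script with STDERR redirected to a file makes the two dumps easier to compare side by side.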
This is a bug report for perl from eric.niebler@gmail.com, generated with the help of perlbug 1.35 running under perl v5.8.7.
-----------------------------------------------------------------
Consider the following program:
$str = 'aaA';
$str =~ /(((?:a))?)+/i;
if(defined($2)) { print "$2"; }
else { print "not defined"; }
This prints "not defined," and I think that's right. But if I change the regex to /(((a))?)+/i (that is, if I change the third group from non-capturing to capturing), the program prints "A".
I can't think of a reason why changing group 3 from non-capturing to capturing should have any effect on whether group 2 captures anything. Seems like a regex bug to me.
Eric Niebler (via RT) <perlbug-followup@perl.org> wrote:
:Consider the following program:
:
:  $str = 'aaA';
:  $str =~ /(((?:a))?)+/i;
:  if(defined($2)) { print "$2"; }
:  else { print "not defined"; }
:
:This prints "not defined," and I think that's right.
:But if I change the regex to /(((a))?)+/i (that is, if
:I change the third group from non-capturing to capturing),
:the program prints "A".
:
:I can't think of a reason why changing group 3 from
:non-capturing to capturing should have any effect on
:whether group 2 captures anything. Seems like a regex
:bug to me.

I agree the inconsistency smells like a bug, though it isn't clear to me which variant exhibits it - both results seem reasonable in the absence of the other.

-Dr output shows that the two regexps are optimised differently: with /(((?:a))?)+/, the $2 loop is optimised to CURLYN, but with /(((a))?)+/ the interior is too complex for the optimisation to occur (which may itself be, if not a bug, an optimisation wart) so it remains as CURLYX. Presumably it is in the differing implementation of CURLYN and CURLYX that the difference arises, but this isn't something I have time to look into right now.
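The results may be reasonable nonetheless - we could in principle stick with "the $<digit> variables will contain the last thing successfully matched", while adding that "optional zero-length submatches (that don't affect success or failure of the match as a whole) may be elided by the optimiser". Which I suspect is what we're getting, even though the evidence is that the less optimised variant is the one doing the eliding.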
Hugo
The RT System itself - Status changed from 'new' to 'open'
On Tue, Jan 03, 2006 at 04:27:40AM +0000, hv@crypt.org wrote:
> The results may be reasonable nonetheless - we could in principle stick with "the $<digit> variables will contain the last thing successfully matched", while adding that "optional zero-length submatches (that don't affect success or failure of the match as a whole) may be elided by the optimiser". Which I suspect is what we're getting, even though the evidence is that the less optimised variant is the one doing the eliding.
The following program suggests that in both regexes, the outer set of parentheses is matched four times:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'syntax';
$_ = 'aaA';
my ($i, $j);
/(((?:a))?(?{ $i ++; print "$i: $2\n" }))+/i;
/(((a))?(?{ $j ++; print "$j: $2\n" }))+/i;
__END__
1: a
2: a
3: A
Use of uninitialized value in concatenation (.) or string at (re_eval 1) line 1.
4:
1: a
2: a
3: A
4: A
Abigail
On Tue, Jan 03, 2006 at 04:27:40AM +0000, hv@crypt.org wrote:
> I agree the inconsistency smells like a bug, though it isn't clear to me which variant exhibits it - both results seem reasonable in the absence of the other.

To me, it seems clear that on the last iteration of the +, the ? should match zero times, so $2 would be "" with the ?: and undefined without the ?:.
On Tue, Jan 03, 2006 at 03:59:19PM -0800, Yitzchak Scott-Thoennes wrote:
> To me, it seems clear that on the last iteration of the +, the ? should match zero times, so $2 would be "" with the ?: and undefined without the ?:.

That I don't understand. Since the ?: controls whether or not there's a $3, why should the value of $2 be different?
Abigail
On Wed, Jan 04, 2006 at 09:48:14AM +0100, Abigail wrote:
> That I don't understand. Since the ?: controls whether or not there's a $3, why should the value of $2 be different?

Sorry, I was somehow assigning numbers from the inside out instead of left to right. It should be undef in either case.
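The numbering rule behind this correction can be checked on patterns without any quantifiers, where the bug under discussion does not come into play. This is an illustrative sketch with made-up variable names: groups are numbered by their opening parenthesis, left to right, and (?:...) does not consume a number, so $2 refers to the same parenthesis in both forms.

```perl
use strict;
use warnings;

# In list context a successful match returns ($1, $2, ...).
my @with_noncapturing = 'a' =~ /(((?:a)))/;   # innermost group is (?:a)
my @with_capturing    = 'a' =~ /(((a)))/;     # innermost group is (a)

printf "groups: %d, \$2 = %s\n", scalar @with_noncapturing, $with_noncapturing[1];
printf "groups: %d, \$2 = %s\n", scalar @with_capturing,    $with_capturing[1];
# prints:
# groups: 2, $2 = a
# groups: 3, $2 = a
```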
Yitzchak Scott-Thoennes wrote:
> Sorry, I was somehow assigning numbers from the inside out instead of left to right. It should be undef in either case.

There appears to be general agreement that this is a bug. But will it get fixed? What happens next? (Sorry, I'm not familiar with this process.)
Eric
Eric Niebler <eric.niebler@gmail.com> wrote:
[...]
:>>>>Eric Niebler (via RT) <perlbug-followup@perl.org> wrote:
:>>>>:Consider the following program:
:>>>>:
:>>>>:  $str = 'aaA';
:>>>>:  $str =~ /(((?:a))?)+/i;
:>>>>:  if(defined($2)) { print "$2"; }
:>>>>:  else { print "not defined"; }
:>>>>:
:>>>>:This prints "not defined," and I think that's right.
:>>>>:But if I change the regex to /(((a))?)+/i (that is, if
:>>>>:I change the third group from non-capturing to capturing),
:>>>>:the program prints "A".
[...]
:There appears to be general agreement that this is a bug. But will it
:get fixed? What happens next? (Sorry, I'm not familiar with this process.)

Now it waits until someone simultaneously acquires the time, ability and desire to locate the bug; once located, it may be found to be anything from easy to impossible to develop a fix that doesn't break anything else.

If a fix is developed it will go into the "bleeding edge" codebase first, which is the one working towards v5.10 of perl; if it is stable there and does not appear to have a wider impact it will likely also be incorporated into the maintenance track used to deliver v5.8.x releases.

But there are few people with the knowledge to debug problems in the regexp engine, and they tend to have limited time available, so the first step may take a while.
Hugo
perl -MData::Dumper -le '"aba" =~ /^(a(b)?)+$/; print Dumper $1, $2;'
$VAR1 = 'a';
$VAR2 = undef;
This is the case because the outer + makes the subexpression
containing the second pair of capturing parentheses match twice. The
second time through\, (b) does not participate in the match\, so $2 is
undef (this coincides with ECMAScript's behaviour).
But if I change (b) to (b+) or ((b)), the behaviour changes:
perl -MData::Dumper -le '"aba" =~ /^(a(b+)?)+$/; print Dumper $1, $2;'
$VAR1 = 'a';
$VAR2 = 'b';
perl -MData::Dumper -le '"aba" =~ /^(a((b))?)+$/; print Dumper $1, $2;'
$VAR1 = 'a';
$VAR2 = 'b';
(Though this probably makes no difference, if this is to be made
consistent, I think I prefer the former behaviour [!defined $2]).
This is the case both with 5.8.8 and 5.9.5 #31441.
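To compare the three patterns from this report in one run, something like the following could be used. It is a sketch only: `describe_second_group` is a hypothetical helper, and the result for each pattern depends on the perl version, so no output is shown.

```perl
use strict;
use warnings;

# The three patterns from the report above; per that report, only the
# first one left $2 undefined on the affected perls.
my @variants = (
    qr/^(a(b)?)+$/,
    qr/^(a(b+)?)+$/,
    qr/^(a((b))?)+$/,
);

# Describe $2 after matching; a list-context match returns ($1, $2, ...).
sub describe_second_group {
    my ($str, $re) = @_;
    my @caps = $str =~ $re or return 'no match';
    return defined $caps[1] ? "'$caps[1]'" : 'undef';
}

printf "%-24s \$2 = %s\n", $_, describe_second_group('aba', $_)
    for @variants;
```

If the behaviour is consistent, all three lines report the same value for $2.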
$s = "Juusstt aannootthheerr Peerrll hhaacckkeerr,\n";
$s =~ s/(?:((?<!$_)$_)?){2}(?:((?<!$_$_)$_+)?){2}/$1$2/g for 'a' .. 'z';
print $s;
Flags: category= severity=
Site configuration information for perl v5.8.8:
Configured by neo at Tue Jan 9 16:06:53 PST 2007.
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=darwin, osvers=8.8.0, archname=darwin-thread-multi-2level
uname='darwin treebeard.local 8.8.0 darwin kernel version 8.8.0: fri sep 8 17:18:57 pdt 2006; root:xnu-792.12.6.obj~1release_ppc power macintosh powerpc '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include',
optimize='-O3',
cppflags='-no-cpp-precomp -g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='4.0.0 20041026 (Apple Computer, Inc. build 4061)', gccosandvers='darwin8'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-ldbm -ldl -lm -lc
perllibs=-ldl -lm -lc
libc=, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'
Locally applied patches:
@INC for perl v5.8.8:
/usr/local/lib/perl5/5.8.8/darwin-thread-multi-2level
/usr/local/lib/perl5/5.8.8
/usr/local/lib/perl5/site_perl/5.8.8/darwin-thread-multi-2level
/usr/local/lib/perl5/site_perl/5.8.8
/usr/local/lib/perl5/site_perl
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.
Environment for perl v5.8.8:
DYLD_LIBRARY_PATH (unset)
HOME=/Users/neo
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/TeX/bin/powerpc-darwin6.8:/usr/local/bin
PERL_BADLANG (unset)
SHELL=/bin/bash
Attached is a patch with a todo test for this bug report.
Summary of the report:
#!/usr/bin/perl -l
if ("A" =~ /(((?:A))?)+/) { print "\$1 = $1, \$2 = $2, \$3 = $3" }
if ("A" =~ /(((A))?)+/) { print "\$1 = $1, \$2 = $2, \$3 = $3"; }
__END__
Output:
$1 = , $2 = , $3 =
$1 = , $2 = A, $3 = A
The value of the second capture group depends on whether or not there is a third capturing group.
The value should be the same in both cases.
(For more info look at RT)
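The attached patch itself is not reproduced in this thread. As a rough illustration of what a TODO test for this report could look like, here is a sketch using Test::More; the helper name and the test description are assumptions, and the test that was actually added to re/pat.t may be written quite differently.

```perl
use strict;
use warnings;
use Test::More;

# Return the second capture after matching "A", or undef if it is unset.
sub second_group {
    my ($re) = @_;
    my @caps = "A" =~ $re;
    return $caps[1];
}

TODO: {
    # Marked TODO so a failure is reported without failing the run.
    local $TODO = 'group 2 should not depend on whether group 3 captures';
    is( second_group(qr/(((A))?)+/),
        second_group(qr/(((?:A))?)+/),
        '$2 agrees between capturing and non-capturing inner group' );
}

done_testing();
```

Once the behaviour is consistent the TODO test passes unexpectedly, which is the usual cue to drop the TODO marker.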
Commit 72aa120d9a32a14196c9e39aa26993909423f096 adds the attached todo .t patch to re/pat.t --Karl Williamson
This is fixed in #20677
Migrated from rt.perl.org#38133 (status was 'open')
Searchable as RT38133$