Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

@- array is incorrect with non matching grouping #9252

Closed p5pRT closed 16 years ago

p5pRT commented 16 years ago

Migrated from rt.perl.org#51688 (status was 'resolved')

Searchable as RT51688$

p5pRT commented 16 years ago

From @abarisani

Created by @abarisani

When using non matching (?: ) regular expression operator with '?' or '*' conditionals the builtin @- array and $1, $2 matches are incorrectly set, here's a test case:

#!/usr/bin/perl

$string = 'BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002';

$string =~ /.*to=<(.+)>,(?: orig_to=<(.+)>,) relay=(.+)/;

print $1 . "\n"; print $2 . "\n"; print $3 . "\n";

# $1 = foo@bar, $2 = bar@foo, $3 = 127.0.0.1[127.0.0.1]:1002

$string =~ /.*to=<(.+)>,(?: orig_to=<(.+)>,)? relay=(.+)/;

print $1 . "\n"; print $2 . "\n"; print $3 . "\n";

# $1 = bar@foo, $2 = undef, $3 = 127.0.0.1[127.0.0.1]:1002

Clearly in this last case $1 and $2 are wrong, using (?: )* does the same.

This doesn't look like a consistent or expected behaviour to me.

Cheers

Perl Info ``` Flags: category=core severity=medium Site configuration information for perl v5.8.8: Configured by Gentoo at Tue Nov 20 08:36:05 UTC 2007. Summary of my perl5 (revision 5 version 8 subversion 8) configuration: Platform: osname=linux, osvers=2.6.20-hardened-r10, archname=i686-linux uname='linux fuse 2.6.20-hardened-r10 #1 fri oct 5 15:20:53 utc 2007 i686 amd sempron(tm) 2400+ authenticamd gnulinux ' config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth= -Doptimize=-O2 -march=athlon-xp -fforce-addr -pipe -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux 5.8.5 5.8.5/i686-linux 5.8.6 5.8.6/i686-linux 5.8.7 5.8.7/i686-linux -Dcf_by=Gentoo -Ud_csh -Dusenm -Ui_ndbm -Ui_gdbm -Ui_db' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='i686-pc-linux-gnu-gcc', ccflags ='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -march=athlon-xp -fforce-addr -pipe', cppflags='-fno-strict-aliasing -pipe -Wdeclaration-after-statement' ccversion='', gccversion='3.4.6 (Gentoo Hardened 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: 
ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc libc=/lib/libc-2.6.1.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.6.1' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib' Locally applied patches: @INC for perl v5.8.8: /etc/perl /usr/lib/perl5/vendor_perl/5.8.8/i686-linux /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl/5.8.7 /usr/lib/perl5/vendor_perl/5.8.7/i686-linux /usr/lib/perl5/vendor_perl /usr/lib/perl5/site_perl/5.8.8/i686-linux /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl/5.8.7 /usr/lib/perl5/site_perl/5.8.7/i686-linux /usr/lib/perl5/site_perl /usr/lib/perl5/5.8.8/i686-linux /usr/lib/perl5/5.8.8 /usr/local/lib/site_perl . Environment for perl v5.8.8: HOME=/home/lcars LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.6 PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 16 years ago

From @abarisani

This happens also with matching operators like ( ) and not only (?: ) as I incorrectly reported.

p5pRT commented 16 years ago

@pjf - Status changed from 'new' to 'open'

p5pRT commented 16 years ago

From @pjf

G'day Andrea,

Thanks for the bug report! However, I think you may find that it's the greediness of Perl's quantifiers that is causing your problems. In the example:

  $string =~ /.*to=<(.+)>,(?: orig_to=<(.+)>,)? relay=(.+)/;

The .* at the start is greedy. It effectively runs all the way to the end of the string, and then backtracks until there are enough characters left for a match to succeed. In practice, this usually means it will consume the longest left-most string that it can. Against the data:

  BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002

this means the .* at the start of the regexp will consume everything up to and including 'orig_'. $1 therefore matches bar@foo, the orig_to sub-expression doesn't match anything (so $2 is left undefined), and the relay matches what we expect.
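A minimal sketch of the point above, assuming the sample line originally had angle brackets around both addresses (they appear to have been lost when the ticket was migrated). It uses the built-in @- array to show where group 1 begins; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample line from the report, with the angle brackets restored (assumption).
my $string = 'BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002';

my ($start1, $prefix, $first, $second);
if ($string =~ /.*to=<(.+)>,(?: orig_to=<(.+)>,)? relay=(.+)/) {
    $start1 = $-[1];                        # offset at which group 1 begins
    $prefix = substr($string, 0, $start1);  # everything matched before group 1
    ($first, $second) = ($1, $2);

    print "group 1 starts at offset $start1\n";
    print "text before group 1: '$prefix'\n";
    print "\$1 = $first\n";
    print "\$2 = ", (defined $second ? $second : 'undef'), "\n";
}
```

If the explanation above holds, the text before group 1 ends in 'orig_to=<', which is to say the 'to=' in the pattern lined up with the tail of 'orig_to='.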

You can re-write your regexp to drop the .* at the front entirely, although then your .+ that's trying to match the 'to' address will be greedy.
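To illustrate that caveat, here is a small sketch with the leading .* removed and nothing else changed. The sample line is the one from the report with its angle brackets restored (an assumption), and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = 'BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002';

# Leading .* dropped; the captures still use greedy .+
my ($to, $orig_to, $relay) =
    $string =~ /to=<(.+)>,(?: orig_to=<(.+)>,)? relay=(.+)/;

print "to      = $to\n";
print "orig_to = ", (defined $orig_to ? $orig_to : 'undef'), "\n";
print "relay   = $relay\n";
```

Here the first .+ should run past the first '>,' and swallow the whole orig_to field, so the optional group again matches nothing.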

I'd suggest you try:

/   to=< ([^>]+) >,   (?:\s orig_to=< ([^>]+) >,)?   \s relay=(.+) /x;

(/x switch used for clarity). The [^>]+ terms can consume parts of the e-mail addresses, but can't get too greedy since they can't match a close angle-bracket. Other options may include using non-greedy quantifiers (.+? and .*?) or other techniques.
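A quick way to try the suggested pattern is sketched below. The second sample line, without an orig_to field, is hypothetical and only there to exercise the optional group; the angle brackets in the first line are restored on the assumption that the original data had them.

```perl
use strict;
use warnings;

# The suggested pattern, compiled once with /x so the spacing is ignored.
my $re = qr/ to=< ([^>]+) >, (?:\s orig_to=< ([^>]+) >,)? \s relay=(.+) /x;

my @lines = (
    'BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002',
    'BDC1B1406D: to=<foo@bar>, relay=127.0.0.1[127.0.0.1]:1002',  # hypothetical: no orig_to
);

my @results;
for my $line (@lines) {
    # In list context the match returns ($1, $2, $3); a group that took
    # no part in the match comes back as undef.
    if (my ($to, $orig_to, $relay) = $line =~ $re) {
        push @results, [$to, $orig_to, $relay];
        printf "to=%s orig_to=%s relay=%s\n",
            $to, (defined $orig_to ? $orig_to : '(none)'), $relay;
    }
}
```

Because [^>]+ cannot cross a closing angle-bracket, each address capture stops at its own '>' whether or not the orig_to field is present.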

However, unless I've misinterpreted your report, this isn't a bug in Perl. It's just quantifier greediness doing its thing.

For further reading, I suggest 'perldoc perlre' (search for 'greedy'). There are also some examples in chapter 12 of Perl Training Australia's "Programming Perl" course manual at http://perltraining.com.au/notes.html in the section on greediness. (Be aware that as a co-author of the manual, I may have some positive bias towards it.)

All the very best,

  Paul

--
Paul Fenwick <pjf@perltraining.com.au> | http://perltraining.com.au/
Director of Training                   | Ph:  +61 3 9354 6001
Perl Training Australia                | Fax: +61 3 9354 2681

p5pRT commented 16 years ago

@pjf - Status changed from 'open' to 'resolved'

p5pRT commented 16 years ago

From @pjf

G'day p5p,

Just want to do a quick sanity check. Andrea's recent RT ticket[1] provided an excellent distraction from work, but since I've only just recently de-lurked in p5p I wanted to make sure I followed accepted protocols.

Put simply, I:

1) Took the ticket and opened it, to indicate I was working on a response.
2) Answered the ticket (it was a misunderstanding with greediness).
3) Tagged it as 'notabug' and severity 'none'.
4) Closed the ticket.

I note that most tickets in rt.perl.org don't have an owner, and I'm worried that I should have done more twiddling with the custom fields. I've also assumed that replies go back to the requestor (as with most RT setups).

If I have booched the ticket resolution process, please accept my apologies and desire to be enlightened. If I haven't, then I trust nobody minds me dealing with the occasional 'notabug' tickets here and there.

Many thanks,

  Paul

[1] http://rt.perl.org/rt3//Ticket/Display.html?id=51688


p5pRT commented 16 years ago

From @abigail

On Thu, Mar 13, 2008 at 03:33:00AM -0700, andrea @ inversepath. com wrote:

> -----------------------------------------------------------------
> [Please enter your report here]
>
> When using non matching (?: ) regular expression operator with '?' or '*' conditionals the builtin @- array and $1, $2 matches are incorrectly set, here's a test case:
>
> #!/usr/bin/perl
>
> $string = 'BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002';
>
> $string =~ /.*to=<(.+)>,(?: orig_to=<(.+)>,) relay=(.+)/;
>
> print $1 . "\n"; print $2 . "\n"; print $3 . "\n";
>
> # $1 = foo@bar, $2 = bar@foo, $3 = 127.0.0.1[127.0.0.1]:1002
>
> $string =~ /.*to=<(.+)>,(?: orig_to=<(.+)>,)? relay=(.+)/;
>
> print $1 . "\n"; print $2 . "\n"; print $3 . "\n";
>
> # $1 = bar@foo, $2 = undef, $3 = 127.0.0.1[127.0.0.1]:1002
>
> Clearly in this last case $1 and $2 are wrong, using (?: )* does the same.

Why do you think the latter case is wrong? Note that in the latter case, due to the leading .*, 'to=' is matched as part of 'orig_to='.

> This doesn't look like a consistent or expected behaviour to me.

To me, it does.

Abigail

p5pRT commented 16 years ago

From @abarisani

Thanks Paul!

I've actually realized this myself and emailed a follow-up to the ticket, but it seems it never landed on RT, maybe the email got lost.

Anyway thanks for the clarification, I'm getting old and I missed a whitespace in the regexp ;).

Cheers

p5pRT commented 16 years ago

From @abarisani

Ok, my mistake. The regular expression is incorrect and perl actually behaves correctly.

.*to= as opposed to .* to=

Sorry for the waste of time :), it was subtle though.

Cheers.
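To see the effect of that single space in isolation, the sketch below runs the same pattern with and without it. As an assumption made for the example, the .+ captures are replaced by the [^>]+ form suggested earlier in the thread, so that the greediness of .+ does not mask the difference; the angle brackets in the sample line are likewise restored by assumption.

```perl
use strict;
use warnings;

my $string = 'BDC1B1406D: to=<foo@bar>, orig_to=<bar@foo>, relay=127.0.0.1[127.0.0.1]:1002';

# Identical patterns except for the space before "to=".
my $no_space   = qr/.*to=<([^>]+)>,(?: orig_to=<([^>]+)>,)? relay=(.+)/;
my $with_space = qr/.* to=<([^>]+)>,(?: orig_to=<([^>]+)>,)? relay=(.+)/;

my @without = $string =~ $no_space;    # "to=" can also line up inside "orig_to="
my @with    = $string =~ $with_space;  # " to=" only occurs before the first address

for my $caps (\@without, \@with) {
    print join(' | ', map { defined $_ ? $_ : 'undef' } @$caps), "\n";
}
```

With the space, 'orig_to=' no longer qualifies because the character before its 'to=' is an underscore, so the first group is tied to the first address.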

p5pRT commented 16 years ago

From @abarisani

Thanks for the follow up guys, I realized too that it was because of a missing space between .*to (should be .* to). I'm getting old ;).

I actually replied to my own report right after, stating that it's invalid, but the emails never got through for some reason. So trying one more email.

Sorry for the spam and thanks for the patience.

p5pRT commented 16 years ago

From @nwc10

On Thu, Mar 13, 2008 at 11:43:19PM +1100, Paul Fenwick wrote:

> 1) Took the ticket and opened it, to indicate I was working on a response.
> 2) Answered the ticket (it was a misunderstanding with greediness).
> 3) Tagged it as 'notabug' and severity 'none'.
> 4) Closed the ticket.
>
> I note that most tickets in rt.perl.org don't have an owner, and I'm worried that I should have done more twiddling with the custom fields.
> I've also assumed that replies go back to the requestor (as with most RT setups).

This concerns me too. But if I spent my (spare) time and sanity doing that, I wouldn't be doing other things, which I infer I'm better able to do than most other people. (e.g. swearing at nasty XS code)

> If I have booched the ticket resolution process, please accept my apologies and desire to be enlightened. If I haven't, then I trust nobody minds me dealing with the occasional 'notabug' tickets here and there.

There isn't really a formal process. I doubt that any process has been written down before.

If you have the time and patience to occasionally clean up notabug tickets it would be most welcome.

(or cleaning up anything else that you feel confident dealing with, such as tickets where patches have been applied or the correspondence recorded implies resolution, but the bug isn't actually marked as closed)

Nicholas Clark