Assigning a /gx regexp to a list breaks \G in the following regexp.
#! /usr/local/bin/perl -w
$_ = 'a 1 b 2 c 3';
print "bug\n";
($a, $b) = /^(\w)\s(\d)\s/gx;
print "a=$a b=$b\n";
($a, $b) = /\G(\w)\s(\d)/gx;
print "a=$a b=$b\n";

print "workaround\n";
/^(\w)\s(\d)\s/gx;
($a, $b) = ($1, $2);
print "a=$a b=$b\n";
($a, $b) = /\G(\w)\s(\d)/gx;
print "a=$a b=$b\n";
Output is
bug
a=a b=1
a=a b=1
workaround
a=a b=1
a=b b=2

Here's the configuration of my perl at home, but the one at work, 5.00502, also exhibits this.
Ralph.
Site configuration information for perl 5.00404:
Configured by root at Thu Sep 10 02:15:30 EDT 1998.
Summary of my perl5 (5.0 patchlevel 4 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.0.34, archname=i386-linux
    uname='linux porky.redhat.com 2.0.34 #1 thu may 7 10:17:44 edt 1998 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    bincompat3=y useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=2.7.2.3
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=undef, doublesize=undef
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so
    useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
On Tue, Nov 30, 1999 at 04:10:20PM +0000, Ralph Corderoy wrote:
> Hi,
>
> Assigning a /gx regexp to a list breaks \G in the following regexp.
>
> #! /usr/local/bin/perl -w
> $_ = 'a 1 b 2 c 3';
> print "bug\n";
> ($a, $b) = /^(\w)\s(\d)\s/gx;
> print "a=$a b=$b\n";
> ($a, $b) = /\G(\w)\s(\d)/gx;
> print "a=$a b=$b\n";
> print "workaround\n";
> /^(\w)\s(\d)\s/gx;
> ($a, $b) = ($1, $2);
> print "a=$a b=$b\n";
> ($a, $b) = /\G(\w)\s(\d)/gx;
> print "a=$a b=$b\n";
>
> Output is
>
> bug
> a=a b=1
> a=a b=1
> workaround
> a=a b=1
> a=b b=2
This is the expected behavior for m//g in a list context. The regular expression is applied repeatedly until it no longer matches, and the return value is a list of all the substrings matched. After the final application of the regex, which fails to match, pos() is reset to the beginning of the string.
Try this instead:
#! /usr/local/bin/perl -w
$_ = 'a 1 b 2 c 3';
@a = /\G(\w)\s(\d)\s?/gx;
$" = ',';
print "@a\n";
Output:
a,1,b,2,c,3
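The reset described above can be observed directly by looking at pos() after each kind of match. The following is only a minimal sketch, assuming a reasonably recent perl; the variable names are illustrative and do not come from the thread.

```perl
use strict;
use warnings;

my $str = 'a 1 b 2 c 3';

# List context: the pattern is applied until it fails, so the
# assignment receives every capture, not just the first pair.
my @all        = $str =~ /(\w)\s(\d)\s?/g;
my $count_list = scalar @all;      # six captures: a 1 b 2 c 3
my $pos_list   = pos($str);        # undef: reset by the final failed attempt

# Scalar context: one match per evaluation, and pos() is left
# just past that match for a following \G to pick up.
my $ok         = $str =~ /(\w)\s(\d)\s/g;
my $pos_scalar = pos($str);        # 4, i.e. just after "a 1 "

print "list context:   $count_list captures, pos is ",
      defined $pos_list ? $pos_list : 'undef', "\n";
print "scalar context: pos is $pos_scalar\n";
```

The undefined pos() after the list-context match is why a following \G starts again from the beginning of the string.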
Ronald
> Assigning a /gx regexp to a list breaks \G in the following regexp.
>
> #! /usr/local/bin/perl -w
> $_ = 'a 1 b 2 c 3';
> print "bug\n";
> ($a, $b) = /^(\w)\s(\d)\s/gx;
> print "a=$a b=$b\n";
> ($a, $b) = /\G(\w)\s(\d)/gx;
> print "a=$a b=$b\n";
> print "workaround\n";
> /^(\w)\s(\d)\s/gx;
> ($a, $b) = ($1, $2);
> print "a=$a b=$b\n";
> ($a, $b) = /\G(\w)\s(\d)/gx;
> print "a=$a b=$b\n";
>
> Output is
>
> bug
> a=a b=1
> a=a b=1
> workaround
> a=a b=1
> a=b b=2
Sounds like correct behaviour to me. From perlop:
/PATTERN/cgimosx
[...]
Options are:

      c   Do not reset search position on a failed match when /g is in effect.
----> g   Match globally, i.e., find all occurrences.
      i   Do case-insensitive pattern matching.
      m   Treat string as multiple lines.
      o   Compile pattern only once.
      s   Treat string as single line.
      x   Use extended regular expressions.
It just appears that your list is too small to hold the rest of the matches. And as the match is finished, \G is reset. In a scalar context, the 'g' modifier works differently, as it doesn't eat up all matches. Your "workaround" is the way to do what you want.
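As a sketch of the scalar-context behaviour mentioned above (not code from the thread; the names are made up for illustration), each evaluation of m//g in scalar context takes a single match and leaves pos() where \G will resume:

```perl
use strict;
use warnings;

my $str = 'a 1 b 2 c 3';
my @pairs;

# The while condition is scalar context: one match per iteration.
# \G anchors each attempt at the position where the previous one ended.
while ($str =~ /\G(\w)\s(\d)\s?/g) {
    push @pairs, "$1=$2";
}

print join(' ', @pairs), "\n";
```

The loop ends when the pattern fails at the end of the string; only then is pos() reset.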
Hope it helps,
François Désarménien
Ronald J Kimball writes:
> This is the expected behavior for m//g in a list context. The regular expression is applied repeatedly until it no longer matches, and the return value is a list of all the substrings matched. After the final application of the regex, which fails to match, pos() is reset to the beginning of the string.
I remember some discussion for making list context m//gc behave differently.
What was the result?
Ilya
"Ilya" == Ilya Zakharevich <ilya@math.ohio-state.edu> writes:
Ilya> I remember some discussion for making list context m//gc behave differently.
Ilya> What was the result?
If I recall (since it was me that brought it up), nothing. :(
Here's the thread, where I use the term "thread" loosely (since it is a single message :)...
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-10/msg01047.html
I know, *patches welcome*. :)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Ronald J Kimball writes:
> This is the expected behavior for m//g in a list context. The regular expression is applied repeatedly until it no longer matches, and the return value is a list of all the substrings matched. After the final application of the regex, which fails to match, pos() is reset to the beginning of the string.
Hi, thanks for all the replies. Having put -Dr to good use I fully understand what is happening with my example script. However, I'd like to put a case forward that this behaviour is broken and judging from Randal's question in
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-10/msg01047.html
there might be others that agree.
Normally, the use of context in Perl to `do the right thing' works well. I'd suggest that the overloading of /g based on context allows only two alternatives (/g and no /g) to be selected from a larger desired set of operations.
If I want to match many times, possibly leaving pos alone on failure, then the existing /g and /c work well.
@w = /(\w+)/g;
@w = /(\w+)/gc;
But if I want to do some lexing then I need /g to enable \G. As long as I use scalar context that's fine.
if (/(\w)+/gc) {
    ($foo) = ($1);
}
but if I want to assign to a list I'm stuck; that would create a list context indicating I want multiple matches.
if (($foo) = /(\w)+/gc) { # bad
I want to give a list context for assignment to variables and I need \G to work. With the context overloading of /g that's impossible.
If there was a /l for lexing that enabled \G without adding the global matching of /g then I'd be happy and I suspect a lot of other people would stop making the same mistake and bothering you about it.
if (($foo) = /(\w)+/lc) { # bad
Making /gc work so pos wasn't reset on the last, guaranteed to fail, iteration wouldn't cut it AFAICS. It would still give me multiple matches which aren't desired.
($foo) = /(\w+)\s*/lc; # takes one word.
($foo) = /(\w+)\s*/gc; # takes many words returning one.
To sum up, /g means two things (many matches and enabling \G), so you can't enable \G on its own.
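Without something like the proposed /l, one way to get a single \G-anchored match together with list assignment is to do the match in scalar context and hand back the captures afterwards, much as in the original workaround. The following is a minimal sketch; next_pair is a hypothetical helper written for illustration, not anything provided by perl.

```perl
use strict;
use warnings;

# Hypothetical helper: take exactly one (letter, digit) pair starting at
# the current pos() of the referenced string, or return an empty list.
sub next_pair {
    my ($ref) = @_;
    # An if condition is scalar (boolean) context, so /g advances pos()
    # by a single match; /c keeps pos() where it was if the match fails.
    if ($$ref =~ /\G(\w)\s(\d)\s?/gc) {
        return ($1, $2);
    }
    return;
}

my $str = 'a 1 b 2 c 3';
my ($x, $y) = next_pair(\$str);   # first pair
my ($p, $q) = next_pair(\$str);   # second pair, resuming at pos()
print "x=$x y=$y p=$p q=$q\n";
```

The caller still gets list assignment, but the match itself never sees list context, so it cannot run on to consume the rest of the string.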
Thanks for your time.
Ralph.
Hi, I'm sending this again as I suspect that sending it from a different to normal mail address yesterday might have made the list drop me on the floor. Sorry for the noise if this isn't the case.
Migrated from rt.perl.org#1838 (status was 'resolved')
Searchable as RT1838$