Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Assignment to List Breaks \G. #903

Closed p5pRT closed 21 years ago

p5pRT commented 25 years ago

Migrated from rt.perl.org#1838 (status was 'resolved')

Searchable as RT1838$

p5pRT commented 25 years ago

From ralph@inputplus.demon.co.uk

Assigning a /gx regexp to a list breaks \G in the following regexp.

  #! /usr/local/bin/perl -w

  $_ = 'a 1 b 2 c 3';

  print "bug\n";
  ($a, $b) = /^(\w)\s(\d)\s/gx;
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

  print "workaround\n";
  /^(\w)\s(\d)\s/gx;
  ($a, $b) = ($1, $2);
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

Output is

  bug
  a=a b=1
  a=a b=1
  workaround
  a=a b=1
  a=b b=2

Here's the configuration of my perl at home, but the one at work, 5.00502, also exhibits this.

Ralph.

Site configuration information for perl 5.00404:

Configured by root at Thu Sep 10 02:15:30 EDT 1998.

Summary of my perl5 (5.0 patchlevel 4 subversion 4) configuration:
  Platform:
    osname=linux, osvers=2.0.34, archname=i386-linux
    uname='linux porky.redhat.com 2.0.34 #1 thu may 7 10:17:44 edt 1998 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    bincompat3=y useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=2.7.2.3
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=undef, doublesize=undef
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so
    useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

p5pRT commented 25 years ago

From @tamias

On Tue, Nov 30, 1999 at 04:10:20PM +0000, Ralph Corderoy wrote:

Hi,

Assigning a /gx regexp to a list breaks \G in the following regexp.

  #! /usr/local/bin/perl -w

  $_ = 'a 1 b 2 c 3';

  print "bug\n";
  ($a, $b) = /^(\w)\s(\d)\s/gx;
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

  print "workaround\n";
  /^(\w)\s(\d)\s/gx;
  ($a, $b) = ($1, $2);
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

Output is

  bug
  a=a b=1
  a=a b=1
  workaround
  a=a b=1
  a=b b=2

This is the expected behavior for m//g in a list context. The regular expression is applied repeatedly until it no longer matches, and the return value is a list of all the substrings matched. After the final application of the regex, which fails to match, pos() is reset to the beginning of the string.

Try this instead:

  #! /usr/local/bin/perl -w

  $_ = 'a 1 b 2 c 3';

  @a = /\G(\w)\s(\d)\s?/gx;
  $" = ',';
  print "@a\n";

Output:

  a,1,b,2,c,3
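
To make the pos() behaviour described above visible, here is a minimal sketch that compares the two contexts on the same string. The variable names are illustrative only and do not come from the original report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'a 1 b 2 c 3';

# List context: the pattern is retried until it fails, and that final
# failure resets pos(), so a following \G would start from the beginning.
my @all = $s =~ /(\w)\s(\d)\s?/g;
my $pos_after_list = defined pos($s) ? pos($s) : 'undef';

# Scalar (here: void) context: exactly one match is attempted, and pos()
# records where that match ended.
$s =~ /(\w)\s(\d)\s?/g;
my $pos_after_scalar = pos($s);

print "list context:   matched (@all), pos = $pos_after_list\n";
print "scalar context: pos = $pos_after_scalar\n";
```

In the scalar case pos() ends up just past "a 1 ", which is why the workaround in the report lets the next \G match continue at "b".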

Ronald

p5pRT commented 25 years ago

From [Unknown Contact. See original ticket]

Assigning a /gx regexp to a list breaks \G in the following regexp.

  #! /usr/local/bin/perl -w

  $_ = 'a 1 b 2 c 3';

  print "bug\n";
  ($a, $b) = /^(\w)\s(\d)\s/gx;
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

  print "workaround\n";
  /^(\w)\s(\d)\s/gx;
  ($a, $b) = ($1, $2);
  print "a=$a b=$b\n";
  ($a, $b) = /\G(\w)\s(\d)/gx;
  print "a=$a b=$b\n";

Output is

  bug
  a=a b=1
  a=a b=1
  workaround
  a=a b=1
  a=b b=2

Sounds like correct behaviour to me. From perlop:

  /PATTERN/cgimosx
  [...]
  Options are:

        c   Do not reset search position on a failed match when /g is in effect.
  ----> g   Match globally, i.e., find all occurrences.
        i   Do case-insensitive pattern matching.
        m   Treat string as multiple lines.
        o   Compile pattern only once.
        s   Treat string as single line.
        x   Use extended regular expressions.

It just appears that your list is too small to hold the rest of the matches. And as the match is finished, \G is reset. In a scalar context, the 'g' modifier works differently, as it doesn't consume all the matches. Your "workaround" is the way to do what you want.

Hope it helps,

François Désarménien

p5pRT commented 25 years ago

From [Unknown Contact. See original ticket]

Ronald J Kimball writes:

This is the expected behavior for m//g in a list context. The regular expression is applied repeatedly until it no longer matches, and the return value is a list of all the substrings matched. After the final application of the regex, which fails to match, pos() is reset to the beginning of the string.

I remember some discussion for making list context m//gc behave differently.

What was the result?

Ilya

p5pRT commented 25 years ago

From @RandalSchwartz

"Ilya" == Ilya Zakharevich <ilya@math.ohio-state.edu> writes:

Ilya> I remember some discussion for making list context m//gc behave differently.

Ilya> What was the result?

If I recall (since it was me that brought it up), nothing. :(

Here's the thread, where I use the term "thread" loosely (since it is a single message :)...

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-10/msg01047.html

I know, *patches welcome*. :)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

p5pRT commented 25 years ago

From [Unknown Contact. See original ticket]

Ronald J Kimball writes:

This is the expected behavior for m//g in a list context. The regular expression is applied repeatedly until it no longer matches, and the return value is a list of all the substrings matched. After the final application of the regex, which fails to match, pos() is reset to the beginning of the string.

Hi, thanks for all the replies. Having put -Dr to good use I fully understand what is happening with my example script. However, I'd like to put a case forward that this behaviour is broken, and judging from Randal's question in

  http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1999-10/msg01047.html

there might be others that agree.

Normally, the use of context in Perl to 'do the right thing' works well. I'd suggest that the overloading of /g based on context allows only two alternatives (/g and no /g) to be selected from a larger desired set of operations.

If I want to match many times, possibly leaving pos alone on failure, then the existing /g and /c work well.

  @w = /(\w+)/g;
  @w = /(\w+)/gc;
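
As a rough illustration of the difference between these two forms, the sketch below checks pos() after each list-context match. It assumes a reasonably recent perl; perls from the era of this report may treat list-context //gc differently, which is what the thread linked above was asking about. All names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'foo bar baz';

# Plain /g in list context: pos() is reset once the last attempt fails.
my @g = $text =~ /(\w+)/g;
my $pos_g = pos($text);

# With /c added, the failing last attempt leaves pos() where the final
# successful match ended.
my @gc = $text =~ /(\w+)/gc;
my $pos_gc = pos($text);

printf "/g:  %d words, pos is %s\n", scalar @g,  defined $pos_g  ? $pos_g  : 'undef';
printf "/gc: %d words, pos is %s\n", scalar @gc, defined $pos_gc ? $pos_gc : 'undef';
```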

But if I want to do some lexing then I need /g to enable \G. As long as I use scalar context that's fine.

  if (/(\w+)/gc) {
      ($foo) = ($1);
  }

but if I want to assign to a list I'm stuck; that would create a list context indicating I want multiple matches.

  if (($foo) = /(\w+)/gc) { # bad

I want to give a list context for assignment to variables and I need \G to work. With the context overloading of /g that's impossible.

If there were a /l for lexing that enabled \G without adding the global matching of /g, then I'd be happy, and I suspect a lot of other people would stop making the same mistake and bothering you about it.

  if (($foo) = /(\w+)/lc) { # what I'd like to write; /l does not exist

Making /gc work so pos wasn't reset on the last, guaranteed to fail, iteration wouldn't cut it AFAICS. It would still give me multiple matches which aren't desired.

  ($foo) = /(\w+)\s*/lc; # takes one word.
  ($foo) = /(\w+)\s*/gc; # takes many words returning one.
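
The "takes many words returning one" case can be checked with a short sketch like this one. It assumes a perl where list-context //gc keeps pos() after the last match, and the names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'one two three';

# The list assignment puts the match in list context, so /g consumes every
# word even though only the first capture is kept.
my ($first) = $line =~ /(\w+)\s*/gc;
my $consumed = pos($line);

print "kept '$first', but pos is now $consumed of ", length($line), "\n";
```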

To sum up, /g means two things (match many times, and enable \G), so you can't enable \G on its own.
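
For comparison, here is a minimal sketch of the existing way to lex with \G, generalising the workaround from the original report: match in scalar context with /gc, then copy the captures in a separate list assignment. It is an illustration under those assumptions, not a proposal from this thread, and the names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'a 1 b 2 c 3';
my @pairs;

# Scalar-context //gc advances pos() one match at a time, and \G anchors
# each attempt at the point where the previous match stopped.
while ($input =~ /\G(\w)\s(\d)\s*/gc) {
    my ($letter, $digit) = ($1, $2);   # list assignment happens after the match
    push @pairs, "$letter=$digit";
}

print join(' ', @pairs), "\n";
print "stopped at offset ", pos($input), "\n";
```

Because of /c, the failing final attempt leaves pos() alone, so a different \G pattern could pick up from the same offset.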

Thanks for your time.

Ralph.

p5pRT commented 25 years ago

From [Unknown Contact. See original ticket]

Hi, I'm sending this again as I suspect that sending it yesterday from a different mail address than usual might have made the list drop it on the floor. Sorry for the noise if this isn't the case.

------- Forwarded Message

Date: Wed, 01 Dec 1999 17:04:01 +0000
Subject: Re: [ID 19991130.003] Assignment to List Breaks \G.
In-reply-to: Message from Ilya Zakharevich <ilya@math.ohio-state.edu>
    "of Tue, 30 Nov 1999 11:43:00 EST." <199911301643.LAA12574@monk.mps.ohio-state.edu>
To: Ilya Zakharevich <ilya@math.ohio-state.edu>
Cc: rjk@linguist.dartmouth.edu (Ronald J Kimball), merlyn@stonehenge.com,
    desar@front1m.grolier.fr, ralph@inputplus.demon.co.uk (Ralph Corderoy),
    perl5-porters@perl.org
Message-id: <199912011704.RAA105502@cm01.ess>
Content-transfer-encoding: 7BIT


------- End of Forwarded Message