Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 552 forks source link

pos() doesn't work when using =~m// in list context #4320

Closed p5pRT closed 11 years ago

p5pRT commented 23 years ago

Migrated from rt.perl.org#7526 (status was 'resolved')

Searchable as RT7526$

p5pRT commented 23 years ago

From greglondon@oaktech.com

when m// is used in list context

  i.e. my @​matches = $string=~m/pattern/;

the value returned by pos($string) is incorrect.

here's a code sample with and without list context.

  my $hstr1 = "fee fie foe foo";   $hstr1 =~ m/e/mcg;   print "position is ".pos($hstr1)."\n";

  my $hstr2 = "fee fie foe foo";   my @​matches = $hstr2 =~ m/e/mcg;   print "position is ".pos($hstr2)."\n";

The output from this script is​:

  position is 2   Use of uninitialized value in concatenation (.) at .//junk.pl line 11.   position is

This isn't documented anywhere that I could find in "Programming Perl" under the pos() function description\, or under the section of pattern matching.

Perl Info ``` Flags: category=core severity=low Site configuration information for perl v5.6.0: Configured by cote at Mon Jan 8 09:26:45 EST 2001. Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration: Platform: osname=solaris, osvers=2.7, archname=sun4-solaris uname='sunos beast 5.7 generic_106541-11 sun4u sparc sunw,ultra-5_10 ' config_args='-Dcc=gcc -Dprefix=/opt/perl -de' hint=recommended, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=undef d_sfio=undef uselargefiles=define use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef Compiler: cc='gcc', optimize='-O', gccversion=2.95.2 19991024 (release) cppflags='-fno-strict-aliasing -I/usr/local/include' ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64' stdchar='char', d_stdstdio=define, usevfork=false intsize=4, longsize=4, ptrsize=4, doublesize=8 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, usemymalloc=y, prototype=define Linker and Libraries: ld='gcc', ldflags =' -L/usr/local/lib ' libpth=/usr/local/lib /lib /usr/lib /usr/ccs/lib libs=-lsocket -lnsl -ldl -lm -lc -lcrypt -lsec libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' ' cccdlflags='-fPIC', lddlflags='-G -L/usr/local/lib' Locally applied patches: @INC for perl v5.6.0: /tools/pm/bin /opt/perl/lib/5.6.0/sun4-solaris /opt/perl/lib/5.6.0 /opt/perl/lib/site_perl/5.6.0/sun4-solaris /opt/perl/lib/site_perl/5.6.0 /opt/perl/lib/site_perl . Environment for perl v5.6.0: HOME=/home/london LANG=C LANGUAGE (unset) LD_LIBRARY_PATH=/usr/openwin/lib:/usr/dt/lib:/tools/affirma/ldv3.1/tools/lib:/usr/openwin/lib:/usr/local/lib:/usr/openwin/share/include/X11:/usr/lib LOGDIR (unset) PATH=/home/london/pm/bin:/tools/debussy/bin:/tools/affirma/ldv3.1/tools/dfII/bin:/tools/affirma/ldv3.1/tools/bin:/tools/affirma/ldv3.1/tools/inca/bin:/tools/lsf/4.1/sparc-sol7-64/etc:/tools/lsf/4.1/sparc-sol7-64/bin:/tools/vcs/5.1/bin:/tools/vcs/5.1/sun_sparc_solaris_5.5.1/bin:/tools/vcs/5.1/sun_sparc_solaris_5.5.1/util:/tools/vcs/5.1/sun_sparc_solaris_5.5.1/flexlm:/tools/transeda/vn6.0.1/bin/SunOS5:/tools/synopsys/2000.11-SP1/sparcOS5/syn/bin:/tools/synopsys/2000.11-SP1/sparc64/syn/bin:/tools/synopsys/2000.11-SP1:/tools/synopsys/scl/sparcOS5/bin:/tools/synopsys/2000.11-SP1/sparc64/syn/bin:./:/home/london/bin:/home/london/synmake:/usr/local/bin:/bin:/usr/bin:/usr/sbin:/usr/ucb:/etc:/usr/openwin/bin:/usr/X/bin:/usr/openwin/bin/xview:/usr/openwin/bin:/usr/openwin/demo:/usr/ccs/bin:/usr/dt/bin:/usr/java/bin:/tools/pm/bin:/opt/nEtscape:/usr/elroy:/tools/screener/bin:/tools/pm/bin:/home/london/Office51/bin PERLLIB=/tools/pm/bin PERL_BADLANG (unset) SHELL=/bin/csh ```
p5pRT commented 23 years ago

From @vanstyn

(ie undefined\, even if //gc was specified)

Patch below over @​11692 saves pos for //gc even in list context. I'm not entirely sure that we want to do this\, but I see no harm beyond a chunk of code getting duplicated in pp_hot - perhaps we should pull both copies of that chunk out into a variant of mg.c​:Perl_magic_setpos.

Hugo

Inline Patch ```diff --- pp_hot.c.old Sun Aug 12 15:45:09 2001 +++ pp_hot.c Thu Aug 16 18:44:42 2001 @@ -1336,6 +1336,22 @@ } } if (global) { + if (pm->op_pmflags & PMf_CONTINUE) { + MAGIC* mg = 0; + if (SvTYPE(TARG) >= SVt_PVMG && SvMAGIC(TARG)) + mg = mg_find(TARG, PERL_MAGIC_regex_global); + if (!mg) { + sv_magic(TARG, (SV*)0, PERL_MAGIC_regex_global, Nullch, 0); + mg = mg_find(TARG, PERL_MAGIC_regex_global); + } + if (rx->startp[0] != -1) { + mg->mg_len = rx->endp[0]; + if (rx->startp[0] == rx->endp[0]) + mg->mg_flags |= MGf_MINMATCH; + else + mg->mg_flags &= ~MGf_MINMATCH; + } + } had_zerolen = (rx->startp[0] != -1 && rx->startp[0] == rx->endp[0]); PUTBACK; /* EVAL blocks may use stack */ --- t/op/pat.t.old Mon Aug 13 01:00:53 2001 +++ t/op/pat.t Thu Aug 16 18:40:34 2001 @@ -6,7 +6,7 @@ $| = 1; -print "1..683\n"; +print "1..684\n"; BEGIN { chdir 't' if -d 't'; @@ -1979,3 +1979,11 @@ @a = ("foo\n\x{100}bar" =~ /\C/gs); print "ok 683\n" if @a == 9 && "@a" eq "f o o \n $a $b b a r"; +{ + # [ID 20010814.004] pos() doesn't work when using =~m// in list context + $_ = "ababacadaea"; + $a = join ":", /b./gc; + $b = join ":", /a./gc; + $c = pos; + print "$a $b $c" eq 'ba:ba ad:ae 10' ? "ok 684\n" : "not ok 684\t# $a $b $c\n"; +} ```
p5pRT commented 23 years ago

From @tamias

I think we don't want to do this\, because I don't think that having pos() set by m//gc in list context is useful\, and even if it were it's certainly not backwards compatible.

Ronald

p5pRT commented 23 years ago

From @vanstyn

No\, C\<@​a = /pat/> matches only once\, but returns all the captured parens from that match. C\<@​a = /pat/g> matches globally\, and returns the captured parens from all those matches.

All of which has little to nothing to do with the additional effects of a //gc modifier\, which I thought you were talking about (and which you used in your original example code).

Hugo

p5pRT commented 23 years ago

From @vanstyn

I don't think we've ever defined what effect //gc would have in list context\, I suspect primarily because we never thought about it. If anyone has ever used it\, I think this is the effect they would have intended - I can't see why anyone would use it instead of //g expecting it to do what it currently does (ie the same as //g). Similarly I don't see any benefit to having the flag be valid but ignored in this case.

The downside\, I think\, is that it would more commonly be useful to be able to capture all parens from a single match and still have the //gc effects on pos. However that would not to my mind be the 'obvious' meaning of C\<@​a = /pat/gc>\, so if we want to support that as an option I feel we should find a new flag combination to permit it.

I can certainly see use for the semantics implemented by my patch\, particularly with a \G anchor when parsing text that includes lists\, like​:   push @​words\, /\G(\w+)(?​:\,\s*)?/gc;   # now parse what is after the list

Hugo

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

I am not attached either way.

I did spend a number of hours debugging a lexer that I wrote\, only to find that my code was fine\, but my assumptions about m// were wrong.

there is no intuitive link between pos() and using m//g in list context.

if backward compatibility is a concern\, then at least document pos() as only working in scalar context. perldoc is not clear on this point either.

Intuitively\, pos() should return where \G points to in the string. Does this mean that \G does not move when you do m//g in list context? This would be really weird\, and it would be undocumented as well.

the /g modifier in list context is redundant. since an array on the left side would indicate "give me all the matches".

so perhaps\, to maintain backward compatibility\, perl could simply emit a warning if it sees /g being used when m// is used in list context.

This is still non-intuitive\, but at least I would see the warning\, do some digging\, and hopefully find out perl's limitation before I spend hours debugging otherwise working code.

warning​: /g modifier does not apply to m// in list context.

Then I'd at least know something is amiss.

The real problem is that pos() moving should almost be a modifier of its own. m//p will update the position\, m// will not. oh well.

I found a work-around for my lexer\, so I am not attached. but it was annoying to spend all that time debugging what turned out to be an undocumented "feature" of perl.

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

your right. my bad. scratch that paragraph from the record. I retract it.

All of which has little to nothing to do with the additional effects of a //gc modifier\, which I thought you were talking about (and which you used in your original example code).

yes\, the /c modifier. the short of it is that this works as expected​:

  my $str = 'name abc name xyz name qrs non matching text';   my @​matches;   while($str =~ m/name (\w+)/gc)   {   push(@​matches\,$1);   }   print "pos is ".pos($str)."\n";

and this code is not effectively the same thing​:

  my $str = 'name abc name xyz name qrs non matching text';   my @​matches = $str =~ m/name (\w+)/gc;   print "pos is ".pos($str)."\n";

Both give me three matches\, but the second has an undefined pos(). It seems to me that they should effectively be the same thing. But that's just my opinion. maybe there are other nuances that are implied when doing list context that I'm not intimate with.

Do with this information what you wish. I found a work-around in my code. I was just reporting what looks like an inconsistency/bug.

Greg London

p5pRT commented 11 years ago

From @cpansprout

This was fixed in 0af80b6034a\, aka change #11696.

--

Father Chrysostomos

p5pRT commented 11 years ago

From [Unknown Contact. See original ticket]

This was fixed in 0af80b6034a\, aka change #11696.

--

Father Chrysostomos

p5pRT commented 11 years ago

@cpansprout - Status changed from 'open' to 'resolved'