When m// is used in list context, i.e.
my @matches = $string =~ m/pattern/;
the value returned by pos($string) is incorrect.
Here's a code sample with and without list context:
my $hstr1 = "fee fie foe foo";
$hstr1 =~ m/e/mcg;
print "position is ".pos($hstr1)."\n";

my $hstr2 = "fee fie foe foo";
my @matches = $hstr2 =~ m/e/mcg;
print "position is ".pos($hstr2)."\n";
The output from this script is:
position is 2
Use of uninitialized value in concatenation (.) at .//junk.pl line 11.
position is
This isn't documented anywhere that I could find in "Programming Perl", either under the pos() function description or in the section on pattern matching.
(i.e. undefined, even if //gc was specified)
Patch below over @11692 saves pos for //gc even in list context. I'm not entirely sure that we want to do this, but I see no harm beyond a chunk of code getting duplicated in pp_hot - perhaps we should pull both copies of that chunk out into a variant of mg.c:Perl_magic_setpos.
Hugo
I think we don't want to do this, because I don't think that having pos() set by m//gc in list context is useful, and even if it were, it's certainly not backwards compatible.
Ronald
No, C<@a = /pat/> matches only once, but returns all the captured parens from that match. C<@a = /pat/g> matches globally, and returns the captured parens from all those matches.
All of which has little to nothing to do with the additional effects of a //gc modifier, which I thought you were talking about (and which you used in your original example code).
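To illustrate the distinction between C<@a = /pat/> and C<@a = /pat/g> described above, here is a minimal sketch. The sample string and variable names are made up for illustration and do not come from the original report.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original report.
my $text = "k1=v1 k2=v2";

# Without /g: a single match, returning that match's captured parens.
my @once = $text =~ /(\w+)=(\w+)/;

# With /g: every match, with all captured parens flattened into one list.
my @all = $text =~ /(\w+)=(\w+)/g;

print "once: @once\n";   # once: k1 v1
print "all:  @all\n";    # all:  k1 v1 k2 v2
```

Neither form involves /c, so this shows only the list-context return values, not any effect on pos().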
Hugo
I don't think we've ever defined what effect //gc would have in list context, I suspect primarily because we never thought about it. If anyone has ever used it, I think this is the effect they would have intended - I can't see why anyone would use it instead of //g expecting it to do what it currently does (i.e. the same as //g). Similarly I don't see any benefit to having the flag be valid but ignored in this case.
The downside, I think, is that it would more commonly be useful to be able to capture all parens from a single match and still have the //gc effects on pos. However that would not to my mind be the 'obvious' meaning of C<@a = /pat/gc>, so if we want to support that as an option I feel we should find a new flag combination to permit it.
I can certainly see use for the semantics implemented by my patch, particularly with a \G anchor when parsing text that includes lists, like:
push @words, /\G(\w+)(?:,\s*)?/gc; # now parse what is after the list
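A fuller sketch of that use case might look like the following. It assumes a perl in which //gc keeps pos() after a list-context match, which is the behaviour the patch in this thread proposes. The helper name parse_list and the "word, word; rest" input format are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# parse_list is a hypothetical helper; the input format
# ("word, word, word; rest") is an assumption made for this sketch.
sub parse_list {
    my ($input) = @_;

    # List context: collect every word of the comma-separated list.
    # /c asks perl not to reset pos() when the final attempt fails.
    my @words = $input =~ /\G(\w+)(?:,\s*)?/gc;

    # With the proposed semantics, pos() now marks the end of the list.
    # Without them, pos() is undef here and the next \G match restarts at 0.
    my $end_of_list = pos($input);

    # Scalar context: continue parsing right after the list.
    my $rest = $input =~ /\G;\s*(.*)/gc ? $1 : undef;

    return (\@words, $end_of_list, $rest);
}

my ($words, $end, $rest) = parse_list("alpha, beta, gamma; rest");
print "words: @$words\n";
print "end of list: ", (defined $end ? $end : "undef"), "\n";
print "rest: ", (defined $rest ? $rest : "undef"), "\n";
```

On a perl without the change, the word list is still collected, but the second match has no saved position to continue from, so it would be expected to fail.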
Hugo
I am not attached either way.
I did spend a number of hours debugging a lexer that I wrote, only to find that my code was fine, but my assumptions about m// were wrong.
There is no intuitive link between pos() and using m//g in list context.
If backward compatibility is a concern, then at least document pos() as only working in scalar context. perldoc is not clear on this point either.
Intuitively, pos() should return where \G points to in the string. Does this mean that \G does not move when you do m//g in list context? This would be really weird, and it would be undocumented as well.
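A small sketch of what that question is probing: after a list-context m//g without /c, the final failed attempt resets pos(), so a following \G match starts again at the beginning of the string. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: the character at offset 0 is "a",
# the character after the last match (offset 2) is "b".
my $s = "aab";

# List context, /g without /c: finds both "a"s, then the final
# failed attempt resets the search position.
my @found = $s =~ /a/g;
my $pos_after = pos($s);

# \G has nothing to anchor to, so it matches at the start of the string.
my ($next_char) = $s =~ /\G(.)/;

print "found: @found\n";                                           # found: a a
print "pos: ", (defined $pos_after ? $pos_after : "undef"), "\n";  # pos: undef
print "next char at \\G: $next_char\n";                            # next char at \G: a
```

If \G had stayed after the last match, the final line would report "b" instead of "a".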
The /g modifier in list context is redundant, since an array on the left side would indicate "give me all the matches".
So perhaps, to maintain backward compatibility, perl could simply emit a warning if it sees /g being used when m// is used in list context.
This is still non-intuitive, but at least I would see the warning, do some digging, and hopefully find out perl's limitation before I spend hours debugging otherwise working code.
warning: /g modifier does not apply to m// in list context.
Then I'd at least know something is amiss.
The real problem is that pos() moving should almost be a modifier of its own: m//p will update the position, m// will not. Oh well.
I found a work-around for my lexer, so I am not attached. But it was annoying to spend all that time debugging what turned out to be an undocumented "feature" of perl.
You're right. My bad. Scratch that paragraph from the record. I retract it.
All of which has little to nothing to do with the additional effects of a //gc modifier, which I thought you were talking about (and which you used in your original example code).
Yes, the /c modifier. The short of it is that this works as expected:
my $str = 'name abc name xyz name qrs non matching text';
my @matches;
while ($str =~ m/name (\w+)/gc) {
    push(@matches, $1);
}
print "pos is ".pos($str)."\n";
and this code is not effectively the same thing:
my $str = 'name abc name xyz name qrs non matching text';
my @matches = $str =~ m/name (\w+)/gc;
print "pos is ".pos($str)."\n";
Both give me three matches, but the second has an undefined pos(). It seems to me that they should effectively be the same thing. But that's just my opinion. Maybe there are other nuances implied by list context that I'm not intimate with.
Do with this information what you wish. I found a work-around in my code. I was just reporting what looks like an inconsistency/bug.
Greg London
This was fixed in 0af80b6034a, aka change #11696.
--
Father Chrysostomos
@cpansprout - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#7526 (status was 'resolved')
Searchable as RT7526$