Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.98k stars 559 forks source link

m//g is infinite loop when applied to tied array element #8735

Closed p5pRT closed 14 years ago

p5pRT commented 17 years ago

Migrated from rt.perl.org#41216 (status was 'rejected')

Searchable as RT41216$

p5pRT commented 17 years ago

From @mjdominus

Created by @mjdominus

#!/usr/bin/perl

sub A​::TIEARRAY { bless {} => "A" } sub A​::FETCH { "this is his face" }

use Test​::More tests => 1;

{ my $COUNT = 0;   tie @​A\, "A";   $COUNT++ while $A[1] =~ m/i/g && $COUNT \< 4;   is($COUNT\, 3\, "there are only 3 i's in the test string"); }

__END__ The m/i/g loop should execute three times. But when $A[1] is an element of a tied array\, it repeats forever\, matching at the same point in the target string over and over.

Perl Info ``` Flags: category=core severity=medium Site configuration information for perl v5.8.0: Configured by mjd at Thu Apr 17 11:57:37 EDT 2003. Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration: Platform: osname=linux, osvers=2.4.2-2, archname=i586-linux uname='linux plover.com 2.4.2-2 #1 sun apr 8 19:37:14 edt 2001 i586 unknown ' config_args='-des' hint=previous, useposix=true, d_sigaction=define usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm', optimize='-O2', cppflags='-fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm' ccversion='', gccversion='2.96 20000731 (Red Hat Linux 7.1 2.96-81)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lnsl -lndbm -lgdbm -ldl -lm -lc -lcrypt -lutil perllibs=-lnsl -ldl -lm -lc -lcrypt -lutil libc=/lib/libc-2.2.2.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='2.2.4' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic' cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib' Locally applied patches: @INC for perl v5.8.0: /usr/local/lib/perl5/5.8.0/i586-linux /usr/local/lib/perl5/5.8.0 /usr/local/lib/perl5/site_perl/5.8.0/i586-linux /usr/local/lib/perl5/site_perl/5.8.0 /usr/local/lib/perl5/site_perl/5.7.3 /usr/local/lib/perl5/site_perl/5.7.2 /usr/local/lib/perl5/site_perl/5.6.1 /usr/local/lib/perl5/site_perl/5.6.0 /usr/local/lib/perl5/site_perl . Environment for perl v5.8.0: HOME=/home/mjd LANG=C LANGUAGE (unset) LD_LIBRARY_PATH=/lib:/usr/lib:/usr/X11R6/lib LOGDIR (unset) PATH=/home/mjd/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/games:/sbin:/usr/sbin:/usr/local/bin/X11R6:/usr/local/bin/mh:/data/mysql/bin:/usr/local/bin/pbm:/usr/local/bin/ezmlm:/home/mjd/TPI/bin:/usr/local/teTeX/bin:/usr/local/mysql/bin PERL_BADLANG (unset) SHELL=/bin/bash ```
p5pRT commented 17 years ago

From @mjdominus

#!/usr/bin/perl

sub A​::TIEARRAY { bless {} => "A" }

I might add\, by the way\, that the behavior is different for a tied scalar. I found this rather surprising.

p5pRT commented 17 years ago

From @demerphq

On 1/9/07\, Mark Jason Dominus \mjd@&#8203;plover\.com wrote​:

#!/usr/bin/perl

sub A​::TIEARRAY { bless {} => "A" }

I might add\, by the way\, that the behavior is different for a tied scalar. I found this rather surprising.

Id bet its because we store the match position in the SV. So with a tied scalar we can update the SV and things work out. Actually I bet you could construct a tied scalar that would demonstrate other weirdness\, such as when the string changes size in between fetches.

Anyway\, with the tied array the magic comes from the array\, so a temporary sv is created to hold the value\, which of course is thrown away after the regex engine updates the pos on the string. So then the while repeats and the tie is refetched and it returns a new temp wich of course has no pos magic attached\, and the loop goes infinite.

Its not clear to me that this is a bug\, as you could encounter this problem in almost any sitatuation where you loop based on the contents of a tied value.

I guess it is fixable\, but im thinking the price to do so is too high.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 17 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 17 years ago

From @mjdominus

Id bet its because we store the match position in the SV. So with a tied scalar we can update the SV and things work out.

Oh\, of course. Thanks.

Its not clear to me that this is a bug\,

Me neither\, but the tied scalar behavior seemed to confuse the issue somewhat\, so I thought I would report it and see what folks thought.

I guess it is fixable\, but im thinking the price to do so is too high.

My thinking is that even if it were fixed\, you'd still get very bizarre behavior in cases where the value changed between calls. Or perhaps more to the point\, behavior that would almost never be useful.

But indulge me in some speculation. Suppose the value of $A[1] were the *actual* SV returned by FETCH\, rather than a copy of it. This might not be too hard to implement. Then FETCH could control the behavior of m//g on the values it returned. For example\, this would work​:

  my $val = "i like pie";   sub A​::FETCH { $val }   $COUNT++ while $A[1] =~ m/i/g;   is($COUNT\, 3);

because FETCH is returning the same SV each time\, whereas this would behave differently​:

  sub A​::FETCH { my $val = "i like pie"; $val }   $COUNT++ while $A[1] =~ m/i/g; # INFINITE LOOP   is($COUNT\, 3); # NOT REACHED

because FETCH is discarding the old SV and making a new one each time.

This might have unforeseen consequences\, but then again it might not.

p5pRT commented 17 years ago

From @demerphq

On 1/9/07\, Mark Jason Dominus \mjd@&#8203;plover\.com wrote​:

Id bet its because we store the match position in the SV. So with a tied scalar we can update the SV and things work out.

Oh\, of course. Thanks.

Its not clear to me that this is a bug\,

Me neither\, but the tied scalar behavior seemed to confuse the issue somewhat\, so I thought I would report it and see what folks thought.

I guess it is fixable\, but im thinking the price to do so is too high.

My thinking is that even if it were fixed\, you'd still get very bizarre behavior in cases where the value changed between calls. Or perhaps more to the point\, behavior that would almost never be useful.

I was just thinking you could store the offset for each index in the array. Which would at least make the array case behave as the scalar case would.

But yeah I imagine it wouldnt be at all difficult to contrive a tie that would do something odd.

But indulge me in some speculation. Suppose the value of $A[1] were the *actual* SV returned by FETCH\, rather than a copy of it. This might not be too hard to implement. Then FETCH could control the behavior of m//g on the values it returned. For example\, this would work​:

    my $val = "i like pie";
    sub A&#8203;::FETCH \{ $val \}
    $COUNT\+\+ while $A\[1\] =~ m/i/g;
    is\($COUNT\, 3\);

because FETCH is returning the same SV each time\, whereas this would behave differently​:

    sub A&#8203;::FETCH \{ my $val = "i like pie"; $val \}
    $COUNT\+\+ while $A\[1\] =~ m/i/g;  \# INFINITE LOOP
    is\($COUNT\, 3\);  \# NOT REACHED

because FETCH is discarding the old SV and making a new one each time.

This might have unforeseen consequences\, but then again it might not.

You know\, had I guessed I would have thought that the sub A​::FETCH { $val } version would pass the test for exactly the reason you state. Maybe that aspect is a bug.

cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 17 years ago

From @mjdominus

My thinking is that even if it were fixed\, you'd still get very bizarre behavior in cases where the value changed between calls. Or perhaps more to the point\, behavior that would almost never be useful.

I was just thinking you could store the offset for each index in the array. Which would at least make the array case behave as the scalar case would.

I think that would make the problem worse\, not better. Because then consider this​:

  my $COUNTER = 0;   sub A​::FETCH {   return ("bar"\, "baz")[$COUNTER++];   }

  $MATCHES++ while $A[0] =~ m/b/g;   is($MATCHES\, 2);

With your suggestion\, the offset is associated with A[0]. After the "b" in "bar" is matched\, the offset is 1. The second time the loop runs\, the match of "bar" =~ /b/ fails\, because matching starts at the "a".

The only reasonable place to store the match offset is in the SV itself; anything else will drastically change the behavior of all sorts of things.

You know\, had I guessed I would have thought that the sub A​::FETCH { $val } version would pass the test for exactly the reason you state. Maybe that aspect is a bug.

This is just the sort of reason why I wanted to discuss it. I wasn't sure that it was a bug\, but I wasn't sure that it wasn't either.

p5pRT commented 17 years ago

From @demerphq

On 1/9/07\, Mark Jason Dominus \mjd@&#8203;plover\.com wrote​:

My thinking is that even if it were fixed\, you'd still get very bizarre behavior in cases where the value changed between calls. Or perhaps more to the point\, behavior that would almost never be useful.

I was just thinking you could store the offset for each index in the array. Which would at least make the array case behave as the scalar case would.

I think that would make the problem worse\, not better. Because then consider this​:

    my $COUNTER = 0;
    sub A&#8203;::FETCH \{
      return \("bar"\, "baz"\)\[$COUNTER\+\+\];
    \}

    $MATCHES\+\+ while $A\[0\] =~ m/b/g;
    is\($MATCHES\, 2\);

With your suggestion\, the offset is associated with A[0]. After the "b" in "bar" is matched\, the offset is 1. The second time the loop runs\, the match of "bar" =~ /b/ fails\, because matching starts at the "a".

Yes\, as i said\, youd get wierdness similar to what would happen with a scalar. I should think that a tied scalar with that fetch routine would do exactly as you describe.

Also its worth noting that using list context does not suffer from the problem since only one tied fetch occurs. The issue is the while (//g)​:

$COUNT=()=$A[1] =~ m/i/g;

I guess the moral of the story is that the only truely safe while (//g) usage is on an untied scalar.

The only reasonable place to store the match offset is in the SV itself; anything else will drastically change the behavior of all sorts of things.

I lean towards thinking the impact of storing additional pos data for tied compositie structures wouldnt be so great\, but its academic considering its almost certainly not going to happen.

You know\, had I guessed I would have thought that the sub A​::FETCH { $val } version would pass the test for exactly the reason you state. Maybe that aspect is a bug.

This is just the sort of reason why I wanted to discuss it. I wasn't sure that it was a bug\, but I wasn't sure that it wasn't either.

Well as I said\, the behaviour you describe for the above case is what I personally would consider least surprise.

Cheers\, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 17 years ago

From @davidnicol

/g is concerned with position.

  In scalar context\, each execution of "m//g" finds the next   match\, returning true if it matches\, and false if there is no   further match. The position after the last match can be read   or set using the pos() function; see "pos" in perlfunc. A   failed match normally resets the search position to the begin-   ning of the string\, but you can avoid that by adding the "/c"   modifier (e.g. "m//gc"). Modifying the target string also   resets the search position.

  You can intermix "m//g" matches with "m/\G.../g"\, where "\G" is   a zero-width assertion that matches the exact position where   the previous "m//g"\, if any\, left off. Without the "/g" modi-   fier\, the "\G" assertion still anchors at pos()\, but the match   is of course only attempted once.

so it would be reasonable to expect

    my $COUNTER = 0;
    sub A&#8203;::FETCH \{
      return \("bar"\, "baz"\)\[$COUNTER\+\+\];
    \}

    $MATCHES\+\+ while $A\[0\] =~ m/b/g;
    is\($MATCHES\, 2\);

to fail. is($MATCHES\,1) would be expected to succeed.

I now know from this thread that pos is maintained with a SV. Maybe a POS extension to the TIE protocol is required\, which would be an lvalue subroutine and would default to\, as demerphq suggests\, associating with seen keys? A hash on the keys -- so it will work with sparse arrays\, so it will work with TIEHASH without modification as presumably this problem exists there too.

p5pRT commented 17 years ago

From blgl@hagernas.com

In article \rt\-3\.6\.HEAD\-1814\-1168347202\-759\.41216\-75\-0@&#8203;perl\.org\, perlbug-followup@​perl.org (Mark-Jason Dominus) wrote​:

#!/usr/bin/perl

sub A​::TIEARRAY { bless {} => "A" } sub A​::FETCH { "this is his face" }

use Test​::More tests => 1;

{ my $COUNT = 0; tie @​A\, "A"; $COUNT++ while $A[1] =~ m/i/g && $COUNT \< 4; is($COUNT\, 3\, "there are only 3 i's in the test string"); }

__END__ The m/i/g loop should execute three times. But when $A[1] is an element of a tied array\, it repeats forever\, matching at the same point in the target string over and over.

How about this workaround?

#!/usr/bin/perl

sub A​::TIEARRAY { bless {} => "A" } sub A​::FETCH { "this is his face" }

use Test​::More tests => 1;

{ my $COUNT = 0;   tie @​A\, "A";   for my $ELEMENT ($A[1]) {   $COUNT++ while $ELEMENT =~ m/i/g && $COUNT \< 4;   }   is($COUNT\, 3\, "there are only 3 i's in the test string"); } __END__

$ELEMENT maintains its own search position. Changing the array element through $ELEMENT undefs that position. Changing the array element through some other path doesn't.

/Bo Lindbergh

p5pRT commented 17 years ago

From blgl@hagernas.com

In article \934f64a20701091435n2f6cb838q867b9a03adc00ed2@&#8203;mail\.gmail\.com\, davidnicol@​gmail.com ("David Nicol") wrote​:

Maybe a POS extension to the TIE protocol is required\, which would be an lvalue subroutine and would default to\, as demerphq suggests\, associating with seen keys? A hash on the keys -- so it will work with sparse arrays\, so it will work with TIEHASH without modification as presumably this problem exists there too.

If you're going to extend tie\, the right way would be to add a) a REF method for arrays and hashes that returns a reference to a   (possibly tied) scalar representing the requested element and b) POS_FETCH and POS_STORE methods for scalars.

/Bo Lindbergh

p5pRT commented 14 years ago

From @iabyn

Having thought about this topic for a bit\, I've decided to mark this ticket as rejected/wontfix. I can't see that there's any way to make this "work" without a large overhead in complexity\, for behaviour that could be argued either way. In particular\, it is right to expect FETCH to be called multiple times\, and given that its possible for FETCH to return a completely different value each time\, it would be odd to expect pos to preserved between calls. Providing some additional mechanism to allow the tie package to indicate that it wishes to handle pos\, seems like overkill.

Note that it is not practicable for the result of FETCH to be used directly as the result of $A[1]. For a start\, sub calls normally return a *copy* of the return values\, unless declared with :lvalue. Even if that was worked round\, the current low-level behaviout is that av_fetch on tied array returns a proxy magical SV\, who's get/set magic is responsibly for later calling FETCH/STORE. We would have to change the behaviour such that av_fetch immediately calls FETCH\, then attached tiedelem magic to it so that any further access to the returned value did the right thing. This would mean that get magic is being called earlier\, and in circumstances where it otherwise might not. And that the thing that FETCH returns no has tiedelem magic attached to it\, which might well break whatever FETCH does internally.

Note that a tied scalar value preserving its pos could be construed as a bug (it happens because this time there's no proxy SV\, but the assignment of the new value to it happens while magic is temporarily disabled\, so pos doesn't get updated.

p5pRT commented 14 years ago

@iabyn - Status changed from 'open' to 'rejected'