evalEmpire / perl5i

A single module to fix as much of Perl 5 as possible in one go
http://search.cpan.org/perldoc?perl5i
Other
156 stars 42 forks source link

each() can go bad if the callback calls keys or values #210

Closed schwern closed 11 years ago

schwern commented 13 years ago

%hash->each can go into an infinite loop if keys or values is called on %hash inside the callback.

use perl5i::2;
use Test::More;

my %hash = (
    foo => 23,
    bar => 42,
    baz => 99
);

my $i = 0;
note sprintf "This should run %d times", scalar keys %hash;
%hash->each(func($k,$v) {
    pass();
    keys %hash;
    die "We're in an infinite loop" if $i++ > 10;
});

done_testing(3);

The only solution I can see is to have our each to do its own hash iteration. This likely requires XS. Ideally it would be its own CPAN module.

schwern commented 12 years ago

It also goes bad if you nest %hash->each inside %hash->each.

exodist commented 11 years ago

This shouldn't need to be hard. I am assuming that the reason ti goes bad is because it uses http://perldoc.perl.org/functions/each.html under the hood which uses an iterator that is part of the hash, and re-uses the same iterator anywhere it is called.

However a pure perl implementation of each() can solve this problem easily:

  sub each(...) {
    my ( $hashref, $callback ) = @_;
    my @keys = keys @$hashref;
    for my $key ( @keys ) {
      $callback->( $key, $hashref->{$key} );
    }
  }

Whatever magic currently attaches each() to the hash could attach that pure perl snippet instead of the builtin and easily solve the problem.

There are of course a lot of assumptions to my comment, I have no idea how autobox actually works, or how hard it would be to put this snippet into place. I also don't know what performance difference there is in the PP solution verse the built-in iterator.

schwern commented 11 years ago

@exodist You're right that your code would be easy to implement, autobox means you're basically writing a method and what you wrote would basically work. See perl5i::2::HASH for example.

The main feature of each is it does not make a copy of all the keys in the hash, avoiding consuming an extra O(N) memory. If our each makes a copy, it defeats the point.

exodist commented 11 years ago

http://cpansearch.perl.org/src/AMS/Storable-2.30/Storable.xs

grep the code for 'itera' seems they store the iterator, do their thing, then restore the old value. This would be a good example for writing XS extensions that let you get (as a scalar) and set (as a scalar) the iterator value of a hash. With these methods on a hash one could use this:

sub each() {
  my ( $hash, $callback ) = @_;
  my $old_iter = $hash->iterator();
  my $i = 0;
  while ( my $key = $hash->iter_advance( \$i )) {
    $callback->( $key, $hash->{$key} );
  }
}

iter_advance would be code that sets the iterator to the value in the ref passed in, advances it, sets the iterator to the next value, then returns the key for the iterator that was passed in. Since $i tracks the iterator and stores the value after each step, state is not corrupted by other uses of keys and each.

There are still a lot of unknowns here, it is very hand-wavy... but hopefully it is a good launch point for someone that wants to grab this. If I have any free time in the near future I may try,never done XS before.

exodist commented 11 years ago

we might be able to make it safer simply by having the each, keys, values, and as_list methods all store and restore the iterator value after each use, then it is safe to nest them all, so long as you do not combine them with non-autoboxed versions.

schwern commented 11 years ago

Good sleuthing! Progress!

Here's the magic @exodist referenced from Storable.xs...

/*
 * Save possible iteration state via each() on that table.
 */
riter = HvRITER_get(hv);
eiter = HvEITER_get(hv);
hv_iterinit(hv);

/* Restore hash iterator state */
HvRITER_set(hv, riter);
HvEITER_set(hv, eiter);

I checked hv.h and those macros are available in 5.10.0.

I like the idea of wrapping hv_iter_save() and hv_iter_restore() around that XS code and saving and restoring. This avoids having to write all of each in XS and, as @exodist pointed out, makes it available for other methods and should work with nested hash iteration method calls.

The caveat about this not mixing with keys/values is unfortunate, but already true. Its a shame there's not a way to work with the iterator directly.

exodist commented 11 years ago

The questions is, do we want:

$val = hash->iter_get() and hash->iter_set($val); and hash->iter_init;

or do we want: $hash->iter_push $hash->iter_pop

where push stores the current iterator in a stack internal to the hash, re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if something happens in code down from yours you do not have to worry about mismatched push and pop. The benefit of the second is you don't have to worry about the iterator value stored in the scalar getting corrupted (that is assuming the iterator value can work as a scalar at all, looks like a 32-bit unsigned integer, but I am guessing it stores a hash value, not an index, if someone tried to use ++ on the value what would happen, would it be valid?

schwern commented 11 years ago

I think we want the latter. It's safer, easier to manage and associates the iterator stack with the hash.

Fortunately, we have a mechanism to store meta data on anything without putting it in that thing. So we don't need to store the iterator stack in the hash. This is currently used to store signatures with code references. Long story short, we're using the field hash mechanism for inside out objects.

If @exodist can take care of writing the XS functions to get and set a hash iterator, I can take care of the autobox and meta data magic. On Mar 5, 2013 12:29 PM, "Chad Granum" notifications@github.com wrote:

The questions is, do we want:

$val = hash->iter_get() and hash->iter_set($val); and hash->iter_init;

or do we want: $hash->iter_push $hash->iter_pop

where push stores the current iterator in a stack internal to the hash, re-initializes it, then pop restores the last value from the stack?

The benefit of the first one is that you have more control, and if something happens in code down from yours you do not have to worry about mismatched push and pop. The benefit of the second is you don't have to worry about the iterator value stored in the scalar getting corrupted (that is assuming the iterator value can work as a scalar at all, looks like a 32-bit unsigned integer, but I am guessing it stores a hash value, not an index, if someone tried to use ++ on the value what would happen, would it be valid?

— Reply to this email directly or view it on GitHubhttps://github.com/schwern/perl5i/issues/210#issuecomment-14463264 .

exodist commented 11 years ago

It is doubtful I will have time to do this before the weekend, and I have never touched xs before, but I will see what I can do.

exodist commented 11 years ago

I played around with this last night to great success. Tonight (after work) I will finish it into a library I will put on cpan that does what we need for this, then you can pop it into perl5i's autoboxing.

exodist commented 11 years ago

each() is fixed in 6a744ea2c14 however it is dependent on Hash::StoredIterator which I just now uploaded to cpan, it might take a bit for it to filter in. You can download it here in the meantime: https://github.com/exodist/Hash-StoredIterator

exodist commented 11 years ago

I am not closing the ticket because:

exodist commented 11 years ago

To be clear, here are my docs of the edge case:

sort() edge case For some reason "[sort hkeys %hash]" and "[sort hkeys(%hash)]" both result in a list that has all the keys and values (and strangly not in sorted order). However "[sort(hkeys(%hash))]" works fine.

schwern commented 11 years ago

Great work! Looking forward to nipping this problem once and for all.

exodist commented 11 years ago

It suddenly occured to me why the edge case occurs. sort doesn't just take a block as a special use-case, it can also be sort FUNC_NAME items... or sort FUNC_NAME (items), so the special case is trying to use hkeys as the sort function, and %hash as items. The parser does not do this for sort keys %hash because keys is a keyword, not a function. I don't think my edge case is a bug to be fixed (not sure it could be fixed either)

I will close this ticket because perl5i doesn't suffer from this edge case since it uses each as a method.