badboy / rdb-rs

RDB parsing & dumping library and utility
https://rdb.fnordig.de/
MIT License
62 stars 17 forks source link

Iteration style #10

Open badboy opened 8 years ago

badboy commented 8 years ago
jwilm commented 7 years ago

Any plans to try and land this? I'm interested in using the parser as a component in a memory analysis tool. I'd also be happy to contribute analysis tools to this project if there's interest.

In any case, the Iterator API would make it possible to build such tools.

WRT implementation questions from the original issue and top comment in this PR:

We can't emit full values, they could be huge (large lists, sets, ...) Nested iterators would fix that, I need to play around with that to see if it is useful

Nested iterators would be a great solution here. I imagine they should have a Drop impl so the parser advances even if they are unused.

Skipping mechanism

Isn't that just Iterator::next()? Again, drop impl on nested iterators would be helpful to ensure correctness.

Actually, now that I think of it, the Iterator API isn't sufficient to enforce this. The iterator consumer could theoretically hold onto nested iterators and corrupt the parser state. The main iterator would need an API like

trait StreamingIterator {
    type Item<'a>;
    // Applying the lifetime parameter `'a` to `Self::Item` inside the trait.
    fn next<'a>(&'a self) -> Option<Self::Item<'a>>;
}

which is only made possible with ATCs (above code sample stolen from relevant RFC). If you're ok with not actually implementing Iterator, the signature of StreamingIterator could be used to implement a next() method which does enforce that any nested iterator would have to be consumed before continuing with the main iterator.

badboy commented 7 years ago

I did not have the time or motivation to bring this into a usable state. I'd still like to finish it someday, but probably not before the summer.

Any help or additional analysis is much appreciated!

Nested iterators would be a great solution here. I imagine they should have a Drop impl so the parser advances even if they are unused.

Yup, something like that.

Isn't that just Iterator::next()? Again, drop impl on nested iterators would be helpful to ensure correctness.

Yes and no. If we know beforehand what we want to skip, we can avoid parsing nested structures (for example we know the length of zipmaps without a need to parse inner values).

Actually, now that I think of it, the Iterator API isn't sufficient to enforce this.

That's a limitation of the in-built iterator interface indeed. Don't I already use a streaming iterator here? I don't know the code anymore.

jwilm commented 7 years ago

That's a limitation of the in-built iterator interface indeed. Don't I already use a streaming iterator here? I don't know the code anymore.

I took a look at the code, and it's implementing std::iter::Iterator which is necessarily not a streaming iterator.

You're probably aware of all following, but I want to be unambiguous in our discussion.

To actually implement a streaming iterator today, you can't use a trait. Something like this would work (lifetimes specified for clarity):

impl<R: Read, F: Filter> RdbParser<R,F> {
    fn next<'a>(&'a mut self) -> Option<RdbIteratorResult<'a>>;
}

But since there's no Iterator impl, IntoIter can't be used which means we can't use the for-loop sugar or any of the provided iterator trait methods. Consuming it for our purposes doesn't look much different though:

while let Some(rdb_iterator_result) = parser.next() {
    // ..
}

And now the compiler enforces that rdb_iterator_result is dropped before calling parser.next() again.

Does this sound reasonable? It's a bit of a trade, but it makes the nested iterators much easier to implement.

I did not have the time or motivation to bring this into a usable state. I'd still like to finish it someday, but probably not before the summer.

No pressure! I'm a bit limited on time as well, but at the very least I can contribute to discussion here.

It may be that, come April, I could contribute to the iterator impl. In that case, it would be very good have consensus on the implementation strategy before trying to pick up where you left off. I'm not going to make any promises about that because I have a tendancy to overcommit myself on OSS work. It may be that I can get some hours at work to push this forward, though.

badboy commented 7 years ago

I took a look at the code, and it's implementing std::iter::Iterator which is necessarily not a streaming iterator. Oh well, I likely had my reasons.

I'd have to think more about the best approach for this and thus can't give a definitive answer which approach would be best. But if you find the time to think about it I'm happy to discuss it further