MartinBernstorff / iterpy

⛓️ Iterators for Python with the higher-order functions .map(), .filter() and .reduce(), also known as a fluent interface. As seen in Rust, Scala, JavaScript, etc.

Support lazy iteration #131

Closed chuckwondo closed 7 months ago

chuckwondo commented 7 months ago

As currently implemented, Iter eagerly evaluates all iterators, so it does not support infinite iterators (see https://github.com/MartinBernstorff/iterpy/blob/main/iterpy/iter.py#L16). Even a very large finite iterator could easily overwhelm Iter. Smaller iterators could cause problems too if you build a long enough chain of calls to Iter.map, Iter.filter, etc., since each call in the chain creates a new list and consumes more memory along the way.

Do you have any plans for implementing lazy evaluation to avoid these problems?
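To make the difference concrete, here is a minimal sketch. `EagerIter` is a hypothetical stand-in written for illustration, not iterpy's actual code; it only assumes the behaviour described above, where every step builds a new list. Counting calls shows how much work each approach does to produce three results.

```python
from itertools import islice

call_log = []  # records every value the mapped function is called with


def double(x):
    """Double x and record that the call happened."""
    call_log.append(x)
    return x * 2


class EagerIter:
    """Hypothetical eager wrapper: every step materializes a full list."""

    def __init__(self, iterable):
        self._items = list(iterable)  # consumes the whole input up front

    def map(self, func):
        # A brand-new list is built for each step in the chain.
        return EagerIter([func(x) for x in self._items])

    def take(self, n):
        return EagerIter(self._items[:n])

    def to_list(self):
        return list(self._items)


# Eager: all 1000 elements are mapped even though only 3 are kept.
eager_result = EagerIter(range(1000)).map(double).take(3).to_list()
eager_calls = len(call_log)

# Lazy: the built-in map only runs as far as islice pulls from it.
call_log.clear()
lazy_result = list(islice(map(double, range(1000)), 3))
lazy_calls = len(call_log)

print(eager_result, eager_calls)  # [0, 2, 4] 1000
print(lazy_result, lazy_calls)    # [0, 2, 4] 3
```

With an infinite input such as `itertools.count()`, the eager version would never return from its constructor, while the lazy version would behave exactly as above.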

MartinBernstorff commented 7 months ago

Hi Chuck! Thanks for the interest in the project!

I've been pondering this for a while, with two considerations:

1) Implement a consumable Iter (e.g. CIter or similar), using generators through the entire pipeline. It should support the same methods, but be focused on performance. It won't be stateless, but would be memory efficient (Issue #61).

2) I don't know enough about Python's garbage collection to know whether it'll be able to collect part of a method chain. If it does, I think the issue is less severe than stated above.
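On the second point, a small experiment can show what happens to the intermediate steps. This is a sketch under two assumptions: it runs on CPython, which frees an object as soon as its reference count drops to zero, and `EagerIter` is again a hypothetical list-backed wrapper rather than iterpy's real class. A `weakref.WeakSet` tracks which instances are still alive.

```python
import weakref


class EagerIter:
    """Hypothetical eager wrapper; not iterpy's actual implementation."""

    live = weakref.WeakSet()  # instances that are still alive
    peak_live = 0             # most instances alive at the same time

    def __init__(self, iterable):
        self._items = list(iterable)
        EagerIter.live.add(self)
        EagerIter.peak_live = max(EagerIter.peak_live, len(EagerIter.live))

    def map(self, func):
        # self (and its list) stays alive while the next list is being built.
        return EagerIter(func(x) for x in self._items)


result = EagerIter(range(5)).map(lambda x: x + 1).map(lambda x: x * 2)

alive_after = len(EagerIter.live)  # only the final step survives the chain
peak_alive = EagerIter.peak_live   # input and output of one step coexist

print(result._items)  # [2, 4, 6, 8, 10]
print(alive_after)    # 1 on CPython
print(peak_alive)     # 2 on CPython
```

So on CPython the earlier steps of a chain are released as the chain advances, and memory does not grow with the length of the chain. Each step still holds its full input and full output at the same moment, though, and an infinite input can never be materialized at all.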

Any thoughts on the above?

Do you have a use-case where you're running into issues? 😊

chuckwondo commented 7 months ago

Given that this is a library intended to provide a "fluent" interface for using iterators, I would say that by eagerly evaluating everything under the covers, you are undoing one of the main reasons for using iterators to begin with: laziness.

Here's an example of a "competing" library, where everything remains lazy, as anybody using iterators would reasonably expect: https://github.com/olirice/flupy/tree/master. I would imagine there are other such Python libraries around. Automatic eager evaluation is likely to surprise anybody who specifically chooses to use an iterator. The built-in map, filter, zip, and similar functions are all lazy.
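The laziness of the built-ins is easy to check with an infinite source. The snippet below uses only the standard library; the variable names are just for illustration.

```python
from itertools import count

naturals = count(1)  # 1, 2, 3, ... without end

# None of these lines does any work yet; each just wraps the previous iterator.
squares = map(lambda n: n * n, naturals)
even_squares = filter(lambda n: n % 2 == 0, squares)
labelled = zip("abc", even_squares)  # zip stops when "abc" runs out

# Work happens only here, and only as far as needed.
first_three = list(labelled)
print(first_three)  # [('a', 4), ('b', 16), ('c', 36)]

# Only the numbers 1 through 6 were pulled from the infinite source.
next_natural = next(naturals)
print(next_natural)  # 7
```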

Given your stated desire to have this library included as part of the rustedpy group of packages, I would imagine that you might wish to align this library with Rust's Iterator trait, which you will also find to be completely lazy, until you invoke a method known to trigger evaluation, such as collect. Of course, you wouldn't necessarily have to implement every single method defined by Rust's Iterator trait, but if the intent is to provide a Rust-like Iterator for Python, then Rust's Iterator trait should certainly be your guide.
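As a rough illustration of that Rust-style split between lazy adapters and consuming methods, here is a minimal sketch. `LazyIter` and its method names are hypothetical, chosen to mirror `map`, `filter`, `take` and `collect` from Rust's Iterator trait; it is not the API of iterpy or flupy.

```python
from itertools import count, islice


class LazyIter:
    """Hypothetical Rust-style wrapper; names are illustrative only."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    # Adapters: wrap the underlying iterator and do no work themselves.
    def map(self, func):
        return LazyIter(map(func, self._it))

    def filter(self, pred):
        return LazyIter(filter(pred, self._it))

    def take(self, n):
        return LazyIter(islice(self._it, n))

    # Consumer: the only method here that actually drives the pipeline.
    def collect(self):
        return list(self._it)


result = (
    LazyIter(count(0))             # infinite source: 0, 1, 2, ...
    .map(lambda x: x * 3)          # 0, 3, 6, 9, ...
    .filter(lambda x: x % 2 == 0)  # 0, 6, 12, 18, ...
    .take(4)
    .collect()
)
print(result)  # [0, 6, 12, 18]
```

Because nothing runs until `collect`, the chain works on an infinite source and never holds more than one element in flight per step.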

MartinBernstorff commented 7 months ago

Thanks a ton for your interest, Chuck!

I just want to highlight that this is a hobby project, and like most open source contributors, I'm doing this for fun!

With that said, I completely agree with your technical points! I even mention flupy and std::iter in the readme 👍

I've implemented lazy, consumable evaluation as the default in #136.

MartinBernstorff commented 7 months ago

Just to add to this, I personally find non-consumable iterables much easier to work with for debugging. To that end, I've added an Arr[ay] in #139. Would love to hear what you think!
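The debugging difference can be seen with plain built-ins. In this sketch a `map` object stands in for a consumable iterator and a `list` stands in for a non-consumable collection; neither is meant to represent the actual Iter or Arr classes.

```python
words = ["a", "b"]

# Consumable: looking at the contents uses them up.
lazy = map(str.upper, words)
first_look = list(lazy)   # e.g. printing while debugging
second_look = list(lazy)  # nothing left for the real code path

# Non-consumable: can be inspected any number of times.
materialized = [w.upper() for w in words]
first_check = list(materialized)
second_check = list(materialized)

print(first_look, second_look)    # ['A', 'B'] []
print(first_check, second_check)  # ['A', 'B'] ['A', 'B']
```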