Add `window` and `window_2` to `gleam/iterator`

l-x commented 2 months ago

Adds the equivalents of list.window and list.window_by_2 to the gleam/iterator package.

lpil commented 2 months ago

Hello! Could you detail what use cases you had for this?

l-x commented 2 months ago

Hello too!

Admittedly, my personal use cases are limited to solving mathematical puzzles. :D

When working with infinite iterators, I kept running into the problem of needing the next n items, which ultimately led to iterator.window/2. I only included iterator.window_by_2/1 in the PR for the sake of completeness.

lpil commented 4 weeks ago

Let's close this for now. This module is being deprecated anyway

chuckwondo commented 4 weeks ago

Let's close this for now. This module is being deprecated anyway

Interesting. Why would you want to deprecate the iterator module? Do you have something even better planned?

lpil commented 4 weeks ago

Because it's almost exclusively misused. No replacement is planned in the stdlib but the existing code will be published as a package.

chuckwondo commented 4 weeks ago

Would you mind elaborating on the misuse? Does the deprecation mean that there will be no way to have lazy/infinite sequences?

giacomocavalieri commented 4 weeks ago

Would you mind elaborating on the misuse?

Iterators introduce overhead over regular lists, so they are useful when working with collections that are too large to fit in memory where a plain list wouldn't be a good fit. Most misuses stem from thinking that iterators are a "zero cost abstraction" (my gut feeling is that the name being the same as the one used by Rust might be one of the reasons for this). Some patterns I noticed:

Turning a list into an iterator and back
```
my_list
|> iterator.from_list
|> ... some iterators processing ...
|> iterator.fold(...) // or `iterator.to_list`
```
The list already fits in memory, turning it into an iterator has no advantage here and will only introduce additional overhead, it would be much faster to just list.fold or list.map directly
A variation on the first one, using iterators as a way to use multiple map/filter so that you don't have to traverse the list multiple times. So this:
```
my_list
|> list.map(f1)
|> list.map(f2)
```
Would be turned into this:
```
my_list
|> iterator.from_list
|> iterator.map(f1)
|> iterator.map(f2)
|> iterator.to_list
```
The overhead introduced by iterators will almost always offset any possible gain obtained by not traversing the list multiple times. If traversing the list multiple times really poses a performance problem, then one should combine the mapping functions in a single list.map or use list.fold:
```
my_list
|> list.map(fn(x) { x |> f1 |> f2 })
```

Does the deprecation mean that there will be no way to have lazy/infinite sequences?

No, it will still be published as a standalone package (probably under gleam_community) but it will no longer be in stdlib so it's not as easy to reach for and misuse as it is right now :)

chuckwondo commented 4 weeks ago

@giacomocavalieri, thank you for your thorough clarification.

I'm going to play devil's advocate here and say that I doubt that making iterator "not as easy to reach for" will do little to nothing to prevent such misuse.

When iterator is finally removed from stdlib and placed somewhere else as a standalone package, I am willing to bet that rather than seeing reduced misuse, we will instead see a mix of inefficient use of list (like your double map example -- which is perhaps at least better than misuse of iterator) and an increase in importing the standalone iterator package (because people will ask what happened to iterator in stdlib and the answer will be that it's in a standalone package, so people will simply use the standalone package as the "fix").

Rather than deprecating iterator in stdlib, I would advocate for adding thorough documentation along the lines of the explanation you gave above. I believe that educating users on the potential for misuse, and how to "properly" do things (including how to make better use of list, such as avoiding the double use of map, as you have shown), would be more effective than simply making it harder to reach for.

By simply moving the package elsewhere, you're not explaining why you felt the need to do so. Therefore, people will simply find wherever you moved it to and pull it in from elsewhere so they can continue to unknowingly misuse it because they never found a thorough explanation of misuse and better use to help them write better code.

Another consideration would be to simply deprecate iterator.from_list, but I don't know how realistic that would be.

lpil commented 4 weeks ago

Making things harder to reach for by removing them from stdlib has worked all the other times we have done it, so I don't have any reason to think it would not work this time.

chuckwondo commented 4 weeks ago

I'd still advocate for adding documentation somewhere, along the lines of what @giacomocavalieri wrote above. Perhaps adding information to the soon-to-be-external iterator package about potential misuse, as well as some documentation to the list module about better practices, such as @giacomocavalieri's double map example above.

gleam-lang / stdlib

Add `window` and `window_2` to `gleam/iterator` #692