gleam-lang / stdlib

🎁 Gleam's standard library
https://hexdocs.pm/gleam_stdlib/
Apache License 2.0
492 stars 175 forks source link

Have iterator.yield not wait for the next to be available before yielding the previous #624

Closed Hackder closed 5 months ago

Hackder commented 5 months ago

I ran into this while implementing out-of-order streaming in gleam. Consider this code example:

import gleam/io
import gleam/iterator
import gleam/erlang/process

pub fn main() {
  iter()
  |> iterator.take_while(fn(n) { n < 2 })
  |> iterator.each(io.debug)
}

fn expensive_thing(n) {
  process.sleep(1000)
  n
}

fn iter() {
  let n = expensive_thing(1)
  io.debug("adding 1")
  use <- iterator.yield(n)

  let n = expensive_thing(2)
  io.debug("adding 2")
  use <- iterator.yield(n)

  let n = expensive_thing(3)
  io.debug("adding 3")
  use <- iterator.yield(n)

  let n = expensive_thing(4)
  io.debug("adding 4")
  use <- iterator.yield(n)

  io.debug("adding end")
  iterator.empty()
}

The problem in general:

We are taking while n < 2 therefore we should produce 1, then 2, see that 2 is not less than 2 and stop producing. The code should take 2 seconds to run. But this is not the case. Due to the implementation, an iterator won't produce a value until the next one is available. Therefore, it will take 3 seconds to run.

This is the problem I ran into:

In mist (the webserver) you can specify a "chunked" response, which will respond in chunks provided by the passed iterator. When implementing http-streaming, you want to send the first page as quickly as possible and follow up with some additional data when it comes from the database a little bit later. This is all done during a single http get request.

Due to the behavior described above, I had to add two items to the iterator, each time I wanted to produce a new html chunk. The first is the chunk, and the second one some small dummy data. In my case <!----> (html comment). It cannot be empty due to how chunked responses work.

Other issues that may arrise

Generating combination/cartesian-product of length n. If I have an alphabet of small english letters (26). cartesian product of length 3 would be 26^3 items. As you can see this is growing exponentially. Ideally, we would like to avoid generating the next length if we don't have to.

At minimum I think this is a pretty big foot-gun. (it took me a while to debug while my streaming wasn't working properly). And it doesn't seem to be documented anywhere. The issue comes from this function:

pub fn yield(element: a, next: fn() -> Iterator(a)) -> Iterator(a) {
  Iterator(fn() { Continue(element, next().continuation) })
}

The element cannot be accessed without invoking the next function.

Workaround

I found one workaround. That is to make the iterator be an iterator of functions. Therefore, producing the next "producer" function is cheap, and the computation cost is transferred to the consumer. This is the example from above rewriten.

import gleam/io
import gleam/iterator
import gleam/erlang/process

pub fn main() {
  iter()
  |> iterator.map(fn(producer) { producer() })
  |> iterator.take_while(fn(n) { n < 2 })
  |> iterator.each(io.debug)
}

fn expensive_thing(n) {
  process.sleep(1000)
  n
}

fn iter() {
  let compute = fn() {
    let n = expensive_thing(1)
    io.debug("adding 1")
    n
  }
  use <- iterator.yield(compute)

  let compute = fn() {
    let n = expensive_thing(2)
    io.debug("adding 2")
    n
  }
  use <- iterator.yield(compute)

  let compute = fn() {
    let n = expensive_thing(3)
    io.debug("adding 3")
    n
  }
  use <- iterator.yield(compute)

  let compute = fn() {
    let n = expensive_thing(4)
    io.debug("adding 4")
    n
  }
  use <- iterator.yield(compute)

  io.debug("adding end")
  iterator.empty()
}

Possible solutions

Unfortunately, I don't have any good ideas for a solution. My specific problem could probably be fixed on the mist side, by implementing some response type which can take messages from a subject as an input.

I considered reimplementing the iterator in stdlib. It is an opaque type so the API wouldn't change, but it still would be a breaking change, because the behavior changed.

Anyways, I would welcome some feedback on this. Maybe I just missed something and the behavior I desire is possible without any trickery.

lpil commented 5 months ago

Let's make iterator.yield immediately yield the element.

Hackder commented 5 months ago

Sounds good to me. I'd be happy to draft a PR for this if you'd like.

lpil commented 5 months ago

Yes please! Thank you!