dashbitco / flow

Computational parallel flows on top of GenStage
https://hexdocs.pm/flow
1.55k stars 90 forks source link

Unexpected parallelism with Flow.from_enumerables #62

Closed lackac closed 6 years ago

lackac commented 6 years ago

Looking through the code I found a theoretical issue or at least a potential source of confusion.

Let's look at an example:

enums_with_urls
|> Flow.from_enumerables(stages: 4, max_demand: 1)
|> Flow.map(&fetch_url/1)
|> Flow.run()

After looking at this code one would expect to have 4 processes that do the job of fetching urls from a particular API. That might be important because of throttling or just to prevent flooding a service.

However, if the input list contains more than 4 enumerables Flow will trust the job of the mapper operations to the generated GenStage.Streamer processes. Let's imagine that the input is generated from a directory listing where we have 100 files, each mapped to a Stream with File.stream!/3. Now we have 100 processes each fetching pages concurrently from some service that is suddenly getting more heat than expected. :)

I think we should either highlight this case in the documentation of Flow.from_enumberables/2 or remove this clause from the case expression.

josevalim commented 6 years ago

Good point, let's remove the clause.