Open yoshuawuyts opened 4 years ago
However, something to be mindful of is that calling `next` in a loop can also be used to implement methods such as `collect`. It seems there's an inherent mismatch between backpressure and driving futures.
Perhaps we can learn from futures-rs terminology here: the max concurrency (#3) is not only the upper bound on the number of futures executing simultaneously, but also on the number of futures that have (potentially) completed, are now buffered, and are waiting to be read.
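To make the "buffered and waiting to be read" idea concrete, here is a minimal sketch that uses threads and a bounded `std::sync::mpsc::sync_channel` as a stand-in for futures. The names `bounded_map` and `collect_via_next` are hypothetical and not part of parallel-stream; the only assumption is that a bounded buffer of size `limit` is an acceptable analogy for the concurrency limit discussed above.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper: computes `f(input)` on a background thread and
/// hands results over through a buffer that holds at most `limit` items.
fn bounded_map<T, U, F>(inputs: Vec<T>, limit: usize, f: F) -> Receiver<U>
where
    T: Send + 'static,
    U: Send + 'static,
    F: Fn(T) -> U + Send + 'static,
{
    let (tx, rx) = sync_channel(limit);
    thread::spawn(move || {
        for input in inputs {
            // `send` blocks once `limit` results are buffered and unread,
            // so a slow reader slows the producer down (backpressure).
            if tx.send(f(input)).is_err() {
                break; // the reader went away, stop producing
            }
        }
    });
    rx
}

/// Hypothetical helper: `collect` written as "call next in a loop".
/// Here `recv` plays the role of `next`.
fn collect_via_next<U>(rx: Receiver<U>) -> Vec<U> {
    let mut out = Vec::new();
    while let Ok(item) = rx.recv() {
        out.push(item);
    }
    out
}
```

For example, `collect_via_next(bounded_map(vec![1, 2, 3], 2, |x| x * 10))` drains the buffer one item at a time, and the producer can never run more than `limit` finished results ahead of the reader.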
This is an interesting one it seems.
On the topic of ordering: some discussion occurred on Reddit, but I think that we should:

- `ordered`, which can enforce that the stream is ordered.
- `for_each` suffices. The main point of parallel streams is to increase throughput, and ordering is generally only useful when creating collections, so it should probably be scoped to that.
- `for_each` should not be ordered.
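The distinction between unordered processing and ordered collection can be sketched as follows. This is a thread-based illustration, not parallel-stream's implementation: `par_map_unordered` and `collect_ordered` are hypothetical names, and spawning one thread per item with no concurrency limit is a simplification made to keep the example short.

```rust
use std::sync::mpsc::channel;
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: runs `f` on every input in parallel and returns
/// `(input_index, output)` pairs in whatever order the workers finish.
fn par_map_unordered<T, U, F>(inputs: Vec<T>, f: F) -> Vec<(usize, U)>
where
    T: Send + 'static,
    U: Send + 'static,
    F: Fn(T) -> U + Send + Sync + 'static,
{
    let f = Arc::new(f);
    let (tx, rx) = channel();
    for (index, input) in inputs.into_iter().enumerate() {
        let tx = tx.clone();
        let f = Arc::clone(&f);
        thread::spawn(move || {
            // Tag each result with its input position so order can be
            // restored later if the caller needs it.
            let _ = tx.send((index, f(input)));
        });
    }
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    rx.into_iter().collect()
}

/// Hypothetical helper: restores input order, which only matters when
/// building a collection.
fn collect_ordered<U>(mut tagged: Vec<(usize, U)>) -> Vec<U> {
    tagged.sort_by_key(|pair| pair.0);
    tagged.into_iter().map(|pair| pair.1).collect()
}
```

A `for_each`-style consumer could use the output of `par_map_unordered` directly and skip the sort, which is the throughput argument made in the list above.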
Currently parallel-stream will start doing work the second it's initialized. This is much the same as `task::spawn`. The upside of this is that we got it to work, and it's what people want in the overwhelming majority of cases. The downsides are that we're spawning more tasks than needed, which interferes with the compiler's ability to inline futures, which in turn will impact performance. Also, there's no backpressure.
The solution to this seems to be to move "task spawning" to the edge methods (`next`, `for_each`, `collect`, `sum`), which on the one hand spawn tasks as fast as possible, while on the other hand allow outputting them one-by-one.

There's probably some nuance here; for example `next` is by nature sequential so task spawning might not even make sense. But overall it seems that if we can invert the logic slightly this could lead to some neat results.
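One way to picture the inversion is a value that only records what should be done, with spawning deferred to an edge method. The sketch below uses threads rather than async tasks, and `LazyParMap` is a hypothetical type invented for illustration; it is not how parallel-stream is implemented.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical lazy parallel map: constructing it spawns nothing.
struct LazyParMap<T, F> {
    inputs: Vec<T>,
    f: F,
}

impl<T, F> LazyParMap<T, F> {
    /// Only stores the inputs and the closure; no work starts here.
    fn new(inputs: Vec<T>, f: F) -> Self {
        Self { inputs, f }
    }

    /// Edge method: this is where the workers are actually spawned.
    fn collect<U>(self) -> Vec<U>
    where
        T: Send + 'static,
        U: Send + 'static,
        F: Fn(T) -> U + Send + Sync + 'static,
    {
        let f = Arc::new(self.f);
        // Spawn every worker first so they all run concurrently...
        let handles: Vec<_> = self
            .inputs
            .into_iter()
            .map(|input| {
                let f = Arc::clone(&f);
                thread::spawn(move || f(input))
            })
            .collect();
        // ...then read the results back one by one, in input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .collect()
    }
}
```

Because nothing runs until `collect` is called, the edge method is free to decide how many workers to start and how to hand results back, which is where backpressure could be applied.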