Open xkrt opened 9 years ago
How about AsyncSeq.mergeAll?
The following should work, but isn't ideal:
let parallelize (parallelism:int) (s:seq<Async<'a>>) : Async<seq<'a>> = async {
let buffer = new Collections.Concurrent.BlockingCollection<_>(parallelism)
let cts = new CancellationTokenSource()
try
do! Async.ParallelIgnore cts.Token parallelism (s |> Seq.map (Async.map buffer.Add))
finally
buffer.CompleteAdding()
return seq {
use buffer = buffer
use cts = cts
use _cancel = { new IDisposable with member __.Dispose() = cts.Cancel() }
yield! buffer.GetConsumingEnumerable(cts.Token) } }
This will run up to parallelism
computation in parallel, and will start emitting results as soon as they are available. One problem here is that the consumer of the AsyncSeq
has little control of the consumption of the source sequence. Ideally, we would have something that allowed the consumer of the resulting AsyncSeq
trigger the next batch to start, but still allow results to yield incrementally. This should possible to do by baking the Async.ParallelIgnore
loop directly into this function. Once parallelism
computations have completed, the outer AsyncSeq
stops consuming the source sequence until it is pulled again.
Implementation of Async.ParallelIgnore
.
static member ParallelIgnore (ct:CancellationToken) (parallelism:int) (xs:seq<Async<_>>) = async {
let sm = new SemaphoreSlim(parallelism)
let cde = new CountdownEvent(1)
let tcs = new TaskCompletionSource<unit>()
ct.Register(Action(fun () -> tcs.TrySetCanceled() |> ignore)) |> ignore
let inline tryComplete () =
if cde.Signal() then
tcs.SetResult(())
let inline ok _ =
sm.Release() |> ignore
tryComplete ()
let inline err (ex:exn) =
tcs.TrySetException ex |> ignore
sm.Release() |> ignore
let inline cnc (ex:OperationCanceledException) =
tcs.TrySetCanceled() |> ignore
sm.Release() |> ignore
try
use en = xs.GetEnumerator()
while not (tcs.Task.IsCompleted) && en.MoveNext() do
sm.Wait()
cde.AddCount(1)
Async.StartWithContinuations(en.Current, ok, err, cnc, ct)
tryComplete ()
do! tcs.Task |> Async.AwaitTask
finally
cde.Dispose()
sm.Dispose() }
Use of BlockingCollection
can be avoided with something like the following; it should allow waiting without blocking:
type private EagerSeq<'a> = TaskCompletionSource<EagerCons<'a>>
and private EagerCons<'a> =
| EagerCons of 'a * EagerSeq<'a>
| EagerNil
let rec private ofEager (e:EagerSeq<'a>) : AsyncSeq<'a> =
async.Delay <| fun () ->
e.Task
|> Async.AwaitTask
|> Async.map (function
| EagerNil -> Nil
| EagerCons(a,tl) -> Cons(a, ofEager tl))
Hey @xkrt, following up on this and seeing if your request can be addressed. There is a new function in master (not yet released): val mapAsyncParallel : f:('a -> Async<'b>) -> s:AsyncSeq<'a> -> AsyncSeq<'b>
. It is like the existing mapAsync
however it doesn't wait for a mapping operation to complete before starting the next. The results are placed into a FIFO queue, such that the results are returned in the same order as the inputs. Since order is maintained, head-of-line blocking can result - if the first item takes a long time to process, subsequent items won't be emitted even though they may already be processed. Would this work for your scenario?
Alternatively, we can get functions of this type:
val throttle : parallelism:int -> ('a -> Async<'b>) -> ('a -> Async<'b>)
val mapAsyncParallelUnordered:
f:('a -> Async<'b>) ->
AsyncSeq<'a> ->
AsyncSeq<'b>
This function would be similar to mapAsyncParallel
but will emit the first result which is available, regardless of order. Throttling can be applied using a different combinator. It could be built-in, but it seems to be a separate concern.
I needed a mapAsyncParallelUnordered
so I took the mapAsyncParallel
and modified it for use in an opensource tool. Just for thought I'd share since I came across this issue 4 years later hoping for said function.
Hi!
In my work I sometimes need a function that takes asyncs and perform them in parallel, but with simultaneously performed asyncs count lesser than count of overall asyncs passed to function, and return results of asyncs as AsyncSeq in order to consumer can process the results as soon as they appear.
Maybe my explanation confusing, I will try express it in the code:
Here is my attempt to do so:
There I forced to wait results of all asyncs in batch, but I want to recieve results as them appears.
In past I use MailboxProcessor's for this, one agent distrubutes tasks to a few worker agents.