fsprojects / FSharp.Control.AsyncSeq

Asynchronous sequences for F#
https://fsprojects.github.io/FSharp.Control.AsyncSeq/

AsyncSeq.append called enormous number of times #50

Closed by vasily-kirichenko 8 years ago

vasily-kirichenko commented 8 years ago

The following code takes about 4 seconds to run, which is extremely slow:

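open System
open System.Diagnostics
open System.IO
open FSharp.Control

// Counts how many lines were actually pulled from the file.
let linesRead = ref 0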

let lines (file: string) =
    asyncSeq {
        use reader = new StreamReader(file)
        while not reader.EndOfStream do
            let! line = reader.ReadLineAsync() |> Async.AwaitTask
            incr linesRead
            yield line
    }

let sw = Stopwatch.StartNew()
let res =
    lines @"e:\docs\big.txt"
    |> AsyncSeq.take 3000
    |> AsyncSeq.map (fun line -> line.Split([|' '|], StringSplitOptions.RemoveEmptyEntries).Length)
    |> AsyncSeq.sum
    |> Async.RunSynchronously
sw.Stop()
printfn "result = %d, %O, number of actually read lines: %d" res sw.Elapsed !linesRead
Console.ReadKey() |> ignore

The output is: result = 10273, 00:00:04.2336427, number of actually read lines: 3000

I profiled it and AsyncSeq.append was called about 13 million times (!):

[image: profiler screenshot]

eulerfx commented 8 years ago

FYI, I'm looking at this in a branch: https://github.com/eulerfx/FSharp.Control.AsyncSeq/tree/perf. Will keep you posted. I'm looking for opportunities to fuse appends and delays where possible.

vasily-kirichenko commented 8 years ago

I've tested on your branch, same results.

eulerfx commented 8 years ago

I should've been more clear - I haven't actually wired the optimizations to their proper place yet.

eulerfx commented 8 years ago

@vasily-kirichenko Try the branch now. I optimized append and bind from quadratic to linear via the same optimization as in seq.fs.
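
For readers wondering how a 3000-line read can produce millions of append calls: 3000² is 9 million, the same order of magnitude as the roughly 13 million calls in the profile, which is what a quadratic cost would look like. The sketch below is only a toy model of that effect, not the library's code and not the actual fix in the branch. `Stream`, `appendSteps`, `leftNested`, `rightNested` and `measure` are hypothetical names made up for the illustration. It assumes the slowdown comes from appends nesting on the left, so that every pulled element has to pass through every enclosing append node.

```fsharp
// Toy pull-based stream: a stand-in for illustration only, not AsyncSeq's real type.
type Stream<'T> = Stream of (unit -> ('T * Stream<'T>) option)

// Counts how many times an append node is stepped through while consuming.
let mutable appendSteps = 0

let empty () : Stream<'T> = Stream (fun () -> None)
let singleton (x: 'T) : Stream<'T> = Stream (fun () -> Some (x, empty ()))

// Naive append: every pull goes through this node before reaching the data.
let rec append (Stream first) (second: Stream<'T>) : Stream<'T> =
    Stream (fun () ->
        appendSteps <- appendSteps + 1
        match first () with
        | Some (x, rest) -> Some (x, append rest second)
        | None ->
            let (Stream next) = second
            next ())

// (((empty ++ 1) ++ 2) ++ 3) ...: each element is buried under many append nodes.
let leftNested n =
    List.fold (fun acc x -> append acc (singleton x)) (empty ()) [ 1 .. n ]

// 1 ++ (2 ++ (3 ++ ...)): each element sits under a constant number of nodes.
let rightNested n =
    List.foldBack (fun x acc -> append (singleton x) acc) [ 1 .. n ] (empty ())

// Tail-recursive consumer, playing the role of AsyncSeq.sum in the benchmark.
let rec sum acc (Stream pull) =
    match pull () with
    | Some (x, rest) -> sum (acc + x) rest
    | None -> acc

// Resets the counter, consumes the stream, and reports (sum, append steps).
let measure (build: unit -> Stream<int>) =
    appendSteps <- 0
    let total = sum 0 (build ())
    total, appendSteps

let n = 1000
let leftTotal, leftSteps = measure (fun () -> leftNested n)
let rightTotal, rightSteps = measure (fun () -> rightNested n)
printfn "left-nested:  sum = %d, append steps = %d" leftTotal leftSteps
printfn "right-nested: sum = %d, append steps = %d" rightTotal rightSteps
```

Both shapes yield the same elements, but in this model the step count for the left-nested shape grows with n², while the right-nested one grows with n. Re-associating or fusing appends so that the left-nested shape never builds up is one way to get from quadratic to linear; whether that matches the branch in detail is not something this sketch claims.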

vasily-kirichenko commented 8 years ago

Thanks, the bug is gone! I reran the benchmark and updated the blog post accordingly; see https://vaskir.blogspot.ru/2016/05/akkanet-streams-vs-hopac.html. In short, AsyncSeq is still slower than both Hopac and Akka.NET.

dsyme commented 8 years ago

@vasily-kirichenko @eulerfx Thanks for iterating on this. It's an interesting benchmark and I think your comments seem pretty fair @vasily-kirichenko, at least as things stand today - I like the AsyncSeq model for its "naturalness" but you really have to profile if your perf is dominated by concurrency overheads.

Hopefully benchmarks like this will drive further iterations of improvements in all the frameworks.