composewell / streaming-benchmarks

Benchmarks to compare Haskell streaming library performance
Other
102 stars 9 forks source link

Add composeDropOne benchmark. #12

Closed phischu closed 6 years ago

phischu commented 6 years ago

I added a benchmark that should demonstrate runtime quadratic in the number of consecutive drops in streamly. Since your Stream type is recursive and your drop function is recursive anyway, I do believe this is fixable, by repeatetly using uncons. To my surprise, vector has the same "problem" and I think it is on purpose to make drop non-recursive, to make it fuse.

harendra-kumar commented 6 years ago

That's a good find @phischu ! How did you come to think of it? The quadratic behavior is not shown by the continuation version of drop in streamly and neither by pure vector. There are a quite a few recursive functions including filter, but filter seems to be composing ok. Let me investigate a bit more on this. Perhaps we should have more compose benchmarks to see if there are any more problems lurking around.

harendra-kumar commented 6 years ago

@phischu bad news! take/takeWhile/drop/dropWhile/scan all show quadratic runtime when composed in both streamly and vector, they both have exactly the same type so I am not surprised that both of them have the same problem, the only difference is that streamly does not use the Skip constructor and relies on join point optimization. But surprisingly filter operation is not affected. I had found this same problem earlier in the vector append (cons) operation and could not find a way to fix it, it appeared to me that it is fundamentally unfixable for the vector type. I am not sure about these new discoveries yet but the fact that filter works ok gives some hope.

If we cannot fuse, another way could be to use rewrite rules to combine successive take/drop operations. In any case the fallback is the continuation version which composes linearly, but unfortunately it does not optimize well. How can we have the best of both worlds?

harendra-kumar commented 6 years ago

@phischu there was an issue with benchmarking, the composeDropOne function was not marked as INLINE. After fixing that I do not see a problem with take and scan. However, drop is still expensive (10 ms for 4 times composition) but it is not quadratic, it is linear in complexity. So the only problem seems to be that it does not fuse well. Let me see if that can be improved.

That's the problem with direct style, one missing INLINE and performance degradation shoots through the roof. I have seen that CPS is much less sensitive to inlining.