Gabriella439 / pipes

Compositional pipelines
BSD 3-Clause "New" or "Revised" License
487 stars 72 forks source link

Quadratic performance on left-associated binds #234

Open tomjaguarpaw opened 2 years ago

tomjaguarpaw commented 2 years ago

The problem

Consider walking over a tree using a streaming library to do something at every leaf:

data Tree = Branch Tree Tree | Leaf Int

walkTreeTransformer :: (Monad (m IO), MonadTrans m)
                    => (Int -> IO ()) -> Tree -> m IO ()
walkTreeTransformer doSomething = loop
  where loop = \case
          Leaf i -> lift (doSomething i)
          Branch t1 t2 -> do
            loop t1
            loop t2

On left-skewed trees, i.e. trees that will end up generating left-associated binds, for example those generated by leftSkewed

-- Fully evaluated left-skewed tree
leftSkewed :: Int -> Tree
leftSkewed 0 = Leaf 0
leftSkewed n = (Branch $! leftSkewed (n - 1)) (Leaf n)

the performance of your streaming library is quadratic. In fact performance is quadratic under

(Neither streamly nor conduit have quadratic performance)

The plot below demonstrates the claim. The code to generate the plot is available at https://github.com/tomjaguarpaw/streaming-benchmark. The compiler was GHC 8.10.7.

image

The solution

Firstly, before talking about a solution, do you actually consider this a bug? I do, and I think the risk of hitting unexpected quadratic performance makes this library unsuitable for production use as it is. On the other hand maybe you have a good reason that I shouldn't think that. If so I would be interested to hear it.

If you do think this is a bug then let's think about what can be done. I know of two straightforward options:

  1. Add an explicit constructor for binds and then reassociate binds during elimination
  2. Use Codensity (streamly essentially uses handwritten Codensity.)

In my benchmarks explicit bind comes out slightly ahead. (The examples are implemented to match streaming but the same techniques ought to apply to your library.)

Perhaps there is another option more suitable for your library.

I welcome your thoughts.

(Related to https://github.com/Gabriel439/Haskell-Pipes-Library/issues/131)

tomjaguarpaw commented 2 years ago

[N.B. conduit doesn't have quadratic performance. I made a mistake when setting up the benchmarks: https://github.com/tomjaguarpaw/streaming-benchmark/commit/2a965a790c23c759c70ac580b25c59fd22ae922c]

mitchellwrosen commented 2 years ago

For what it's worth, this is mentioned in the tutorial: https://hackage.haskell.org/package/pipes-4.3.16/docs/Pipes-Tutorial.html#g:10

Here's an old ticket about it: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/100

tomjaguarpaw commented 2 years ago

Thanks for pointing that out. My comments:

Gabriella439 commented 2 years ago

I don't have plans to fix this myself, but I'd accept a pull request if it didn't lead to a significant performance regression for other benchmarks

tomjaguarpaw commented 2 years ago

I don't have plans to fix this myself, but I'd accept a pull request if it didn't lead to a significant performance regression for other benchmarks

Which sort of change would you accept? Previously you said you would prefer not to CPS the internal representation. The alternative is to add a Bind constructor plus carefully written run (and perhaps carefully written other things).

Gabriella439 commented 2 years ago

Bind + carefully written run, but I'd have to see the full change to be sure, since I'm not sure what other changes are entailed

tomjaguarpaw commented 2 years ago

Thanks Gabriella. To clarify, I don't have plans to fix this either but I think it's helpful to have the requirements spelled out like this in case anyone else wants to take it on.