Quadratic slowdown with sequence, mapM, replicateM and co.

klao commented 10 years ago

Summary

The Proxy monad has bad asymptotic behavior with monadic constructs that do not "properly lean to the right". In particular: sequence, mapM and replicateM are quadratic in the length of the lists they produce.

Demonstration

Consider the following two programs:

import Control.Monad (replicateM)
import Pipes

numbers :: Producer Int IO r
numbers = go 0
  where
    go k = yield k >> go (k+1)

main :: IO ()
main = do
  l <- runEffect $ numbers >-> replicateM 20000 await
  print $ last l

And:

import Control.Monad (mapM)
import Pipes

main :: IO ()
main = do
  l <- runEffect $ mapM (lift . return) [1..5000 :: Int]
  print $ last l

Both run in about a second and demonstrate clear quadratic slowdown when you change the corresponding number.

Analysis

We have discovered the issue because we have something like the first use-case in our production code. We have a pipe that parses messages from a byte stream, and one particular message type contains a list of elements. The lists are of nontrivial but bounded length: a few thousand elements each. The code looks something like this:

parseMessage :: Pipe ByteString Message m r
parseMessage = do
  t <- parseMessageTag
  parseMessageBasedOnTag t

...

parseThatOneParticularMessage = do
  header <- parseHeader
  numElts <- parseInt
  elts <- replicateM numElts parseElement
  return $ ThatOneParticularMessage header elts
    where
      parseElement = do
        a <- parseInt
        ...
        return $ Element a b c

When we introduced this message type we noticed the issue immediately because the whole thing stopped for a few seconds to parse one message with less than 10k elements.

The issue was quite tricky to debug, because the profiling output looked unhelpful: the cpu time is spent in parseOneParticularMessage in the replicateM line, but not within the parseElement function. Which seems impossible if you think about replicateM as a loop. The problem is, replicateM is not a loop, it's a recursive monadic expression, and in particular: one that does not "lean to the right". The standard implementation is equivalent to this:

replicateM :: Monad m => Int -> m a -> m [a]
replicateM 0 _ = return []
replicateM n op = do
  x <- op
  xs <- replicateM (n-1) op
  return (x:xs)

Once we have realized that this is the issue, we could work-around it with an accumulator based replicateM:

replicateM' :: Monad m => Int -> m a -> m [a]
replicateM' n op = go n []
  where
    go 0 acc = return $ reverse acc
    go n acc = do
      x <- op
      go (n-1) (x:acc)

And the run time dropped to a millisecond from a few seconds.

Source of the problem

So, why is this happening? Actually, this is the same issue why the Free monad has a quadratic slow down in similar cases. (If you change the runEffect in the second example to retract from Control.Monad.Free you'll see the same effect.) If the binds do not associate to the right you have to traverse the structure over and over again. Very similarly to lists and appends: (...(([1] ++ [2]) ++ [3]) ++ ...) ++ [n] is O(n^2), while [1] ++ ([2] ++ ... ([n-1] ++ [n])) is O(n).

I'd like to emphasize that this issue is not specific to pipes. Anything that is structurally similarl to Proxy and wants to be a monad will be the same. So, for example, Conduit has the same issue.

Evaluation

So, how serious is this issue? I'd like to argue that this is a pretty serious flaw.

It's easy to unknowingly step into this trap.
It's not obvious where the problem is: if our lists were a bit smaller we wouldn't have noticed the issue. But even with a list of hundred elements you pay a 100x overhead. And you notice it if you have a problem with an overall performance of your code, but then:
It's very tricky to debug. I only realized what the problem is, because I've read before about the same thing with Free and spent some time understanding it.
It can be worked-around, but only on a case-by-case basis. For example, I don't think it would be possible to change the implementation of replicateM and its brothers in base to the one above, as it has different laziness properties. And it's not restricted to these standard functions: this is the natural thing to do when you do recursion in monadic code.
Possible solution

I know of only one way of dealing with this in general. It's the solution to the "efficient free" monad, which you can see here: http://hackage.haskell.org/package/free-4.1/docs/Control-Monad-Free-Church.html or here: http://hackage.haskell.org/package/kan-extensions-3.7/docs/Control-Monad-Codensity.html . You need to change the representation of your basic data structure to "continuation passing" one.

It sounds scarier than it is, with all the fancy category theory stuff around it. Well, it did break my brain for a day or so, but you get used to it. And in the end it's surprisingly little work to change the code from one to the other. I have a prototype that shows that it works for a simplified Proxy. See: https://gist.github.com/klao/7237705. I will do it for the actual Proxy type and then update this bug with it.

The approach probably has its drawbacks, most notably we'll need to thoroughly check how it affects performance. (My little prototype looks promising in that too.) But, I think that even a little slow-down is acceptable for an asymptotic speed up. ;)

Acknowledgments

The bug was found by @errge and we also debugged it together.

Gabriella439 commented 10 years ago

The general solution to this is to layer a Codensity monad transformer (which you can find here) beneath the Proxy monad transformer, like this:

myPipe :: Pipe a b (Codensity m) r
myPipe = do
    -- This runs in linear time
    as <- replicateM n myOtherPipes
    ...

Then you do all your replicateM-like computations with the Codensity transformer beneath them. This will properly convert the computation to linear time complexity. Then there are two ways you can remove the Codensity layer.

You can remove it after runEffect, using runCodensity
You can define a runCodensityP that removes the Codensity layer without running the pipe.

I can provide the code for runCodensityP if you want it. Either way, solution 1 works really well and requires no special integration with pipes. Both of these solutions give you the linear time complexity you want.

In fact, note that your replicateM' is precisely equivalent to:

replicateM' n m = runCodensity (replicateM n (lift m))

I also just want to briefly point out that your first example can be simplified to either:

-- Better
main = print . last =<< Pipes.Prelude.toListM numbers

-- Best
main = print =<< Pipes.Prelude.last numbers

However, in the general case (i.e. your parsing code) you want Codensity.

I've been long aware of this time complexity issue and I'm really well-versed in the analogy between various free monad encodings and their equivalent list analogs. There are several reason I prefer the current encoding:

This encoding is much more optimal for composition. If you think of pipe composition as a list merge, it requires a lot of operations at the "head" of the list, which the naive free monad is very good and efficient at.
This encoding it makes it very easy to speed things up with rewrite rules since rewrite rules fire much more reliably when you match against constructors rather than continuations. This is similar to the problem that CPS-encodings of lists have: you can't easily test a CPS-encoded list for equality without manifesting the list, destroying the benefit of the CPS transformation.
CPS-encoding does not work well when you switch between multiple categories. To make an analogy, imagine that you were trying to CPS-encode a matrix instead of a list. CPS-encoding one dimension of the matrix, comes at the expense of CPS-encoding the other dimension. The 5 pipe categories effectively function as a "5-dimensional" matrix, and CPS-encoding each dimension penalizes you when you mix motions in multiple dimensions since each switch in direction requires materializing the previous dimension you were moving along.

So I prefer not to CPS the internal implementation of pipes and instead I would rather outsource the time complexity improvement to Codensity.

DanBurton commented 10 years ago

@Gabriel439 I believe you on point 2, but it sounds very anecdotal. I wonder if we could get a GHC hacker to explain or address this in more detail?

Point 3 is very interesting, but I don't quite understand how CPS-encoding pipes slows down its categories. The analogy makes sense, but I'm having a hard time wrapping my brain around what this actually looks like for Proxy.

data Proxy a' a b' b m r
    = Request a' (a  -> Proxy a' a b' b m r )
    | Respond b  (b' -> Proxy a' a b' b m r )
    | M          (m    (Proxy a' a b' b m r))
    | Pure    r

newtype ProxyC a' a b' b m r = ProxyC {
  runProxyC :: forall x.
       (a' -> (a -> x) -> x) -- Request
    -> (b -> (b' -> x) -> x) -- Respond
    -> (m x -> x) -- M
    -> (r -> x) -- Pure
    -> r
}

Like I said, the slowdown/inefficiency of the latter representation isn't quite registering in my brain.

n.b. I've toyed with something like this in yield:

newtype Producing o i m a
  = Producing { using :: forall r. (o -> ContT r m i) -> ContT r m a }

https://github.com/DanBurton/yield/blob/master/Control/Yield/Cont.hs

But I haven't played around with it enough to have really harvested any notable insights.

cartazio commented 10 years ago

Its worth pointing out that the machines lib by Kmett is designed around the codensity style approach https://github.com/ekmett/machines/ http://hackage.haskell.org/package/machines, and that Anthony Cowley has been working on making it nicer still recently

Gabriella439 commented 10 years ago

Responding to Dan:

So I was actually just going back through the pipes code and studying the rewrite RULES I wrote and I realized that all of them would be unnecessary if I were to switch to a CPS representation, so point 2 of mine is actually not valid.

Responding to both Dan and Carter:

The easiest way to grasp point 3 is to just implement pull-based composition for CPS-encoded pipes.

If you study machines you'll see that not even Edward does this and he materializes his Plan type to a a Process (which is a non-CPS free monad) before he lets you compose it. I think that then leads to another problem, which is that you can't convert it back to a Plan (I'm not totally sure if that's true, since it has been a while since I studied machines in detail). However, if that is true, then machines would not solve the problem of doing something like the following pipes code:

do r <- replicateM (p1 >-> p2)
   p3

.. since machines would need to convert p1 and p2 to a non-CPS form before composing them, and then that triggers the quadratic slowdown from replicateM.

This is why I prefer the solution of using Codensity in the base monad. It works no matter how many times you switch back and forth between pipe composition and sequencing, yet always gives linear time for replicateM.

The reason I don't put Codensity in the base monad by default is that you take a hit to performance for pipe composition. Even if you bake it in directly to the pipes implementation the composition operator becomes 50% slower, so I prefer to leave Codensity as opt-in.

The other nice thing about Codensity is that it is really to turn it on selectively. Going back to the parsing example that Mihaly had problems with, the fix to his code is to just selectively turn on Codensity only around the replicateM:

as <- runCodensity $ replicateM numElts $ lift parseElement

This means you don't pay any performance penalty for other segments of the pipeline that don't need Codensity.

klao commented 10 years ago

A few quick comments. (I need more time to think it all through thoroughly.)

About Codensity, somewhat a nitpick. Putting Codensity underneath Proxy doesn't do anything. My first bad example only uses Proxy's pure parts, when you runEffect it, it translates into a single return; the base monad of Proxy does not come into play at all. Putting Codensity above Proxy works (this is why it's a nitpick, you last comment with runCodensity $ replicateM numElts $ lift parseElement shows that this is what you meant). But it requires quite a high level of understanding from the user. They need to be aware of the issue and know exactly where do they need this additional "efficiency spicing".
About switching between different categories. I also don't quite understand what you are trying to say with that. Church-encoding is isomorphic to the original representation. So, when you chose that, you already made the design decisions about what will be efficient and what won't. (With your matrix analogy: you already chose the row or column-major encoding.) Yes, you will need to manifest the Proxy, it's impossible to implement the more complicated combinators without that (and a natural way in most cases). But, you can switch back and forth between the manifested and the CPS representation, and the manifested representation will be exactly the current Proxy data. (Yes, I'm also not sure if and how this can be done with the Codensity based Plan in machines, but this is definitely possible with Church-encoding based approach. The one that Dan defined above.) See the toPipe and fromPipe (that's the isomorphism) in my little prototype and the implementation of the >!-> combinator, which is maximally naive, but still seems to be performing quite well.
And finally a more general comment. This issue is sneaky, it bites unexpectedly, and the less you know and understand about it the harder it bites. So, saying that there is an instrument with which one can overcome it, if one understands the issue and knows how to wield the instrument... This is avoiding the issue somewhat, I think.

Also, we are in a desperate need of a benchmark suite for pipes. I'm willing to do the basic work of translating Proxy to Church-encoding, but then we would need to compare the performance and in as many details as possible. And I don't know how to do this right now.

Gabriella439 commented 10 years ago

Oops, you are right. Codensity has to be on top. Thanks for the correction.

Also, pipes does have a benchmark suite (located here and integrated with cabal bench already).

However, if you are going to benchmark your church encoding, then your test pipes need to be defined using await, yield, and the monad instance for the church-encoded Proxy. I will not accept code like this in the standard library:

numbersC :: PipeC i Int m r
numbersC = PipeC (\_ yi _ _ -> let go k = yi k (go (k+1)) in go 0)

takeC :: Int -> PipeC e e m ()
takeC n = PipeC res
  where
    res ret yi aw _ = go n
      where
        go 0 = ret ()
        go k = aw (\e -> yi e (go (k-1)))

A guiding principles behind pipes is that the implementation of everything (including the standard library) has to be readable and intuitive. One of the reasons I began this library in the first place was as a reaction to the iteratee style of programming, where iteratee definitions were totally unreadable to me and looked basically like your takeC code. Pipes are supposed to be as easy to write as they are to connect.

However, I'm still unlikely to merge in a CPS implementation. I want to emphasize that keeping the core implementation as simple and readable as possible is a really high priority for me since pipes is designed to be a model library that other streaming libraries (even in other languages) can learn from. For example, Dan's yield library was inspired by pipes and a lot of conduit was taken from pipes as well. This is a good thing, because I want other people to steal the innovations from pipes.

Also, I want to conclude with an empirical argument: as you noted, conduit does not do this optimization either, but is currently the mostly highly used streaming library, so I don't think this flaw is as serious as you make it out to be.

klao commented 10 years ago

However, if you are going to benchmark your church encoding, then your test pipes need to be defined using await, yield, and the monad instance for the church-encoded Proxy. I will not accept code like this in the standard library:

Yes, of course, obviously! These were not intended as a demonstration of how things should be implemented. I was just trying to wrap my head around the continuation passing style and these were my test cases that I tried (and managed to :)) implement staying completely within it.

My intention is that for things that do not touch the internals the implementation should stay identical.

errge commented 10 years ago

I just want to weigh in for a bit on the seriousness of the issue.

I found this in the code I was writing and was totally surprised, definitely weren't looking for it just to write a bug report. Maybe I'm not a Haskell expert yet, but also far from a newbie. Still, had to stop for a whole workday to figure this out.

In my opinion having these kind of secret traps is totally unacceptable. This was the whole promise of the pipes library compared to other hard to understand and hard to use abstract non-sense iteratee libraries.

I only got suspicious, because I had 10K of elements in one function, if I were having functions with 100 elements each, I would've just concluded that "well, Haskell or pipes, or something is a bit slow here, but still reasonable".

Also, Mihaly and me knew about free monad and codensity, still it was not trivial to use them right.

Even you couldn't use them right for the first time, you've provided three solutions:

codensity around IO and runCodensity in front of runEffect -> fail,
write runCodensityP -> fail,
replicateM' n m = runCodensity (replicateM n (lift m)) -> not even correctly typed for a replicateM replacement.

You can consider these nitpicks from my part, but I think they show that it's not easy and trivial to find and fix these kind of issues. At the very least you should clearly document this in the reference and the tutorial. As it is documented in Control.Monad.Free, not just in Church. But I think you should really consider fixing this somehow for real, if not with Mihaly's solution, then some other way.

Thanks.

michaelt commented 10 years ago

If I understand, Gabriel was using runCodensity in the sense of lowerCodensity in kan-extensions, i.e. as \p -> runCodensity p return. This is how runCodensityP and runCodensityK are implemented in pipes-3 (the newtype constructor was unCodensityP) So this runs fine for example.

  import Control.Monad (replicateM)
  import Pipes
  import Control.Monad.Codensity

  numbers = go 0
    where
      go k = yield k >> lift (print k) >> go (k+1)

  main :: IO ()
  main = do
    l <- runEffect $ numbers >-> lowerCodensity (replicateM 20000 (lift await))
    print $ last l

I think some of Gabriel's remarks presuppose above presuppose that the reader is keeping in mind http://hackage.haskell.org/package/pipes-3.3.0/docs/Control-Proxy-Trans-Codensity.html The pipes package has already had a cps-ed, or rather codensity-ized, implementation.

Gabriella439 commented 10 years ago

Like Michael said, I meant lowerCodensity instead of runCodensity.

I still favor the lift + lowerCodensity solution because it solves all the problems that Gergeley and Mihaly have presented. As far as I can tell, their only issue with this solution is that it is not obvious, so I will include an appendix in the tutorial to explain this problem and the work-around.

I do not agree with the argument that just because an uninformed user can potentially write code with poor time complexity that it is a serious bug. If I did, then I would also conclude that we should abandon all linked lists for difference lists because uninformed Haskell beginners can (and do) write linked list code that gives quadratic time complexity. I've also pointed out that conduit has the exact same problem and seems to do just fine regardless.

Also, I do not appreciate appeals to emotion (i.e. "fail"). I'm not debating for the sake of being stubborn. I've done a lot of thinking about this, and have already implemented this idea at least twice before (one of which Michael has already mentioned, back in the 3.* cycle), so this is not something I'm dismissing carelessly without having tried it. I weighed the benefits of readability, simplicity, efficiency (in other use cases), versus likelihood of tripping over this bug and decided that readability and composition efficiency were more important and that I would be willing to take the time to educate users on how to work around this problem when they came across it.

If I didn't care at all about your concerns I would have already closed this issue. The reason I keep it open is that I understand that your viewpoint is valid even if I do not agree with it, and I hope that we can find some solution satisfactory to us all if we keep the discussion going.

errge commented 10 years ago

My point was that this stuff is hard and I still stand by that point.

The "fail"s were maybe a bit harsh, sorry about that, but I only wanted to point out the fact that you didn't manage to score one correct solution out of three, IMHO this shows how hard is the problem and has nothing to do with you failing in any sense. Two out of the three solutions were type correct, but still wrong. What can we assume from a less experienced user in this environment?

I'm also very, very sceptical in the "education" part, because as a beginner Haskeller if you learn about the list concatenation issue, you can take it away, that "++ is somehow evil and if I ever happen to append a lot of stuff, I should maybe come back to understand this". What will be the takeaway here, what is the telltale sign of the issue here?

Also, if education fails just a little bit and people don't really understand the issue and can't be confidently separate the bad cases from the good cases, then they will start applying the workaround preemptively resulting in a bad coding style. Just like the completely unnecessary double quoting in some commercial shell scripts to defend against empty strings: if [ "x$1" = "x" ] ; then ....

So I'm already grateful if you document it, but it would be better to solve it. And yes, I appreciate that you're still discussing this issue. Thank you.

Gabriella439 commented 10 years ago

The tell-tale sign of the issue is the use of "List done wrong" (i.e. a monad action returning a list). This is what sequence, replicateM, and mapM all have in common. That's why I think it is possible to clearly communicate when to preemptively use it.

klao commented 10 years ago

This will be a bit of a philosophical rather than a technical comment. But, the discussion already has shifted into this direction, so I think it's OK.

I understand and actually very much agree with the desire of keeping the core (and the whole pipes library) as clean and simple as possible. I think this is one the biggest strengths of pipes. It was definitely a major factor in our decision of choosing pipes as a basis for our infrastructure. After you understand the very simple principles behind the Proxy types it was easy to reason about expressions involving pipes and understand what will happen. (Btw. I think that the pull based category was better for this than the new default -- cat. But this is largely a matter of taste, and the cat based category has its own obvious advantages.)

But, I would argue that the conceptual cleanness and simplicity is the important part here. The simplicity of the actual core code is important too, but is only second to that. And this change should not affect the conceptual cleanness at all. It's not like adding ad-hoc constructors to Proxy or sacrificing category laws for some seeming benefit; it's actually a beautiful mathematical transformation that we need to do here, which doesn't require us sacrificing any important properties.

I spent a little time before posting this issue thinking about how much uglier and harder to understand would this change make the pipes core. And to some degree it obviously will. But, I see it as our task to manage this: to think how to structure it best, how to make it more natural to people with carefully commented code. I see it as a challenge: how to make this important change (if it's important at all) as unobtrusive as possible. But, to reiterate an earlier point: when I started out with pipes, the important part was to understand how to read a Proxy p => b' -> p a' a b' b m r, what do the different parameters mean, how do these compose etc. I didn't look at the implementation at all. Conceptual simplicity is much more important.

I am not arguing here about how important is this change. But if is at least a little bit important, then I think the right thing for the pipes library to do would be to try and figure out what is best, cleanest, most maintainable, easiest to understand, etc. way to do it. After all, you want to serve as the best example for others, right? So, we shouldn't be complacent just because this is a hard issue and others didn't fix it either.

PS: I am also going to write a short technical comment a little bit later. But in general, may I ask that we try it out and see how it looks? (Well, I try and you see how it looks. :)) Before going into further unfruitful discussions. As it seems we understand each others' concerns, but don't agree with them.

Gabriella439 commented 10 years ago

Alright, then how about this. Write up a draft implementation using CPS style just for the Kleisli, request, and respond categories. Those should be very easy to write without having to materialize anything. Then I will work with you to try and make the implementation of (+>>) and (>>~) cleaner and try to stay within CPS. If I'm satisfied that there is a nice way to implement those two operators then I will make the switch.

Gabriella439 commented 10 years ago

I wrote up the core CPS parts for the Kleisli, request, and respond categories. I will try my hand at the other two categories soon.

Here's the basic code:

http://lpaste.net/95030

klao commented 10 years ago

It looks nice, actually! ;)

Mind you, I don't think it's possible to implement +>> without manifesting at least one of the sides. At least I didn't find any (efficient) way of zipping two Church-encoded lists, which is the analogous problem.

michaelt commented 10 years ago

Maybe it will work well, but it bothers me a little that this sudden CPS'ing of pipes is a response to the accumulation of lists. The examples we began with don't give rise to trouble if we adopt familiar bread-and-butter Haskell measures like using Data.Sequence. Why shouldn't the second field of ThatOneParticularMessage be a Seq rather than a []?

  import Pipes
  import qualified Data.Sequence as S
  import Data.Sequence (ViewR(..))

  numbers = go 0
    where
      go k = yield k >> go (k+1)

  main :: IO ()
  main = do
    l <- runEffect $ numbers >-> S.replicateM 20000 await
    let _ :> a = S.viewr l 
    print a

  import Pipes
  import qualified Data.Traversable as T
  import Data.Sequence 

  main :: IO ()
  main = do
    l <- runEffect $ T.mapM (lift . return) $ iterateN 5000 (+1) 1
    let _ :> a = viewr l 
    print a

errge commented 10 years ago

michaelt: yes, you change the asymptotic to log(n)*n with that trick, you're right. That's log(n) times more than needed. If you change the number of items and measure carefully, you can see the log creeping in.

But more importantly, in practice you make a 10x CPU waste and 1.2x memory waste compared to codensity + list replicateM, the 10 fold CPU difference seems to be really bad. To get this results, I used 5M elements and compared your first version to the codensity vesion:

import Control.Monad
import Pipes
import Control.Monad.Codensity

numbers = go 0
  where
    go k = yield k >> go (k+1)

main :: IO ()
main = do
  l <- runEffect $ numbers >-> lowerCodensity (replicateM 5000000 (lift await))
  print $ last l

So the asymptotics is almost right, but the constants seem to be very big.

klao commented 10 years ago

Michael: Gergely's analysis is spot on. This issue is not at all about lists being "inefficient". It's about recursive monadic functions. Sometimes you can make them "tail" recursive, but in many cases you can't.

For example, when we first encountered this issue we also tried whether a "more efficient" data structure makes a difference. We reached for vector, which is the epitome of efficiency in Haskell, right? Well, with vector's replicateM you get exactly the same quadratic behavior as with the simple list. (For a while I thought this was a bug in vector -- it's strict anyways, so it could do it in a tail recursive way. But, then I realized that vector also cares about different criteria, most importantly if you want to be able to stream fuse away a vector created by replicateM you must do it in this way.)

So, if you write a recursive monadic function and for any reason you can't make it tail recursive (eg. laziness, stream fusion, or just it not being naturally tail-recursive) it will perform badly when you use it directly over Proxy.

michaelt commented 10 years ago

I could do better but soon enough my irrational fear of cps'ing will begin to show....

import Control.Monad (replicateM, liftM)
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M

numbers :: Producer Int IO r
numbers = go 0 where go k = yield k >> go (k+1)

main :: IO ()
main = do
  v <- runEffect $ numbers >-> tawdry 0 2000000
  print $ V.last v

tawdry :: V.Unbox a => a -> Int -> Consumer a IO (V.Vector a)
tawdry initial n = do mv <- lift (M.replicate n initial) 
                      let go 0 = return ()
                          go m = do a <- await
                                    lift (M.unsafeWrite mv (n - m) a)
                                    go (m-1)    
                      go n
                      lift $ V.freeze mv

I did have trouble at my first attempt using pure vectors, but wasn't too surprised I had gummed up the optimization machinery. There might be a way though.

Gabriella439 commented 10 years ago

Again, though, lowerCodensity + lift works with the vector replicateM. It's a totally general solution for this kind of thing.

klao commented 10 years ago

I've played around with a rough sketch of how pipes would look like in Church encoding. You can take a look here: https://github.com/klao/Haskell-Pipes-Library/tree/church

But, my preliminary results are not very encouraging:

The thing is quite slow: 1.5-3 times slower than the current pipes. It might be possible to optimize it, I didn't look into that yet.
I seems to me that there is no way to write an efficient next over the Church encoded Proxy type. If we want to expose next to the users (and we probably do, as it's quite useful), we can only do it over the manifested (old) Proxy. Which means that we have to expose that and the converting functions to the user too. And explain how to use them. Which is not impossible, but, well, it does make things quite a bit messier.

I will continue thinking about this issue. But with these additional concerns (together with your previous ones ;)) you might want to just close this issue.

cartazio commented 10 years ago

does 1.5-3x slower in the quadratic slow down case too?

klao commented 10 years ago

No, obviously not. :)

The 1.5-3x slowdown is for the current benchmarks in pipes. If you mix in some IO in your pipes then the slow down is only in the overhead, which is almost negligible. And of course, it's "blazingly fast", ie. linear in the problematic cases of this issue--where the current pipes are quadratic.

Gabriella439 commented 10 years ago

Thanks for benchmarking that.

I'll still leave this open for about a week just in case somebody else has alternative ideas.

cartazio commented 10 years ago

If i find some free time this week, i'll dig into what could be done on the optimization front. (i probably don't have teh bandwidth, but i'll try to)

Gabriella439 commented 10 years ago

I also wanted to mention that the "Handlers in Action" paper from ICFP'13 contains a comparison of pipes to their CPS representation of pipes on the last page. You can see how even without rewrite rules the naive free monad implementation has better constant factors. Once you throw in rewrite rules it starts to outperform their implementation by a factor of 3 on average. You can find the paper here:

http://homepages.inf.ed.ac.uk/slindley/papers/handlers.pdf

cartazio commented 10 years ago

huh, so even in the "bad asymptotics" case, its faster? or was this for very very long compositions?

Gabriella439 commented 10 years ago

No, this is not testing the case of bad asymptotics. The first benchmark is testing the speed of the monad bind and the second is testing the speed of await/yield handoffs (basically).

errge commented 10 years ago

Please don't forget to give some instructions regarding this issue somewhere in the documentation. Maybe it's not tutorial material, but somewhere in the reference! Thanks. And thanks for the fruitful discussion around this issue in the last weeks.

Gabriella439 commented 10 years ago

You're right. I completely forgot about that. Then I will keep this open until I add the instructions to the documentation.

Gabriella439 commented 10 years ago

Alright, I've added the documentation as an appendix to the tutorial section, so I will mark this issue closed. If you still have any concerns, though, feel free to reopen the issue.

Gabriella439 / pipes