Gabriella439 / pipes

Compositional pipelines
BSD 3-Clause "New" or "Revised" License
486 stars 72 forks source link

Category axioms do not hold #5

Closed pcapriotti closed 12 years ago

pcapriotti commented 12 years ago

The category axioms do not hold for either the Lazy or Strict category instances, even if we identify observationally equal pipes. In particular, composition is not associative:

runPipe $ lift (print 1) >+> yield () >+> lift (print 2)

prints 2, while

runPipe $ lift (print 1) >+> (yield () >+> lift (print 2))

prints 2 and 1.

Also, id is not a right identity for terminated pipes. For example, given

feed = yield () >> lift (print 0)
p1 = feed >+> return ()
p2 = feed >+> (idP >+> return ())

the pipe p2 prints 0 while p1 doesn't print anything.

I initially thought it was just a matter of reordering some clauses in the definition of composition, but now I'm not so sure this could be fixed so easily. In fact, x >+> Yield should always evaluate to Yield (as it does currently), otherwise something of the form Yield >+> Await >+> Yield doesn't associate. But then it seems that my first counterexample is unavoidable.

The issue with identity is less of a concern, since it's always possible to introduce a formal identity (for example, by adding a Transform (a -> b) alternative to the Pipe declaration).

Gabriella439 commented 12 years ago

Thanks for pointing this out. I will check these when I get to my work computer, but if I remember the ordering of my case statement correctly, all of those should work. In your first counterexample, the second pipe should print two since the most downstream pipe never awaits before it pures and ends the whole pipeline. And the second example should never print anything because downstream pure trumps everything, including the idP's await. When I get to work I will test it myself and give you the full breakdown of how I proved the order of the case statements. On Jan 20, 2012 7:53 AM, "Paolo Capriotti" < reply@reply.github.com> wrote:

The category axioms do not hold for either the Lazy or Strict category instances, even if we identify observationally equal pipes. In particular, composition is not associative:

runPipe $ lift (print 1) >+> yield () >+> lift (print 2)

prints 2, while

runPipe $ lift (print 1) >+> (yield () >+> lift (print 2))

prints 2 and 1.

Also, id is not a right identity for terminated pipes. For example, given

feed = yield () >> lift (print 0) p1 = feed >+> return () p2 = feed >+> (idP >+> return ())

the pipe p2 prints 0 while p1 doesn't print anything.

I initially thought it was just a matter of reordering some clauses in the definition of composition, but now I'm not so sure this could be fixed so easily. In fact, x >+> Yield should always evaluate to Yield (as it does currently), otherwise something of the form Yield >+> Await >+> Yield doesn't associate. But then it seems that my first counterexample is unavoidable.

The issue with identity is less of a concern, since it's always possible to introduce a formal identity (for example, by adding a Transform (a -> b) alternative to the Pipe declaration).


Reply to this email directly or view it on GitHub: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/5

Gabriella439 commented 12 years ago

And you cannot add a formal identity. It won't type-check because it constrains the type of composition. On Jan 20, 2012 7:53 AM, "Paolo Capriotti" < reply@reply.github.com> wrote:

The category axioms do not hold for either the Lazy or Strict category instances, even if we identify observationally equal pipes. In particular, composition is not associative:

runPipe $ lift (print 1) >+> yield () >+> lift (print 2)

prints 2, while

runPipe $ lift (print 1) >+> (yield () >+> lift (print 2))

prints 2 and 1.

Also, id is not a right identity for terminated pipes. For example, given

feed = yield () >> lift (print 0) p1 = feed >+> return () p2 = feed >+> (idP >+> return ())

the pipe p2 prints 0 while p1 doesn't print anything.

I initially thought it was just a matter of reordering some clauses in the definition of composition, but now I'm not so sure this could be fixed so easily. In fact, x >+> Yield should always evaluate to Yield (as it does currently), otherwise something of the form Yield >+> Await >+> Yield doesn't associate. But then it seems that my first counterexample is unavoidable.

The issue with identity is less of a concern, since it's always possible to introduce a formal identity (for example, by adding a Transform (a -> b) alternative to the Pipe declaration).


Reply to this email directly or view it on GitHub: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/5

Gabriella439 commented 12 years ago

Oh, I know what's wrong. Right before I released the code, I moved the downstream pure statement near the bottom to make the code prettier and forgot that this would mess it up. Move it to right after the downstream yield,and m statements and before the await yield coupling and everything will work. I will explain the ordering when I'm at work. On Jan 20, 2012 7:53 AM, "Paolo Capriotti" < reply@reply.github.com> wrote:

The category axioms do not hold for either the Lazy or Strict category instances, even if we identify observationally equal pipes. In particular, composition is not associative:

runPipe $ lift (print 1) >+> yield () >+> lift (print 2)

prints 2, while

runPipe $ lift (print 1) >+> (yield () >+> lift (print 2))

prints 2 and 1.

Also, id is not a right identity for terminated pipes. For example, given

feed = yield () >> lift (print 0) p1 = feed >+> return () p2 = feed >+> (idP >+> return ())

the pipe p2 prints 0 while p1 doesn't print anything.

I initially thought it was just a matter of reordering some clauses in the definition of composition, but now I'm not so sure this could be fixed so easily. In fact, x >+> Yield should always evaluate to Yield (as it does currently), otherwise something of the form Yield >+> Await >+> Yield doesn't associate. But then it seems that my first counterexample is unavoidable.

The issue with identity is less of a concern, since it's always possible to introduce a formal identity (for example, by adding a Transform (a -> b) alternative to the Pipe declaration).


Reply to this email directly or view it on GitHub: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/5

Gabriella439 commented 12 years ago

Here is the incorrect case statement:

     (Yield (x1, p1), p2            ) -> yield x1 >>         p1 <+< p2
     (M m1          , p2            ) -> lift m1 >>= \p1 -> p1 <+< p2
     (p1            , Await f2      ) -> await >>= \x  -> p1 <+< f2 x
     (p1            , M m2          ) -> lift m2 >>= \p2 -> p1 <+< p2
     (Await f1      , Yield (x2, p2)) -> f1 x2 <+< p2
     (Pure r1       , _             ) -> return r1 <-- in the wrong 

place (_ , Pure r2 ) -> return r2

And here is the corrected one:

     (Yield (x1, p1), p2            ) -> yield x1 >>         p1 <+< p2
     (M m1          , p2            ) -> lift m1 >>= \p1 -> p1 <+< p2
     (Pure r1       , _             ) -> return r1 <-- belongs here 

to short circuit upstream pipes (p1 , Await f2 ) -> await >>= \x -> p1 <+< f2 x (p1 , M m2 ) -> lift m2 >>= \p2 -> p1 <+< p2 (Await f1 , Yield (x2, p2)) -> f1 x2 <+< p2 (_ , Pure r2 ) -> return r2

Here's the reasoning behind the case statement order.

First off, several case statements can only resolve one way and are obvious. I'll use an arrow in the middle to indicate which one gets priority. The one on the left is the downstream pipe:

await -> m await -> pure await -> await yield <- yield m <- yield await = yield -- fusion

The identity laws require several more priorities. From the identity laws we can deduce that yield must trump everything because:

(yield x >> p) = id <+< (yield x >> p) = (yield x >> id) <+< p

For id to be transparent in the above example, the yield statement must take priority over any upstream pipe.

Similarly, we can deduce that upstream await must be of lower priority than anything:

p = p <+< id = p <+< (await >>= yield >> id)

For id to be transparent in the above example, every pipe must trump upstream await.

That leaves four remaining cases: pure . yield, pure . pure, pure . m, m . pure, and m . m. All of these we can resolve using associativity with a yield as the middle statement:

(pure a . yield b) . pure c = pure a . (yield b . pure c) pure a . pure c = pure a . yield b pure a . pure c = pure a

This gives us downstream priority for pure.

(pure a . yield b) . mc = pure a . (yield b . mc) pure a . mc = pure a . yield b pure a . mc = pure a

Again, downstream pure trumps monad.

(ma . yield b) . pure c = ma . (yield b . pure c) ma . pure c = ma . yield b ma . pure c = ma

That proves downstream priority for monad over pure.

(ma . yield b) . mc = ma . (yield b . mc) ma . mc = ma . yield b ma . mc = ma

Downstream priority for monad over itself.

(pure a . yield b) . pure c = pure a . (yield b . pure c) pure a . pure c = pure a . yield b pure a = pure a . yield b

Downstream priority for pure over yield. This last one is important because it indicates we should not even discard yield's value. Pure short-circuits even that.

This gives us the following priority diagram:

PYAM +---- P|LLLL Y|LLLL A|R=RR M|LLLL

Where the downstream pipe is the left side of the diagram (Pure, Yield, Await, Monad), and the upstream Pipe is the top of the diagram. L and R indicate whether the case statement prioritizes Left (i.e. downstream) or Right (upstream). The (corrected) Lazy case statement reflects the above diagram.

After that it's just a matter of verifying that the above priorities work for all 8 identity cases and all 64 associativity cases. I actually went to all that trouble to prove that all of these do in fact work, including for the case you cited, namely:

(yield a . await) . yield c = yield a . (await . yield c) yield a . yield c = yield a . pure c yield a = yield a

Strict evaluation works because it completely ignores the top line of the above diagram since the left hand statement can never pure (since it is stuck in a forever discard loop).

I hope that helps. I will submit an emergency patch to fix the broken case statement later today.

On 01/20/2012 07:53 AM, Paolo Capriotti wrote:

The category axioms do not hold for either the Lazy or Strict category instances, even if we identify observationally equal pipes. In particular, composition is not associative:

 runPipe $ lift (print 1)>+>  yield ()>+>  lift (print 2)

prints 2, while

 runPipe $ lift (print 1)>+>  (yield ()>+>  lift (print 2))

prints 2 and 1.

Also, id is not a right identity for terminated pipes. For example, given

 feed = yield ()>>  lift (print 0)
 p1 = feed>+>  return ()
 p2 = feed>+>  (idP>+>  return ())

the pipe p2 prints 0 while p1 doesn't print anything.

I initially thought it was just a matter of reordering some clauses in the definition of composition, but now I'm not so sure this could be fixed so easily. In fact, x>+> Yield should always evaluate to Yield (as it does currently), otherwise something of the form Yield>+> Await>+> Yield doesn't associate. But then it seems that my first counterexample is unavoidable.

The issue with identity is less of a concern, since it's always possible to introduce a formal identity (for example, by adding a Transform (a -> b) alternative to the Pipe declaration).


Reply to this email directly or view it on GitHub: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/5

pcapriotti commented 12 years ago

You're right, that works! Good job checking all the cases. I actually wrote a proof in Agda to make sure I wasn't missing anything, so I'm pretty convinced that it's ok now :)

Gabriella439 commented 12 years ago

Could you send the proof? I've never used Agda before but I've heard about it and I might be able to use it to check my Arrow instance more quickly.

On 01/20/2012 10:54 AM, Paolo Capriotti wrote:

You're right, that works! Good job checking all the cases. I actually wrote a proof in Agda to make sure I wasn't missing anything, so I'm pretty convinced that it's ok now :)


Reply to this email directly or view it on GitHub: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/5#issuecomment-3588548

pcapriotti commented 12 years ago

Here it is: http://hpaste.org/56747 However

Gabriella439 commented 12 years ago

One thing that makes the proof a lot easier if you write the cases as:

     (Yield y1, p2            ) -> Yield $ fmap (<+< p2) y1
     (M m1    , p2            ) -> M     $ fmap (<+< p2) m1-- 

replace liftM with fmap (Pure r1 , p2 ) -> Pure $ fmap (<+< p2) r1 -- treat Pure as the () functor (Await f1, Yield (x2, p2)) -> f1 x2 <+< p2 (p1 , Await a2 ) -> Await $ fmap (p1 <+<) a2 (p1 , M m2 ) -> M $ fmap (p1 <+<) m2 (p1 , Pure r2 ) -> Pure $ fmap (p1 <+<) r2

I consider that version to be the most elegant way to think about how composition works.

If you don't know what I mean by the () functor, it's just:

fmap f () = ()

And it satisfies the functor laws.

Then you can see that if you imagine each Pipe as a deck of cards where each functor layer is a card, composition just merges those two decks giving priority to the left deck. The above version of composition makes it trivial to prove associativity (you always take from the left deck first). The only special case is await/yield fusion which just says that only way you yield control to the upstream function is by calling await and the only way it returns control back is by calling yield.

On 01/20/2012 12:38 PM, Paolo Capriotti wrote:

Here it is: http://hpaste.org/56747 However

  • The monad part is simplified (there's no bind). Shouldn't actually be too hard to put a real monad there, since you don't even need the monad laws for this part.
  • It requires extensionality. I think this could be solved by defining an appropriate equivalence relation on pipes instead of using structural equality. Besides, structural equality is too strong anyway, for example return 3 and lift (return 3) should be equivalent, but they're not structurally equal.
  • It only works for finite pipes. I have never used coinduction in agda... I should learn at some point :) Unfortunately, the identity is infinite, so I can't prove the identity laws until I fix that.

Reply to this email directly or view it on GitHub: https://github.com/Gabriel439/Haskell-Pipes-Library/issues/5#issuecomment-3590124