ekmett / machines

Networks of composable stream transducers
Other
339 stars 46 forks source link

On merging Plan with Machine #68

Open ekmett opened 9 years ago

ekmett commented 9 years ago

From @atzeus:

Hi Ed!

I was thinking about what you asked me at ICFP, about plans being machines, and I think I solved it: Plan = Machine!

As far as I can tell, there is no problem with associativity with the (~>) operator, which recurses on both arguments. Hence I think there are only problems with mplus and >>=, which I think can be solved by the following reasoning:

Machines are an instance of the Free MonadPlus, which is (inefficiently) defined as follows:

newtype FreePlus f a = FreePlus { getFreePlus :: [IFree f a] }
data IFree f a = Pure a
               | Impure (f (FreePlus f a))

instance Functor f => Monad (FreePlus f) where
  return = FreePlus . (\x -> [x]) . Pure              
  (FreePlus m) >>= g = FreePlus $ concatMap bind m where
     bind (Pure x) = getFreePlus (g x)
     bind (Impure f) = [ Impure $ fmap (>>= g) f ]

instance Functor f => MonadPlus (FreePlus f) where
  mzero = FreePlus []
  mplus (FreePlus l) (FreePlus r) = FreePlus (l ++ r)

Machines are defined in terms of the free monadplus as follows:

data MachineF i o a = AwaitF (i -> a) a | YieldF o a deriving Functor
type Machine i o a = FreePlus (MachineF i o) a

stop :: Machine i o a
stop = mzero

await :: Machine i o i
await = FreePlus $ [Impure $ AwaitF return stop ]

yield :: o -> Machine i o ()
yield x = FreePlus $ [Impure $ YieldF x (return ()) ]

... etc...

Now to get rid of the associativity problems of >>= and mplus, swap out the data structure for [IFree f a] and binding for more efficient ones.

Straightforwardly applying reflection without remorse gives: (TA is

import qualified Data.TASequence.FastCatQueue as TA
import Control.Monad
import Control.Applicative hiding (empty)
import Data.Sequence
import Data.Foldable
import Prelude hiding (foldl)

newtype FCP f a b = FCP (a -> FreePlus f b)
type FMPExp f a b = TA.FastTCQueue (FCP f) a b
newtype FreePlus f a = FreePlus { getFreePlus :: Seq (IFree f a) }
data IFree f a = 
   forall x. FMP (FreePlusView f x) (FMPExp f x a)
data FreePlusView f a   = Pure a 
                        | Impure (f (FreePlus f a))

bind :: FreePlus f a -> FMPExp f a b -> FreePlus f b
bind (FreePlus m) f = FreePlus $ fmap (`bindi` f) m

bindi :: IFree f a -> FMPExp f a b -> IFree f b
bindi (FMP m r) f = FMP m (r TA.>< f)

instance Monad (FreePlus f) where
  return x = FreePlus (singleton (FMP (Pure x) TA.tempty))
  m >>= f = bind m (TA.tsingleton (FCP f))

instance MonadPlus (FreePlus f) where
  mzero = FreePlus empty
  mplus l r = FreePlus (getFreePlus l >< getFreePlus r)

fromView :: Seq (FreePlusView f a) -> FreePlus f a
fromView m = FreePlus (fmap (\x -> FMP x TA.tempty) m)

toView :: Functor f => FreePlus f a -> Seq (FreePlusView f a)
toView (FreePlus m) = foldl (><) empty $ fmap down m where
  down (FMP h t) = 
   case h of
    Pure x -> 
     case TA.tviewl t of
        TA.TAEmptyL -> singleton (Pure x)
        FCP hc TA.:< tc -> toView (bind (hc x) tc)
    Impure f -> singleton $ Impure (fmap (`bind` t) f) 

However, now >>= is linear in the number of choices, which might be wasteful if most choices are thrown away. We can make the whole thing a bit more "lazy" by representing the choices/binds tree as follows:

import qualified Data.TASequence.FastCatQueue as TA
import Control.Monad
import Control.Applicative hiding (empty)
import Data.Sequence
import Data.Foldable
import Prelude hiding (foldl)

newtype FCP f a b = FCP (a -> FreePlus f b)
type FMPExp f a b = TA.FastTCQueue (FCP f) a b

data FreePlus f a = forall x. FreePlus (Seq (FreePlus f x)) (FMPExp f x a)
                  | FImpure (f (FreePlus f a)) --leaf
                  | FPure a -- leaf

bind :: FreePlus f a -> FMPExp f a b -> FreePlus f b
bind (FreePlus m r) f = FreePlus m (r TA.>< f)
bind m f = case TA.tviewl f of
             TA.TAEmptyL -> m
             _         -> FreePlus (singleton m) f

instance Monad (FreePlus f) where
  return = FPure
  m >>= f = bind m (TA.tsingleton $ FCP f)

instance MonadPlus (FreePlus f) where
  mzero = FreePlus empty TA.tempty
  mplus x@(FreePlus ml cl) y@(FreePlus mr cr) = 
    case (TA.tviewl cl, TA.tviewl cr) of
                   (TA.TAEmptyL, TA.TAEmptyL) -> FreePlus (ml >< mr) TA.tempty
                   _ -> FreePlus (singleton x |> y) TA.tempty
  mplus x y = FreePlus (singleton x |> y) TA.tempty

data ChoicesView f a = MZero
                     | MPlus (EffectView f a) (FreePlus f a)

data EffectView f a = Pure a 
                    | Impure (f (FreePlus f a))

fromView :: ChoicesView f a -> FreePlus f a
fromView MZero = mzero
fromView (MPlus x y) = fromEffView x `mplus` y

fromEffView (Pure a) = FPure a
fromEffView (Impure f) = FImpure f

toView :: Functor f => FreePlus f a -> ChoicesView f a
toView (FPure x) = MPlus (Pure x) mzero
toView (FImpure x) = MPlus (Impure x) mzero
toView (FreePlus m f) = 
   case viewl m of
     EmptyL -> MZero
     h :< t -> 
      case toView h of
        MZero -> toView (FreePlus t f)
        MPlus ch ct -> 
         let rest = FreePlus (ct <| t) f
         in case ch of
             Impure x -> MPlus (Impure (fmap (`bind` f) x)) rest
             Pure x   -> case TA.tviewl f of
                           TA.TAEmptyL -> MPlus (Pure x) rest
                           FCP hc TA.:< tc -> toView $ bind (hc x) tc `mplus` rest 

I have not tested the above code, but it does compile :)

As to if this is fast enough, that depends on the choice of (type-aligned and non-typealigned) datastructures. I also have a dirty trick to use regular sequence datastructures (such as Data.Seq) as a type-aligned sequence datastructure. This can help if the regular datastructure is already very optimized and you don't want to reimplement it :)

https://github.com/atzeus/reflectionwithoutremorse/blob/master/Data/LiftSequence.hs

Some inlining of the sequence datastructures into the FreeMonadPlus and/or inline the MachineF might also help.

Let me know if this helps and/or if you have any questions!

atzeus commented 9 years ago

I'm curious: Is there a use-case where unifying machines with plans solves a problem or is it just a nicer API?

ekmett commented 9 years ago

The main issue was more that the split between them was driven by a perceived necessity that is now known false. I haven't actually revisited my feelings on the API with them merged.

treeowl commented 9 years ago

Perl's motto is There's more than one way to do it, meaning that Perl programmers are proud to have a language chock full of non-orthogonal features. Haskell programmers tend to see things a bit differently. The current machines API is somewhat Perlish, in that sense.

construct $ x <|> y
--vs
construct x <> construct y

construct (go 0) where
  go 100 = stop
  go n = liftIO (print n) >> go (n + 1)

-- vs. (using a new thing in master)

runStateM 0 . repeatedly $ do
  n <- get
  when (n == 100) stop
  liftIO (print n)
  put (n + 1)

At each juncture, the user must decide whether to do something on the PlanT level or on the MachineT level.

ekmett commented 9 years ago

@treeowl: The main thing about plan vs. machine is that originally plans were added because we believed they were the only way to get O(1) construction, and machines were separated because we believed they were the only way to get O(1) composition without pipes-like slowdowns.

treeowl commented 9 years ago

@ekmett, I'm just saying that from an API standpoint, it leads to some unpleasant redundancies. A related/unrelated question: what are the performance impacts of <|> for PlanT and <> for MachineT? They kind of look like they might be able to stack up badly, but I can't tell for sure.

ekmett commented 9 years ago

I'm less concerned with the 'more than one way to do it' problems than "can I express all the things I want with viable runtime".

Without reflection without remorse the answer was no; you had things like pipes which just had situations that led to quadratic slowdown. So the Plan/Machine dichotomy was forced on us, not optional.

With the status quo, you have some options about when you make the cutover but you still have to make the cutover.

With the reflection without remorse machinery we could unify these ideas, so the question first and foremost in my mind is "is it really worth the constant factors?" and secondarily were we actually deriving tangible benefit from being able to quotient out thinking about the ''return' parameter from Plan when we wrapped it up in a machine. The latter benefit if it exists at all is a benefit we received as a knock-on consequence. It wasn't designed in.

You'd get a very different flavor of API with different instances and different trade-offs.

acowley commented 8 years ago

This was really helpful, @atzeus, thank you! I'm going to do some experiments with a somewhat more pipes-like arrangement in a project where I need to do some parsing. I plan to arrange to have the parsing work both naively (reflected, non-CPSed), and without remorse so I can see where the average case fits into things.

Without remorse is a 2x constant overhead on a baseline of right associated binds in my little test bed, but of course pulls ahead quite quickly in pathological scenarios. I have no idea what the typical complexity will look like in practice, but perhaps it will shed some light on the question of moving machines over to that design.

atzeus commented 8 years ago

Cheers @acowley ! The overhead depends largely on what kind of datastructure you are using for reflection without remorse and if you inline the datastructure into the freemonad(plus).

acowley commented 8 years ago

I'm using an unsafe variant on top of Data.Sequence with most of the structure flattened. I'll work on performance some more, and give it a shot with the parsing load I have (C pre-processing) to see how it works out.

atzeus commented 8 years ago

@acowley You can probably can some speed improvements from the unsafe variant of Data.Sequence by using the suggestion from @treeowl as described in https://github.com/atzeus/reflectionwithoutremorse/issues/1

Also if you do that, please push it to me :)

masaeedu commented 5 years ago

@ekmett Is the following:

newtype FreePlus f a = FreePlus { getFreePlus :: [IFree f a] }
data IFree f a = Pure a
               | Impure (f (FreePlus f a))

semantically equivalent to:

data Free f a = Pure f | Impure f (Free f a)
type IFree f a = Free (Compose f []) a

? Or are those two different things?