Gabriella439 / Haskell-Morte-Library

A bare-bones calculus-of-constructions
BSD 3-Clause "New" or "Revised" License
372 stars 25 forks source link

Consider using the `bound` library #54

Open sgraf812 opened 7 years ago

sgraf812 commented 7 years ago

Since normalization performance probably relies on efficient substitution etc., would you be interested in accepting a PR which uses the bound library?

If so, I'll see what I can do!

Gabriella439 commented 7 years ago

Yes, I would definitely be interested in this. I think I tried this at one point before and the part I got stuck on was how to type-check expressions when using bound due to having to keep track of a context that referenced bound variables

thoughtpolice commented 7 years ago

FWIW, I used Morte as the beginning of my own experiments this past week, and one of the first things I did was port it to Bound. I just got done with the type checker; thanks to a VERY helpful pointer from @AndrasKovacs in this StackOverflow answer. Using this strategy it was quite trivial to port everything - you keep track of the context as a function essentially, and use polymorphic recursion to recurse into a Scope.

Note you'll have to deal with a reasonable amount of boilerplate, but it might be shorter than your hand written substitution (and certainly less error prone). Here's what it looks like (with some combinators and OverloadedStrings):

> let example = lam "a" star (lam "x" "a" "x") :: Expr String
> let ty = either (error "bad") id (infer0 cooked example)

> example
Lam (Kind Star) (Scope (Lam (Var (B ())) (Scope (Var (B ())))))

> ty
Pi (Kind Star) (Scope (Pi (Var (B ())) (Scope (Var (F (Var (B ())))))))

> pretty NoColor example
(λ(a : *) →  (λ(b : a) →  b))

> pretty NoColor ty
Π (a : *) →  Π (_ : a) →  a

I can put something up shortly enough, if it interests both of you (needless to say I'd like to continue my own experiments -- but @sgraf812, if you'd like to take my code and port it into Morte, that'd be fantastic!)

Gabriella439 commented 7 years ago

I don't mind even if it's longer than my hand-written substitution code as dealing with bound variables has been the number one source of errors in my compiler implementations

sgraf812 commented 7 years ago

@thoughtpolice Not really necessary now that I read through your really helpful post and the SO answer, but do you have that port of morte somewhere online?

thoughtpolice commented 7 years ago

It's not ready yet (and I'm reworking everything and changing the language.) But the basics explain it easily enough:

The type:

-- | The type of all types; resulting in either @'Star'@ or @'Box'@.
data Kind
  = Star
  -- ^ The type of a @'Type'@, which has its own type, @'Box'@.
  | Box
  -- ^ The type of @'Star'@, which should never be mentioned by the user
  -- program.
  deriving (Eq, Show, Enum, Bounded)

-- | The core dependent calculus.
data Expr a
  = Var a
  -- ^ Variable bindings
  | App (Expr a) (Expr a)
  -- ^ Function application
  | Lam (Type a) (Scope () Expr a)
  -- ^ Lambda abstraction
  | Pi  (Type a) (Scope () Type a)
  -- ^ Pi types (both dependent and non-dependent function types)
  | Kind Kind
  -- ^ Kinds
  deriving (Eq, Show, Functor, Foldable, Traversable)

#if MIN_VERSION_transformers(0,5,0)
instance Show1 Expr where
  liftShowsPrec _ _ d   (Kind k)  = showsPrec d k
  liftShowsPrec sp _ d  (Var v)   = showsUnaryWith sp "Var" d v
  liftShowsPrec sp sl d (App a b) = showsBinaryWith v v "App" d a b
    where v = liftShowsPrec sp sl
  liftShowsPrec sp sl d (Lam a b) = showsBinaryWith v1 v2 "Lam" d a b
    where
      v1 = liftShowsPrec sp sl
      v2 = liftShowsPrec sp sl
  liftShowsPrec sp sl d (Pi a b) = showsBinaryWith v1 v2 "Pi" d a b
    where
      v1 = liftShowsPrec sp sl
      v2 = liftShowsPrec sp sl

instance Eq1 Expr where
  liftEq f (Var a)   (Var b)   = f a b
  liftEq f (App a b) (App c d) = liftEq f a c && liftEq f b d
  liftEq f (Lam a b) (Lam c d) = liftEq f a c && liftEq f b d
  liftEq f (Pi  a b) (Pi  c d) = liftEq f a c && liftEq f b d
  liftEq _ (Kind k)  (Kind j)  = j == k
  liftEq _ _ _                 = False
#else
instance Show1 Expr where showsPrec1 = showsPrec
instance Eq1 Expr   where eq1        = (==)
#endif

-- | A simple alias expressing that an @'Expr'@ is also a @'Type'@ in our
-- language.
type Type = Expr

instance Applicative Expr where
  pure  = return
  (<*>) = ap

instance Monad Expr where
  return = Var

  Var a    >>= f = f a
  Kind k   >>= _ = Kind k
  App k a  >>= f = App (k >>= f) (a >>=  f)
  Lam t s  >>= f = Lam (t >>= f) (s >>>= f)
  Pi  t s  >>= f = Pi  (t >>= f) (s >>>= f)

-- | Allows @\"x\"@ as a shorthand for @'Var' \"x\"@.
instance IsString (Expr String) where
  fromString = Var

A small DSL for constructing values (recommended, since it abstracts the binders for you):

-- | The type of a type.
star :: Expr a
star = Kind Star

-- | The type of @'star'@ -- resulting in the \"top\" of the type hierarchy.
box :: Expr a
box = Kind Box

-- | Create a variable reference.
var :: a -> Expr a
var = Var

-- | Create an application.
(@:) :: Expr a -> Expr a -> Expr a
(@:) = App

-- | Create a lambda.
lam :: Eq a => a -> Type a -> Expr a -> Expr a
lam v t e = Lam t (abstract1 v e)

-- | Create a dependent function type.
pi_ :: Eq a => a -> Type a -> Expr a -> Expr a
pi_ v t e = Pi t (abstract1 v e)

-- | Create a non-dependent function type.
(==>) :: Type a -> Type a -> Type a
a ==> b = Pi a (Scope (fmap (F . pure) b))
infixr 5 ==>

Finally, type checking:

-- | The type checking monad, represented as a simple error monad returning an
-- error string.
type TcM a = Either TypeErrorMsg a

-- | A context for the type checker, binding a set of names to some types. In
-- this case, our name lookup is represented as a function.
type Ctx a = a -> TcM (Type a)

-- | The empty context. Only for demonstration purposes -- you'll normally want
-- to have a \"cooked\" version of some primitive names. Such as the use of
-- @'ctx1'@ against some set of primitives.
ctx0 :: Show a => Ctx a
ctx0 n = Left (ErrUnboundVar $ show n)

-- | A context populated with a pre-existing set of names that may be
-- referenced.
ctx1 :: (Show a, Eq a) => [(a, Expr a)] -> Ctx a
ctx1 prel x = case lookup x prel of
  Nothing -> Left (ErrUnboundVar $ show x)
  Just v  -> infer ctx0 v

-- | Add a type to a @'Ctx'@.
ctxS :: Type a
     -- ^ The type to add to the @'Ctx'@.
     -> Ctx a
     -- ^ The input @'Ctx'@.
     -> Ctx (Var () a)
     -- ^ The new @'Ctx'@ with the newly bound name.
ctxS ty _   (B ()) = pure (fmap F ty)
ctxS _  ctx (F a)  = fmap (fmap F) (ctx a)

-- | Core axioms: determining the type @'Star'@ and @'Box'@. The only axiom is:
--
-- @
-- ⊦ * : □        -- 'Star' is of type 'Box'
-- @
--
-- If a @'Box'@ is encountered, type checking fails.
--
axiom :: Kind
      -- ^ The input @'Kind'@.
      -> TcM (Type a)
      -- ^ The type of the input @'Kind'@.
axiom Star = pure box
axiom Box  = Left ErrFoundBox

-- | Rules defining the parameters (or \"edges\") of the lambda cube -- defining
-- the types of dependent function types. The rules specifies which function
-- types are allowed:
--
-- @
-- ⊦ * ↝ * : *    -- 'Star' to 'Star' is of type 'Star'
-- ⊦ □ ↝ * : *    -- 'Box' to 'Star' is of type 'Star'
-- ⊦ * ↝ □ : □    -- 'Star' to 'Box' is of type 'Box'
-- ⊦ □ ↝ □ : □    -- 'Box' to 'Box' is of type 'Box'
-- @
--
-- Given the input and output @'Kind'@ of the @'Pi'@ type, this determines the
-- resulting output @'Kind'@ and returns it as an @'Expr'@. This function does
-- not fail.
--
rule :: Monad m
     => Kind
     -- ^ Input @'Kind'@ of the @'Pi'@ type.
     -> Kind
     -- ^ Output @'Kind'@ of the @'Pi'@ type.
     -> m (Type a)
     -- ^ Resulting @'Type'@ of the dependent function type.
rule Star Box  = pure box
rule Star Star = pure star
rule Box  Box  = pure box
rule Box Star  = pure star

-- | Typecheck an @'Expr'@ against a given @'Type'@ with a custom @'Ctx'@.
typecheck :: (Show a, Eq a)
          => Ctx a
          -- ^ Custom context.
          -> Type a
          -- ^ The expected type.
          -> Expr a
          -- ^ The term to check against.
          -> TcM ()
typecheck ctx want expr = do
  have <- infer ctx expr
  when (want /= have) $
    Left $ ErrTypeMismatch (show have) (show want)

-- | Do type inference on an @'Expr'@ with a custom @'Ctx'@.
infer :: (Show a, Eq a)
      => Ctx a
      -- ^ Custom context.
      -> Expr a
      -- ^ Input expression.
      -> TcM (Type a)
      -- ^ Resulting type
infer ctx e = case e of
  Var a   -> ctx a
  Kind k  -> axiom k
  Lam t s -> do
    let (t',ctx') = (nf t, ctxS t' ctx)
    Pi t' . toScope <$> infer ctx' (fromScope s)

  Pi t s -> do
    let (t',ctx') = (nf t, ctxS t' ctx)

    inty <- infer ctx t'
    kin  <- getKind ErrInvalidInputTy inty

    outy <- infer ctx' (fromScope s)
    kout <- getKind ErrInvalidOutputTy outy

    rule kin kout

  App f x -> infer ctx f >>= \f' -> case f' of
    Pi ty s -> do
      typecheck ctx ty x
      nfM (instantiate1 x s)
    ty -> death $ ErrNotAFunction (show ty)

  where
    death       = Left
    getKind k x = case x of
      Kind k -> pure k
      _      -> death (k $ show x)

nfM :: Monad m => Expr a -> m (Expr a)
nfM = return . nf

The fact Ctx is polymorphically recursive means that cleaning this up any further or abstracting it will probably require some trickery with a GADT/existential or something, which I'm hoping to investigate to make error reports nicer (right now I simply have my type errors only use Strings, hence all the Show constraints.)

Pretty printing is a also a little more work. You need an infinite set of names to substitute in for fresh binders:

-- | An infinite list of names, used for variable substitution during pretty
-- printing.
names :: [String]
names = initial ++ rest where
  initial = [ [i] | i <- ['a'..'z']]
  rest    = [ i : show j | j <- [1 :: Int ..], i <- ['a'..'z'] ]

The pretty-printer then looks like:

-- | Pretty print an @'Expr'@.
prettyExprWith :: [String] -> Expr String -> String
prettyExprWith _      (Kind k)    = prettyKind k
prettyExprWith _      (Var v)     = v
prettyExprWith vs     (App x e)   = unwords [ prettyExprWith vs x, prettyExprWith vs e ]
prettyExprWith (v:vs) (Lam t s)   =
     "(λ(" ++ vn ++ " : " ++ prettyExprWith vs t ++ ") →  "
  ++ prettyExprWith vs new ++ ")"
  where
    (mentioned, new) = isUsed (Var v) nonvar s
    vn | mentioned   = "_"
       | otherwise   = v

prettyExprWith (v:vs) (Pi t s)    =
     "Π (" ++ vn ++ " : " ++ prettyExprWith vs t ++ ") →  "
  ++ prettyExprWith vs new
  where
    (mentioned, new) = isUsed (Var v) nonvar s
    vn | mentioned   = "_"
       | otherwise   = v

prettyExprWith [] _               = error "Core.prettyExprWith: impossible!"

-- | Pretty print an @'Expr'@.
prettyExpr :: Expr String -> String
prettyExpr = prettyExprWith names

The use of isUsed is a little trick in Bound I came up with, to see if substituting a value into a Scope changes it (if it doesn't, then the variable is dead i.e. not mentioned, and so you can print the binder as "_" instead) -- although using it will require more work:

-- | A non-existant variable; used in order to determine if the instantiation of
-- a variable into a particular scope will yield any replacements; if it
-- doesn't, then the variable isn't mentioned, and we can suppress its
-- name.
--
-- NOTE: This MUST be disjoint from any of the names possibly generated by
-- @'names'@.
nonvar :: Expr String
nonvar = Var "_"

-- | Determine if some name is mentioned in a scope; this is done by attempting
-- to substitute in a bogus name, as well as the real name, and comparing the
-- two results. If the name was not mentioned, then substituting a bogus name
-- will yield the same result as if the given name was -- that is, if the name
-- is not mentioned, substitution is a no-op.
--
-- Note that in order for this to be guaranteed, the name of the \"bogus
-- variable\" must never be a legitimate term name.
--
-- This returns whether or not the substitution succeeded, as well as the
-- instantiated @'Scope'@ if it did.
isUsed :: (Monad f, Eq (f a))
       => f a
       -- ^ The real variable to substitute into a @'Scope'@.
       -> f a
       -- ^ The bogus variable used to substitute into a @'Scope'@, then compare
       -- with the legitimate version.
       -> Scope n f a
       -- ^ The input scope.
       -> (Bool, f a)
isUsed realVar nonVar scope = (new == dummy, new)
  where
    new = instantiate1 realVar scope
    dummy = instantiate1 nonVar scope

Clearly, handling binders for printing, etc could be improved here, probably with a few extra utilities on top of Bound, tying it to a name supply.

In any case, I think that should get you started. If I make this nicer, I'll report back.

thoughtpolice commented 7 years ago

Also, I'd probably suggest skipping the isUsed trick, because I'm not sure it's completely sound in the case someone actually uses _ as a variable name if that's allowed, as it's the same as nonvar (so a free _ name would get substituted); it's not a problem with type checking, binding actually works -- only that it may make pretty printing weird as hell. I guess you can't have everything so easily...

sgraf812 commented 7 years ago

Thanks, that's really helpful! Also, it's some steps ahead of roughly what I had in mind, so I'm more confident now that I'm doing the right thing.

Re: GADT, I was already looking into this but abandoned it for now. What I came up with was essentially

data Context k v where
    Shift :: v -> Context k v -> Context (Var () k) (Var () v)
    FVs :: Ord k => Map k v -> Context k v

But it felt far too ad-hoc. Also it didn't help with pretty printing the context.

I was mostly thinking about alternatives to the toScope/fromScope approach because toScope would rebuild the entire tree, causing at least O(n^2) if I'm not mistaken. Although I could be wrong, so I'll probably benchmark later on.

Re: isUsed: I had a similar problem when migrating freeIn. I already got rid of it altogether, but what I did was along the lines of

-- | Try to unshift the given expression if its argument is unused.
tryUnshift :: (Traversable f, Monad f) => f (Var b (f a)) -> Maybe (f a)
tryUnshift = fmap join . traverse unF
  where
    unF x = case x of
        F e -> Just e
        _ -> Nothing
thoughtpolice commented 7 years ago

Oh, I also forgot to mention: bound on Hackage currently doesn't build against GHC 8.0, just for anyone reading this and hasn't noticed it. So you'll need to hold off any kind of real merge until the next version (bound-2.0) is out.

ekmett commented 7 years ago

IIRC http://hub.darcs.net/dolio/upts/browse/Language/UPTS by Dan Doel should tackle the polymorphic recursive cases correctly. He wrote it as I was finishing bound as an proof of concept.

sgraf812 commented 7 years ago

I got a first working prototype here. There's still much left to be fixed (like the QuickCheck test suite), but it's a good starting point for clean up.

One thing that bothers me is performance: The recursive test takes 150s to normalize, vs. 2 seconds with the old implementation. Normalization is pretty much as implemented here (plus eta reduction, which when turned off makes the tests fail slower). It might be that I missed some opportunities for sharing, but the drop in performance is probably coming from being accidentally quadratic due to fromScope/toScope nastiness.

I'll have to think about how to combine normalization with lifting Fs to child nodes, if that's at all possible.

Edit: Ermine seems to go directly to STG for evaluation. Maybe we could do something similar? Probably not, we want to stay in CoC.

ekmett commented 7 years ago

Pretty much anything that uses fromScope/toScope is doomed to terrible performance.

The key to high-performance bound code is liberally using lift to get cheaper expressions for later substitutions and using instantiate so that it can avoid walking into the lifted areas.

If you are throwing stuff at a wall to see what sticks, another technique you might want to look into is the 'fast circular substitution' trick I used here:

http://comonad.com/reader/2014/fast-circular-substitution/

sgraf812 commented 7 years ago

I was about to give up on performance after some manual complicated unwrapping of Scopes, which had minor impact (80s instead of 150s). But through some accident I made these changes, e.g. evaluate to WHNF before actually doing anything else, which got rid of the performance problems.

Since I haven't really cleaned up, there isn't much performance data, but the normalization for the recursion test case went down from 319 ms to 49 ms, for the factorial test case from 527 ms to 15 ms.

I tried the same "optimization" for the current master (non-bound), but that just made it slower.

As to why this is: WHNF is pretty much the optimal scenario for bound code, since we just have to instantiate and delay optimization of abstraction bodies (e.g. expressions wrapped in Scope). This means we can get rid of most of the toScope/fromScope calls, which are needed (I think, things are getting really ugly otherwise with matching on polymorphically recursive F-chains) for complete beta-eta-normalization.

So, I don't think there is a way around toScope/fromScope (if so, I'd really like to know!), but doing as much normalization as possible without them yields acceptable performance. In general there might be some crafted input for which this explodes.

Gabriella439 commented 7 years ago

One thing that's not clear: what is the performance comparison for:

sgraf812 commented 7 years ago

I'll do that eventually, when I think it's working good enough for a PR. TODO:

Regarding performance, I'm thinking about the best way to move F nodes as far up as possible in the generalized de Bruijn representation, although the polymorphic recursion is crazy as always.

Also, I'm thinking about some heuristic for when to normalize arguments in redexes (e.g. App (Lam b) a => substitute1 a b vs. substitute1 (whnf a) b ) which, strange enough, has mixed effects on benchmark performance, despite of sharing.

ekmett commented 7 years ago

Yeah it is possible to write an explicit normalizer that lifts F nodes as far as possible up the tree. Unfortunately it can't be done in a very general setting. =(

The fast-circular-substitution approach I linked above on the other hand can deal with the cost of multiple variables being bound simultaneously in a rather efficient manner in case that is becoming your bottleneck.

AndrasKovacs commented 7 years ago

I think that by far the best way to do dependent type checking is Coquand's algorithm or variations on it. It's tremendously faster than evaluation by syntactic substitution and AFAIK it's the general evaluation method used in both Coq and Agda. Although it scales to Coq levels, in simple Morte-like cases it's also simpler than substitutions and easier to write correctly. I have a document that provides some references and also explains how the algorithm can be adapted to the exact Morte syntax, i. e. lambdas with annotated bindings. Maybe I'll do some Morte benchmarking with it when I have some time.

Gabriella439 commented 7 years ago

I mean, if this were really about performance then I'd actually recommend the approach in The Optimal Implementation of Functional Programming Languages. This is more about just eliminating common errors when dealing with bound variables without taking too much of a performance hit

VictorTaelin commented 7 years ago

The Optimal Implementation of Functional Programming Languages.

(You just summoned me.)

@Gabriel439, how hard do you think it would be to integrate Elementary Affine Logic with Morte? Honestly, I don't think I'll have the capacity to write it as well as you and @AndrasKovacs could; at least, not before I gain much more experience writing type checkers and interpreting those logical rules; I'd certainly get a detail or another wrong. According to Damiano Mazza, if I understood correctly, there are no reasons that adding dependent types to EAL wouldn't be possible.

Since you can determine the normalization complexity of EAL terms by looking at their types, you can add fixpoints and thus have the Scott encoding without losing strong normalization. I've thinking a lot about this and I absolutely came to the conclusion that is the right way to do things, for several reasons, and I have marvelous plans for it, but I need that type checker... so, in short, I would absolutely enormously love to have a simple dependently typed core like Morte, or even @AndrasKovacs's "nosubst", but using linear types. Any chances? :)

Gabriella439 commented 7 years ago

@MaiaVictor I think this is possible. I actually spent some time working on a small program to verify that an expression was EAL-typeable so this is something that interests me

thoughtpolice commented 7 years ago

@sgraf812

Re: isUsed: I had a similar problem when migrating freeIn. I already got rid of it altogether, but what I did was along the lines of

Although you migrated away from it, FWIW, I found a much more elegant formulation for this thanks to a tip from Ed last week, that I got around to. Just in case anyone is interested, behold:

-- | Determine if some value occurs in the set of free @'Var'@s of a given
-- @'Scope'@.
--
-- Equivalent to @used = \\v s -> 'Data.Set.member' v ('free' s)@
used :: (Foldable f, Ord a) => a -> Scope b f a -> Bool
used val scope = val `Set.member` free scope

-- | Return the set of free variables for a given @'Scope'@.
free :: (Foldable f, Ord a) => Scope b f a -> Set.Set a
free = foldr Set.insert Set.empty

Also, my absurd hack for pretty printing, when you get around to it, can simply be handled by isClosed in bound. If some expression is closed, then you can just throw away the generated name, obviously, since it won't get bound to anything. I attribute my terrible solution to being bad at computers.

VictorTaelin commented 7 years ago

@Gabriel439 that is great, is that program online? Do you have plans for implementing a type checker?

ekmett commented 7 years ago

FWIW- used can be implemented much more cheaply by folding with Any.

used x = getAny . foldMap (\y -> Any $ x == y)

That should have both faster constants and asymptotics.

sgraf812 commented 7 years ago

Just dropping in to confirm rumors of my work having stalled. I'll probably look into this at a later point, when I have actual spare time to throw at this (I'm being optimistic here).

For the record: Stuff mostly works, except for x@255 syntax, but normalization is currently unbearably slow for cases like the concat benchmark. For more info see my comment above.

Gabriella439 commented 7 years ago

@sgraf812 No problem. I'll leave this issue open since this is a very common request and somebody else may want to pick this up

@MaiaVictor It's not online yet. I'll try to dig it up as soon as I can

ChristopherKing42 commented 6 years ago

Just chiming in to say that bound-2.0 is out: https://github.com/ekmett/bound/releases/tag/v2