fir-lang / fir

MIT License
23 stars 3 forks source link

Type syntax for contexts, trait definitions and implementations #18

Open osa1 opened 1 month ago

osa1 commented 1 month ago

With higher-kinded types and multiple parameters to traits, we can implement many of the widely used Haskell typeclasses and types in Fir.

However currently we don't have a syntax for contexts, and implementing the methods at the top-level (rather than nested in an impl block or similar) is also not ideal when the types get complicated.

As a simple example, consider these Haskell types:

class Monad m => MonadState s m where ...   -- ideally `s` should be an associated type, but this works as an example

data StateT s m a = ...                     -- or newtype, doesn't matter

instance Monad m => MonadState s (StateT s m) where ...

modify :: MonadState s m => (s -> s) -> m () 

Some options in Fir:

trait MonadState[Monad[M]][S, M]: ...

type StateT[S, M, A] = ...

impl MonadState[Monad[M]][S, StateT[S, M]]: ...

fn modify[MonadState[S, M]](update: fn(S): S): M[()] = ...

Or with the fat arrow:

trait Monad[M] => MonadState[S, M]: ...

type StateT[S, M, A] = ...

impl Monad[M] => MonadState[S, StateT[S, M]]: ...

-- Weird and confusing
fn [MonadState[S, M]] => modify(update: fn(S): S): M[()] = ...

I think fat arrow to separate context from the type won't work in function definitions.

I also wanted to explicitly list type parameters (and allow specifying kinds) in generic functions, but it gets noisy very quickly:

fn modify[S, M][MonadState[S, M]](update: fn(S): S): M[()] = ...

This has 3 separate lists: (1) type parameter lists (2) types in the context (predicates) (3) parameter list.

I would think that most functions will have just (3). The second most common will be (1) and (3) combined. However when you need (2) you also need (1), so functions with contexts will be noisy.

We could use a different bracket type for the type parameters. It's still noisy but may be a bit easier to read:

fn modify{S, M}[MonadState[S, M]](update: fn(S): S): M[()] = ...

Angle brackets are not an option as they're used for binary operators.

We could infer type parameter lists, but then type applications get confusing as you can't easily say which type parameter is the first and which is the second. Type applications should be very rare so maybe that's OK. When showing a type signature in a documentation page or in LSP hints etc. we can always show the inferred type parameter list.

M[()] part also looks ugly, but maybe it can be improved with M[Unit] or M[_] (inferred).

osa1 commented 1 month ago

These problems around the syntax made me question whether we need multi-parameter traits at all. I think associated types give us the most of the benefits (maybe even all), with potentially simpler syntax.

Consider the examples in "Type classes: an exploration of the design space":

class Monad m => VarMonad m v where
  new :: a -> m (v a)
  get :: v a -> m a
  put :: v a -> a -> m ()

instance VarMonad IO MutVar where ...
instance VarMonad (ST s) (MutableVar s) where ...

This type represents monads that go along with their mutable variables. E.g. MutVar is a mutable variable that can be accessed in IO monad, so we implement VarMonad IO MutVar.

A very similar typeclass can be implemented with an associated type as single parameter type class:

class Monad m => VarMonad m where
  type Var m :: Type -> Type

  new :: a -> m (Var m a)
  get :: Var m a -> m a
  put :: Var m a -> a -> m ()

instance VarMonad IO where
  type Var IO = MutVar
  ...

instance VarMonad (ST s) where
  type Var (ST s) = STRef s
  ...

This version is not 100% the same as the original, it's equivalent to the original definition but with a functional dependency m -> v. This means I can't have two implementations with the same m but different Var. Example:

newtype MyIORef a = IORef a

instance VarMonad IO where      -- duplicate instance, rejected
  type Var IO = MyIORef
  ...

This was possible with the original typeclass:

instance VarMonad IO MyIORef where ...
instance VarMonad IO IORef where ...    -- OK

My questions about this limitation of associated types are:

  1. Is this limitation important in practice? What useful typeclasses do I lose with this restriction?
  2. Can we consider associated types of a typeclass as a part of its signature?

I feel like single-param tcs with associated types can be made to have the same experssive power as multi-param tcs if we consider associated types as a part of the signature. For example, a function that uses the more flexible definition with MPTCs:

statefulFun :: forall m a . VarMonad m a => m Int
statefulFun = do
  ref :: a Int <- new 123
  get ref

(Note: we don't care about principal types and ambiguities in Fir, when there's an ambiguity we ask user to provide types, and function definitions always have a type signature)

Could perhaps be written using the single-param version: (not tested)

statefulFun :: forall m a . (VarMonad m, Type m ~ a) => m Int
statefulFun = do
  ref :: a Int <- new 123
  get ref

Here Type m ~ a part makes the associated type a part of the typeclass resolving process. With this we can now have two instances of VarMonad IO when the associated types are different.

(This syntax is the same as Rust's T: Iterator<Item = X>, which I think serves the same purpose in Rust)

The second example in the paper can also be implemented with one param + associated types.

The last example is a bit interesting because I think it's a bad example, and I'm not sure how to handle it best with a single-param tc that makes the associated types a part of the signature.

class IODevice handle a where
  receive :: handle -> IO a
  send :: handle -> a -> IO a       -- I think this is a typo in the paper, the return type should've been `IO ()`

class IODevice Mouse MouseEvent where ...

Apart from the typo, I don't know what kind of handle allows you to receive and send events, and even if there are such handles the I would think that a handle will have one type of event, so a should either be behind a functional dependency as handle -> a, or even better it should be an associated type.

However if we make it an associated type we want to make it not a part of the signature, so it will effectively enforce the functional dependency handle -> a.

class IODevice handle where
  type Event handle :: Type

  receive :: Event -> IO Event
  send :: Event -> a -> IO ()

So perhaps we need two types of associated types: one that becomes a part of the signature, and one that's not.

Section 2.2 shows an example where we have multiple params, and constraints on some of those params:

class Collection c a where ...

instance Eq a => Collection ListSet a where ...

I think this is not a big deal, we can come up with some syntax to add constraints on associated types.

As a more complicated and realistic example:

class Monad m => MonadState s m where ...

instance Monad m => MonadState s (StateT s m) where ...

instance MonadState s m => MonadState s (ExceptT e m) where ...

With s as associated type, this might look something like:

class Monad m => MonadState m where
  type State m :: Type -> Type
  ...

instance Monad m => MonadState (StateT m) where
  type State (StateT m) = State m
  ...

instance MonadState m => MonadState (ExceptT e m) where
  type State (ExceptT e m) = State m
  ...

So my conclusion is that single-parameter traits can be made as expressive as multi-parameter ones, using associated types that can be made a part of the trait type signature using something like Rust's Trait<AssocType = X>.

osa1 commented 1 month ago

In my post above I basically proposed that:

However I don't know if (1) is actually useful. I think we could get away with just (2).

The examples in my previous post would look like this in Fir:

trait VarMonad[M: Monad]:
    type Var            # TODO: Add kind signature here

    fn new[T](val: T): M[Var[T]]
    fn get[T](var: Var[T]): M[T]
    fn put[T](var: Var[T], val: T): M[()]

This is basically syntactic sugar for trait VarMonad[M: Monad, Var]: .... When specifying the Var type in a context we use M: VarMonad[Var = T].

Or in other words, a MPTC in Haskell like class C T1 T2 T3 will look like

trait C[T1]:
    type T2
    type T3
    ...

and when we have C a b c in a context in Haskell, we will have this in a type parameter list in Fir: A: C[T2 = B, T3 = C].

This makes the syntax simpler because we don't need a context list. So the example in my first post becomes:

# Instead of this:
fn modify[S, M][MonadState[S, M]](update: fn(S): S): M[()] = ...

# We have this:
fn modify[S, M: MonadState[State = S]](update: fn(S): S): M[()] = ...

# Or type parameters in the other order:
fn modify[M: MonadState[State = S], S](update: fn(S): S): M[()] = ...
osa1 commented 1 month ago

The other thing we need to figure out is how to allow method call syntax for trait methods.

Consider the functor typeclass:

trait Functor[F]:
    fn fmap[A, B](f: Fn(A): B, val: F[A]): F[B]

Here it doesn't make sense to allow method call syntax for fmap, as there isn't a "self" type. F has kind Type -> Type so it can't be a type of self.

We could allow the first parameter of a method to be the receiver, allowing

# We don't have a syntax for lambdas yet
fn increment(i: I32): I32 = i + 1

increment.fmap(Some(123))

But it looks a bit weird, in a trait like Functor there isn't an obvoius "self" type that we can allow in the receiver position of the method call syntax.

So we probably don't want to allow this in all trait methods.

What we could do is: if a trait type parameter has kind Type, we allow the first parameter of the methods to be self. E.g.

trait Eq[T]:
    # The `==` operator
    fn __eq(self, other: T): Bool

trait Ord[T]:
    # Comparison operators use this method
    fn __cmp(self, other: T): Ordering
osa1 commented 1 month ago

We could perhaps allow method call syntax with a user specified self type:

trait Functor[F]:
    fn fmap[A, B](self: F[A], f: Fn(A): B): F[B]

We also need to allow implementing a trait with a method named m and having a non-trait method with the same name, otherwise a type may nto be able to implement a trait just because it already has a method with the same name.

osa1 commented 1 month ago

I think for now we should go ahead with some simple design that we know will work, then do the experimentation with the bootstrapped compiler.

The interpreter already has some simplification, like not having higher-kinded types. We should feel free to simpify other things as well.

The current design I have in mind for the interpreter:

If necessary, we should also disallow (for now) having a trait method and a non-trait method with the same name.

This should allow Eq, Ord, Hash, ToDebugStr etc. that we will need in the bootstrapped compiler without making it too complex.

We can then iterate on the bootstrapped compiler.