Algebraic type promotion

pthariensflame commented 10 years ago

I mentioned this potential feature in this issue under the name DataKinds; the standard paper describing it is here. Note that the paper describes both type promotion and kind polymorphism, the latter of which is not part of this proposal for Ceylon (although if you want it as well as type promotion, then by all means go for it :) ).

gavinking commented 10 years ago

@pthariensflame Before I get too deep into this paper, what should I be looking for here, from our point of view?

The paper talks a lot about Haskell's kind system, and its limitations. That's not quite relevant to us because:

we don't have type constructor parameterization (yet), and might never have it, unless someone can propose a good way to do higher-order type argument inference over in #754, and
if I'm understanding everything here correctly, it seems that unlike Haskell we don't need higher-order generics to represent types like Nat and Vector.

See this gist for how we can represent Nat and Vector in Ceylon.

So the thing I really did notice upfront is that Haskell's type families let you express the Plus function for Nats, which is really where we run aground right now. This is why I've toyed with the idea with essentially engineering Plus and Pred as primitive functions in the type system. But perhaps type families are a way to avoid that?

What else am I supposed to be paying attention to?

pthariensflame commented 10 years ago

Really, the paper is talking about integrating directly into Haskell's kind system a feature that could already be (mostly) emulated (notice that it mentions the Strathclyde Haskell Enhancement, which (among other things) did this for you with some simple preprocessing). The benefit of the system that the paper talks about is that, just like Ceylon, Haskell values type safety very highly, but Haskell's type system wasn't (and Ceylon's type system isn't) kind-safe in the same way that both languages' value systems are type-safe. The paper proposes a system that is both easy to implement and easy to understand, and yet gets you most of the kind-safety of full-spectrum dependent typing.

Vector is a pretty standard academic example of an easy dependently-typed thing, but it's so easy as to be essentially trivial to emulate in Haskell, even without DataKinds:

{-# LANGUAGE EmptyDataDecls, GADTs, UndecidableInstances, FlexibleInstances, TypeFamilies #-}

data Zero
data Succ n

data Vector n a where
    Nil :: Vector Zero
    Cons :: a -> Vector n a -> Vector (Succ n) a

type family Plus m n
type instance Plus Zero n = n
type instance Plus (Succ m) n = Succ (Plus m n)

append :: Vector m a -> Vector n a -> Vector (Plus m n) a
append Nil ys = ys
append (Cons x xs) = Cons x (append xs ys)

head :: Vector (Succ n) a -> a
head (Cons x _) = x

tail :: Vector (Succ n) a -> Vector n a
tail (Cons _ xs) = xs

However, as the paper points out, we can easily write Vector Int Float, which is a valid type still, albeit an empty one. The kind system provides no help for us here, essentially making our type-level Nat "dynamically kinded". Even in Ceylon, where subtyping saves us from passing Integer in that parameter, we can still pass Nat, which is even worse: that type is inhabited, which means our abstraction is extremely leaky! If we enjoy and promote the use of type safety on the value level, it seems only reasonable to do the same for kind safety on the type level.

Also, consider the sheer ugliness in Haskell without DataKinds when implementing and using HList, the heterogeneous list type:

{-# LANGUAGE EmptyDataDecls, GADTs, UndecidableInstances, FlexibleInstances, TypeFamilies #-}

data Nil
data Cons x xs

data HList ts where
    HNil :: HList Nil
    HCons :: t -> HList ts -> HList (Cons t ts)

type family Append xs ys
type instance Append Nil ys = ys
type instance Append (Cons x xs) ys = Cons x (Append xs ys)

hAppend :: HList ts1 -> HList ts2 -> HList (Append ts1 ts2)
hAppend HNil ys = ys
hAppend (HCons x xs) ys = HCons x (hAppend xs ys)

-- implementation above, usage below

example :: HList (Cons Int (Cons Float (Cons String Nil)))
example = HCons 2 (HCons 5.339 (HCons "hello" HNil))

Notice that Ceylon can't properly implement HList (witness the ugliness of unsugared tuple types in Ceylon), let alone hAppend, which can't be implemented in Ceylon at all. In addition, this code is still "dynamically kinded" in the way I described above.

Now, observe how much more understandable and regular the Haskell code becomes as soon as DataKinds is in place:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, FlexibleInstances, UndecidableInstances #-}

data HList (ts :: [*]) where
    HNil :: HList '[]
    HCons :: t -> HList ts -> HList (t ': ts)

type family Append (xs :: [*]) (ys :: [*]) :: [*]
type instance Append '[] ys = ys
type instance Append (x ': xs) ys = x ': (Append xs ys)

hAppend :: HList ts1 -> HList ts2 -> HList (Append ts1 ts2)
hAppend HNil ys = ys
hAppend (HCons x xs) ys = HCons x (hAppend xs ys)

-- implementation above, usage below

example :: HList '[Int,Float,String]
example = HCons 2 (HCons 5.339 (HCons "hello" HNil))

Not only is this code better-looking, it's actually kind-safe! We can't possibly use anything but a promoted value of the promoted type [*] as the argument to HList, nor could we use anything other than Nil or Succ n (for n :: Nat) as the "length" parameter to a DataKinds-based Vector.

My proposal is to allow something similar in Ceylon. This obviously requires GADT support as a prerequisite, but higher-order kinds and first-class type constructors are unnecessary for this purpose, even in Haskell. The paper goes on to prove that DataKinds preserves totality, among other things, so this should not cause any undecidability in Ceylon's type system either.

Aside from that, look at section 3.3 of the paper for an explanation of which types should be valid for promotion, and why.

Section 6.3 discusses kind inference in the context of DataKinds.

gavinking commented 10 years ago

However, as the paper points out, we can easily write Vector Int Float, which is a valid type still, albeit an empty one.

So our implementation doesn't suffer from that problem.

Even in Ceylon, where subtyping saves us from passing Integer in that parameter, we can still pass Nat, which is even worse: that type is inhabited, which means our abstraction is extremely leaky!

Rm, I'm not quite sure if that's fair. The only "hole" that I can see in this implementation is that I could add new subtype of Vec. That's a hole, surely, but it's a hole that is pretty easy to plug. In particular, adding GADTs would plug it. Even without GADTs, we could still let you add that restriction.

Notice that Ceylon can't properly implement HList

Rm, are you sure? WDYM?

(witness the ugliness of unsugared tuple types in Ceylon),

The ugliness is primarily a result of the fact that Tuple is a subtype of List. There's a much simpler implementation of HList in Ceylon, but it wouldn't be Iterable.

let alone hAppend, which can't be implemented in Ceylon at all.

Yes. Indeed. This is frustrating.

So that's why I was commenting above about how Haskell-style type families seem to be very helpful here.

Now, observe how much more understandable and regular the Haskell code becomes as soon as DataKinds is in place:

OK, I don't understand that code yet, so I will keep reading.

pthariensflame commented 10 years ago

Regarding Haskell-style type families, yes, they are extremely helpful, and not just for this kind of thing. They would be very easy to support in Ceylon too: just allow refinement of arbitrary member aliases. You can refine a simple alias by another simple alias, or by an interface alias (refining a simple alias by a class alias might cause some problems, though), as long as the refining one is a subtype of the refined one, as usual for member type refinement.

Regarding the "hole" in your implementation of Vector in current Ceylon, being able to add a new subclass of Vector is indeed easily "plugged", but is not the "leak" I was referring to. Vector<Nat,T> is a valid type, one which is a supertype of the equally valid type Cons<Succ<Nat>,T>; Cons<Nat,T> is a valid type as well. In addition, Succ<Nat> is not only a valid type, but a valid subtype of Nat! Even if you think Vector<Nat,T> or Cons<Nat,T> should be valid for some reason, it's clear that Succ<Nat> is purely an artifact of the "encoding scheme", and is completely meaningless to us. These problems are inherent in the "dynamically kinded" nature of Ceylon's current type system, and there is no possible way around them; you can try it yourself, but you won't be able to fix any of these problems within Ceylon as the language currently stands. We don't have any of these problems on the value level:

shared abstract class Nat() of zero | Succ {}
shared object zero extends Nat() {}
shared abstract class Succ(shared Nat pred) extends Nat() {}

// the following line rightly does not compile
Nat v = Succ(`Nat`);

The fact that we still have them on the type level is the entire motivation for this proposal.

gavinking commented 10 years ago

They would be very easy to support in Ceylon too: just allow refinement of arbitrary member aliases. You can refine a simple alias by another simple alias, or by an interface alias (refining a simple alias by a class alias might cause some problems, though), as long as the refining one is a subtype of the refined one, as usual for member type refinement.

We actually already allow aliases to be refined by aliases, but I don't think the semantics of that is quite what you're looking for. In particular, they're not virtual types.

Vector<Nat,T> is a valid type, one which is a supertype of the equally valid type Cons<Succ<Nat>,T>; Cons<Nat,T> is a valid type as well. In addition, Succ<Nat> is not only a valid type, but a valid subtype of Nat! Even if you think Vector<Nat,T> or Cons<Nat,T> should be valid for some reason, it's clear that Succ<Nat> is purely an artifact of the "encoding scheme", and is completely meaningless to us.

Mmmm. I hear what you're saying. Still, the fact that I can construct meaningless types is to me not that serious of a problem as if we could actually construct a value of thes type, which we can't.

pthariensflame commented 10 years ago

Still, the fact that I can construct meaningless types is to me not that serious of a problem as if we could actually construct a value of thes type, which we can't.

I suppose I can live with that, but kind safety is the whole point of this proposal. If that end is not worth it to you, then this proposal, in its current state at least, is useless.

Haskell-style type families seem to have made an impression on you though, so perhaps we can approach that as a proposal in its own right.

In light of that, I will start a new issue for type families, as a separate feature from this. Feel free to close this issue if you think just DataKinds alone won't really do anything for where you want Ceylon to go. Alternatively, go wherever you want with this idea. :)

gavinking commented 10 years ago

We actually already allow aliases to be refined by aliases

Sorry, to clarify: we allow class aliases to be refined. Here, what you're probably thinking of is allowing alias aliases to be refined, and have the semantics be that of a virtual type.

RossTate commented 10 years ago

So we've already talked about making a special NAT kind with specialized reasoning. The issue with generalizing this to arbitrary "data kinds" is that programs often need a more powerful equational theory at the kind level in order to type check. For example, a program might need to know that Append L Nil is equal to L in order to type check. So when discussing the practicality of this feature you need to look at the actual examples you are interested in and make sure that they still type check with a limited equational theory. Note that this also implies an unprectidability in the language; that is, it's hard to know whether a program that is theoretically type safe will actually be accepted by the type system.

gavinking commented 10 years ago

What I want to know is: what are the other problems relevant to everyday scientific/business computing (not theorem proving!) that dependent type systems solve, apart from typesafe list lengths?

pthariensflame commented 10 years ago

Type-safe list lengths can be generalized to type-safe collection sizes, including maps and sets, as well as type-safe addressing of a map's entries, but aside from that, most of the benefits can be covered by so-called "refinement types" (not related, directly at least, to the structural refinement that is well known in the OO literature). See the various projects under the "liquid types" label (warning: large PDF), of which I'm most familiar with LiquidHaskell; see that page for some other related links and papers.

Full liquid types require an SMT solver, though, so I don't know how much this kind of thing would be worth to you (and Ceylon) in light of that.

RossTate commented 10 years ago

Heh, my friend was the lead student for liquid types, so I'll have to let him know it's getting discussed here. Liquid Types are an inferable form of dependent refinement types. The user specifies the predicate templates they're interested in, and the engine (using a solver) infers conjunctions (ands) of localized instantiations of those predicate templates using whole-program interprocedural analysis. One downside to this is that context affects what refinement types will be inferred, so a program may type check in one setting and fail to type check in another.

ceylon / ceylon-spec

Algebraic type promotion #881