Open pthariensflame opened 10 years ago
@pthariensflame Before I get too deep into this paper, what should I be looking for here, from our point of view?
The paper talks a lot about Haskell's kind system, and its limitations. That's not quite relevant to us because:
Nat
and Vector
.See this gist for how we can represent Nat
and Vector
in Ceylon.
So the thing I really did notice upfront is that Haskell's type families let you express the Plus
function for Nat
s, which is really where we run aground right now. This is why I've toyed with the idea with essentially engineering Plus
and Pred
as primitive functions in the type system. But perhaps type families are a way to avoid that?
What else am I supposed to be paying attention to?
Really, the paper is talking about integrating directly into Haskell's kind system a feature that could already be (mostly) emulated (notice that it mentions the Strathclyde Haskell Enhancement, which (among other things) did this for you with some simple preprocessing). The benefit of the system that the paper talks about is that, just like Ceylon, Haskell values type safety very highly, but Haskell's type system wasn't (and Ceylon's type system isn't) kind-safe in the same way that both languages' value systems are type-safe. The paper proposes a system that is both easy to implement and easy to understand, and yet gets you most of the kind-safety of full-spectrum dependent typing.
Vector
is a pretty standard academic example of an easy dependently-typed thing, but it's so easy as to be essentially trivial to emulate in Haskell, even without DataKinds
:
{-# LANGUAGE EmptyDataDecls, GADTs, UndecidableInstances, FlexibleInstances, TypeFamilies #-}
data Zero
data Succ n
data Vector n a where
Nil :: Vector Zero
Cons :: a -> Vector n a -> Vector (Succ n) a
type family Plus m n
type instance Plus Zero n = n
type instance Plus (Succ m) n = Succ (Plus m n)
append :: Vector m a -> Vector n a -> Vector (Plus m n) a
append Nil ys = ys
append (Cons x xs) = Cons x (append xs ys)
head :: Vector (Succ n) a -> a
head (Cons x _) = x
tail :: Vector (Succ n) a -> Vector n a
tail (Cons _ xs) = xs
However, as the paper points out, we can easily write Vector Int Float
, which is a valid type still, albeit an empty one. The kind system provides no help for us here, essentially making our type-level Nat
"dynamically kinded". Even in Ceylon, where subtyping saves us from passing Integer
in that parameter, we can still pass Nat
, which is even worse: that type is inhabited, which means our abstraction is extremely leaky! If we enjoy and promote the use of type safety on the value level, it seems only reasonable to do the same for kind safety on the type level.
Also, consider the sheer ugliness in Haskell without DataKinds
when implementing and using HList
, the heterogeneous list type:
{-# LANGUAGE EmptyDataDecls, GADTs, UndecidableInstances, FlexibleInstances, TypeFamilies #-}
data Nil
data Cons x xs
data HList ts where
HNil :: HList Nil
HCons :: t -> HList ts -> HList (Cons t ts)
type family Append xs ys
type instance Append Nil ys = ys
type instance Append (Cons x xs) ys = Cons x (Append xs ys)
hAppend :: HList ts1 -> HList ts2 -> HList (Append ts1 ts2)
hAppend HNil ys = ys
hAppend (HCons x xs) ys = HCons x (hAppend xs ys)
-- implementation above, usage below
example :: HList (Cons Int (Cons Float (Cons String Nil)))
example = HCons 2 (HCons 5.339 (HCons "hello" HNil))
Notice that Ceylon can't properly implement HList
(witness the ugliness of unsugared tuple types in Ceylon), let alone hAppend
, which can't be implemented in Ceylon at all. In addition, this code is still "dynamically kinded" in the way I described above.
Now, observe how much more understandable and regular the Haskell code becomes as soon as DataKinds
is in place:
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, FlexibleInstances, UndecidableInstances #-}
data HList (ts :: [*]) where
HNil :: HList '[]
HCons :: t -> HList ts -> HList (t ': ts)
type family Append (xs :: [*]) (ys :: [*]) :: [*]
type instance Append '[] ys = ys
type instance Append (x ': xs) ys = x ': (Append xs ys)
hAppend :: HList ts1 -> HList ts2 -> HList (Append ts1 ts2)
hAppend HNil ys = ys
hAppend (HCons x xs) ys = HCons x (hAppend xs ys)
-- implementation above, usage below
example :: HList '[Int,Float,String]
example = HCons 2 (HCons 5.339 (HCons "hello" HNil))
Not only is this code better-looking, it's actually kind-safe! We can't possibly use anything but a promoted value of the promoted type [*]
as the argument to HList
, nor could we use anything other than Nil
or Succ n
(for n :: Nat
) as the "length" parameter to a DataKinds
-based Vector
.
My proposal is to allow something similar in Ceylon. This obviously requires GADT support as a prerequisite, but higher-order kinds and first-class type constructors are unnecessary for this purpose, even in Haskell. The paper goes on to prove that DataKinds
preserves totality, among other things, so this should not cause any undecidability in Ceylon's type system either.
Aside from that, look at section 3.3 of the paper for an explanation of which types should be valid for promotion, and why.
Section 6.3 discusses kind inference in the context of DataKinds
.
However, as the paper points out, we can easily write
Vector Int Float
, which is a valid type still, albeit an empty one.
So our implementation doesn't suffer from that problem.
Even in Ceylon, where subtyping saves us from passing Integer in that parameter, we can still pass Nat, which is even worse: that type is inhabited, which means our abstraction is extremely leaky!
Rm, I'm not quite sure if that's fair. The only "hole" that I can see in this implementation is that I could add new subtype of Vec
. That's a hole, surely, but it's a hole that is pretty easy to plug. In particular, adding GADTs would plug it. Even without GADTs, we could still let you add that restriction.
Notice that Ceylon can't properly implement
HList
Rm, are you sure? WDYM?
(witness the ugliness of unsugared tuple types in Ceylon),
The ugliness is primarily a result of the fact that Tuple
is a subtype of List
. There's a much simpler implementation of HList
in Ceylon, but it wouldn't be Iterable
.
let alone
hAppend
, which can't be implemented in Ceylon at all.
Yes. Indeed. This is frustrating.
So that's why I was commenting above about how Haskell-style type families seem to be very helpful here.
Now, observe how much more understandable and regular the Haskell code becomes as soon as
DataKinds
is in place:
OK, I don't understand that code yet, so I will keep reading.
Regarding Haskell-style type families, yes, they are extremely helpful, and not just for this kind of thing. They would be very easy to support in Ceylon too: just allow refinement of arbitrary member alias
es. You can refine a simple alias
by another simple alias
, or by an interface
alias (refining a simple alias
by a class
alias might cause some problems, though), as long as the refining one is a subtype of the refined one, as usual for member type refinement.
Regarding the "hole" in your implementation of Vector
in current Ceylon, being able to add a new subclass of Vector is indeed easily "plugged", but is not the "leak" I was referring to. Vector<Nat,T>
is a valid type, one which is a supertype of the equally valid type Cons<Succ<Nat>,T>
; Cons<Nat,T>
is a valid type as well. In addition, Succ<Nat>
is not only a valid type, but a valid subtype of Nat
! Even if you think Vector<Nat,T>
or Cons<Nat,T>
should be valid for some reason, it's clear that Succ<Nat>
is purely an artifact of the "encoding scheme", and is completely meaningless to us. These problems are inherent in the "dynamically kinded" nature of Ceylon's current type system, and there is no possible way around them; you can try it yourself, but you won't be able to fix any of these problems within Ceylon as the language currently stands. We don't have any of these problems on the value level:
shared abstract class Nat() of zero | Succ {}
shared object zero extends Nat() {}
shared abstract class Succ(shared Nat pred) extends Nat() {}
// the following line rightly does not compile
Nat v = Succ(`Nat`);
The fact that we still have them on the type level is the entire motivation for this proposal.
They would be very easy to support in Ceylon too: just allow refinement of arbitrary member
alias
es. You can refine a simplealias
by another simplealias
, or by aninterface
alias (refining a simplealias
by aclass
alias might cause some problems, though), as long as the refining one is a subtype of the refined one, as usual for member type refinement.
We actually already allow aliases to be refined by aliases, but I don't think the semantics of that is quite what you're looking for. In particular, they're not virtual types.
Vector<Nat,T>
is a valid type, one which is a supertype of the equally valid typeCons<Succ<Nat>,T>
;Cons<Nat,T>
is a valid type as well. In addition,Succ<Nat>
is not only a valid type, but a valid subtype ofNat
! Even if you thinkVector<Nat,T>
orCons<Nat,T>
should be valid for some reason, it's clear thatSucc<Nat>
is purely an artifact of the "encoding scheme", and is completely meaningless to us.
Mmmm. I hear what you're saying. Still, the fact that I can construct meaningless types is to me not that serious of a problem as if we could actually construct a value of thes type, which we can't.
Still, the fact that I can construct meaningless types is to me not that serious of a problem as if we could actually construct a value of thes type, which we can't.
I suppose I can live with that, but kind safety is the whole point of this proposal. If that end is not worth it to you, then this proposal, in its current state at least, is useless.
Haskell-style type families seem to have made an impression on you though, so perhaps we can approach that as a proposal in its own right.
In light of that, I will start a new issue for type families, as a separate feature from this. Feel free to close this issue if you think just DataKinds
alone won't really do anything for where you want Ceylon to go. Alternatively, go wherever you want with this idea. :)
We actually already allow aliases to be refined by aliases
Sorry, to clarify: we allow class aliases to be refined. Here, what you're probably thinking of is allowing alias
aliases to be refined, and have the semantics be that of a virtual type.
So we've already talked about making a special NAT
kind with specialized reasoning. The issue with generalizing this to arbitrary "data kinds" is that programs often need a more powerful equational theory at the kind level in order to type check. For example, a program might need to know that Append L Nil
is equal to L
in order to type check. So when discussing the practicality of this feature you need to look at the actual examples you are interested in and make sure that they still type check with a limited equational theory. Note that this also implies an unprectidability in the language; that is, it's hard to know whether a program that is theoretically type safe will actually be accepted by the type system.
What I want to know is: what are the other problems relevant to everyday scientific/business computing (not theorem proving!) that dependent type systems solve, apart from typesafe list lengths?
Type-safe list lengths can be generalized to type-safe collection sizes, including maps and sets, as well as type-safe addressing of a map's entries, but aside from that, most of the benefits can be covered by so-called "refinement types" (not related, directly at least, to the structural refinement that is well known in the OO literature). See the various projects under the "liquid types" label (warning: large PDF), of which I'm most familiar with LiquidHaskell; see that page for some other related links and papers.
Full liquid types require an SMT solver, though, so I don't know how much this kind of thing would be worth to you (and Ceylon) in light of that.
Heh, my friend was the lead student for liquid types, so I'll have to let him know it's getting discussed here. Liquid Types are an inferable form of dependent refinement types. The user specifies the predicate templates they're interested in, and the engine (using a solver) infers conjunctions (ands) of localized instantiations of those predicate templates using whole-program interprocedural analysis. One downside to this is that context affects what refinement types will be inferred, so a program may type check in one setting and fail to type check in another.
I mentioned this potential feature in this issue under the name
DataKinds
; the standard paper describing it is here. Note that the paper describes both type promotion and kind polymorphism, the latter of which is not part of this proposal for Ceylon (although if you want it as well as type promotion, then by all means go for it :) ).