Open evincarofautumn opened 7 years ago
It looks great - thank you! I'll post a longer reply as soon as I have some time to sit down and study it a bit more (tonight or early tomorrow).
Thank you so much for the long and detailed proposal. It seems like a powerful type system and one that would be a great fit for Funk.
I'm a little concerned with the power of the type system; I've implemented simpler type systems before, but never with equirecursive types and higher ranked types.
To me, it seems like quite the task!
To test if I understand the type system correctly, here's what I gather that the type grammar looks like:
t = "String"
| "Number"
| "{" STRING ":" t "}"
| t "->" t
| t "&" t
| t "|" t
| t "+" t
| "mu" LOWER "." t
| "forall" LOWER "." t
| LOWER
Withforall a b. t
as syntactic sugar for forall a. forall b. t
. STRING are strings like Foo
or "Bar"
and UPPER and LOWER are alphanumeric identifiers starting with an upper or lower case letter respectively. I'm using &
, |
here for intersection and union for compatibility with typescript, flow, etc.
Thank you for the pointer on implementing extensible records. It seems like a natural fit.
Do I understand it correctly if I say that {Foo : t1, Bar : t2 | t3}
is syntactic sugar for {Foo : t1} + {Bar : t2} + t3
?
Why is there both union and record union in the type system? Is it because we need a left biased union for records, so that it matches the "first matching case wins" semantics for pattern matching?
I gather that the following example is an equirecursive type?
type vector = mu self. {
X : number ; NB. newline could be accepted where comma is expected
Y : number
Add : { X : number, Y : number } -> self
}
I'm not very familiar with type checking of equirecursive types, but I can see how they make it possible to construct a structural type for self-referential objects:
vector := {|x y| {
|X| x
|Y| y
|Add v| vector(x + v.X, y + v.Y)
}} : number -> number -> mu self. { X : number, Y : number, Add : { X : number, Y : number } -> self}
As far as I recall, checking equivalence of equirecursive types requires rewriting them to a canonical form first. It feels like the sort of thing that's hard to combine with other type system features. Do you know if they pose problems for type inference?
Thank you for the pointer on implementing higher rank polymorphism with bidirectional type checking. Bidirectional type checking is my go-to implementation strategy, so that'll definitely come in handy!
Disregarding sum types for now - are and union and intersection types still relevant? The type gammar is a bit simpler without it:
t = "String"
| "Number"
| "{" STRING ":" t ("," STRING ":" t)* ("|" t)? "}"
| t "->" t
| "mu" LOWER "." t
| "forall" LOWER "." t
| LOWER
I don’t think you actually need unions, intersections, or higher-ranked quantifiers—I was using them as as justification for how to simplify function types. If a union type like Red \/ Green \/ Blue
would show up, it could be implemented as an extensible variant (basically the same as a record as far as the type system is concerned) where the keys are Red
&c. and the values are unit/null
; however, you would need a subtyping rule allowing (bareword) strings to be promoted to variant keys.
As for recursive types, you might be able to get away with isorecursive types with some rules for automatically inserting roll
and unroll
. I’m not certain about this. The only place they should show up is when a lambda returns the result of calling itself; when a record is in argument position I’ve been assuming it’ll be inferred structurally unless the user gives a type annotation with the nominal type. I’ve also been assuming that nominal types are just synonyms, but they could be treated “opaquely” like newtype
s.
You can also make forall
quantifiers implicit as in Haskell 98 if you like. Same goes for mu
as I mentioned earlier. Ignoring all the syntactic sugar like forall a b.
→ forall a. forall b.
and { A: b, C : d }
→ { A : b | { C : d | r } }
, your internal type grammar might be:
type ::= LOWER -- constructor or variable
| "{" "}" -- empty row
| "{" STRING ":" type "|" type "}" -- row extension
| type type -- type application
| "mu" LOWER "." type -- quantified types (internal)
| "forall" LOWER "." type
(When variants are involved, Leijen uses ⦇…⦈ ((| … |)
?) for row types, {…} for record constructors, and ⟨…⟩ (<| … |>
?) for variant constructors.)
Function types and built-in types like number
and string
are often special-cased because they’re so common, but it’s not a requirement—you can represent functions like ((->) a) b
. Also, it’s convenient to pair type variables and constructors with their inferred kinds internally.
I’ve realised that for an object with many methods, type signatures could travel far away from the thing they’re describing:
{
|A| …
|B| …
|C| …
|D| …
…
} : {
A : …
B : …
C : …
D : …
…
}
So it’d be good to consider some sugar for attaching type annotations directly to methods.
That seems like the right grammar for types. Perhaps with uppercase identifiers for type constructors and lowercase identifiers for type parameters (and mu-bound types). This would allow leaving out the forall
entirely for rank 1 types, as in Haskell.
To prevent the traveling of type annotations, one option would be to allow them after patterns:
vector := {|x y| {
|X : Int| x
|Y : Int| y
|Add v : Vector -> Vector| vector(x + v.X, y + v.Y)
}}
But there's a problem with where to introduce mu vector.
syntactically here, which I've conveniently side-stepped by hand-waving a Vector
type, whatever it is.
Another option would be to define the type elsewhere. It corresponds to an interface:
Vector := {
X : Number
Y : Number
Add : Vector -> Vector
}
vector : Number -> Number -> Vector = {|x y| {
|X| x
|Y| y
|Add v| vector(x + v.X, y + v.Y)
}}
Note that Vector
in this example could well be a nominal type, at least for the purpose of isorecursive type equivalence. However, as far as I can see, isorecursive types prevent type inference for vector
, since there's no name for the isorecursive type system to compare if you remove the type definition. I don't think it's a big deal not to be able to infer recursive types in this example, but there may be other examples that I haven't thought of.
I should note that the :=
syntax is chosen to allow for a type between :
and =
, which is how it's used in the above example.
The latter approach would then have this grammar:
type ::= LOWER -- variable
| UPPER -- constructor
| "{" "}"
| "{" STRING ":" type "|" type "}"
| type type
| "forall" LOWER "." type
I also think polymorphic components is a good alternative to rank N types. It's pretty much what Java and similar languages do, if you consider methods fields. It restricts forall to occur in rank 1 positions and in field types, eg:
Array := forall a. {
Map : forall b. (a -> b) -> Array b
}
Syntactically, the forall can be omitted, since their placement is obvious. I think that yields some very friendly syntax:
Array a := {
Map : (a -> b) -> Array b
}
Polymorphic fields can express the same as rank N types, at the cost of packing / unpacking of records for the more complicated types. In Haskell, they are especially neat due to the global field names, which makes type inference complete for them. That's not going to work in Funk, but I personally don't mind if type inference is mostly top-to-bottom.
That leaves this grammar:
type ::= LOWER -- variable
| UPPER -- constructor
| "{" "}"
| "{" STRING ":" scheme "|" type "}"
| type type
scheme ::= type
| "forall" LOWER "." scheme -- internal
Here are some thoughts on a type system for Funk—I just took the examples from the slides and wrote some ideas about what their inferred types ought to be (and why). I’m just focusing on static types for now; incorporating dynamic/gradual typing would be a good next step after these details are figured out.
y
is inferred as a fully polymorphic typea
;x
is also polymorphic, but constrained to accept a message+
which takes an argument of typea
and returns a result of typeb
. The lambda just passes this message along tox
, so it also returns ab
.The row polymorphism here (
forall r. … { … | r } …
) indicates a record that may contain more fields (resp. an object that may accept more messages) than those that are explicitly indicated. This allows for a kind of subtyping: a “larger” type that accepts more messages than just+
can be passed for thex
parameter. In following examples, I’ll make this polymorphism implicit.Here,
x
is passed the message+
with an argument of typenumber
, so its type is constrained accordingly. The result type remains unconstrained. A more concrete type such asnumber
can be passed for thex
parameter, in which case the inferred type of this lambda would be constrained further, tonumber -> number
. I use lowercasenumber
instead ofNumber
to differentiate it from Funk’s treatment of uppercase barewords as string symbols, for reasons that will be clear in a moment. It could also be callednum
or something if you prefer.This type is also a generic instance of the more concrete type
number -> number
, and the user might supply that as a type annotation.This function does nothing with its argument, so its input type is unconstrained, and it returns a
number
.The patterns in
fib
constrain its input to be of typenumber
; its results,0
and1
, also constrain its result tonumber
. This could be given a more general type if number literals were polymorphic (as in Haskell) but I’m skipping over that for the sake of simplicity.Here there are a few different options. If
Red
,Green
, andBlue
are taken as symbols with singleton types (that is, the type of the valueRed
is the typeRed
), thencolor
is inferred to have an “object” type which indicates the result types of the messages it responds to—here, allString
. This is essentially an intersection type(Red -> string) /\ (Green -> string) /\ (Blue -> string)
, denoting an “overloaded” function. (The/\
bikeshed could also be painted&
if you prefer.) Because the result types are all the same (string
), this can be simplified into a function that takes a union type:(Red \/ Green \/ Blue) -> string
or(Red | Green | Blue) -> string
. If the typeRed \/ Green \/ Blue
is known by some other name such ascolorName
, then it could be further simplified tocolorName -> string
.if
is similar tocolor
, in that we can infer its type based on the messages it accepts, then simplify it to a more readable type. In the most general case, I think the unused branches could be given fully polymorphic types, but that would leave you with higher-rank quantifiers, which might be a pain to deal with. So instead, the intersection of the two function types can be simplified by unification, making both branchesnull -> a
in both theTrue
andFalse
cases. Then the intersection can be simplified into the unionTrue \/ False
, a.k.a.boolean
(orbool
).This should be self-explanatory based on previous examples.
Because the
False
branch does nothing, its argument’s type is unconstrained, and (I presume) it returnsnull
. So when the intersection is simplified, thea
from theTrue
branch is unified withnull
, and theb
from theFalse
branch is unified withnull -> a
, giving the typeboolean -> (null -> null) -> null
.Here the type of the body’s result is left unconstrained, since it’s never used. The return type of
while
is inferred asnull
because that’s the return type ofwhen
. (As an aside, for optimisation purposes, the compiler needs to recognise that the call towhile
is in tail position.)Now we get into the interesting stuff: objects!
The inferred type for this would be quite gnarly if allowed to be fully general. This is the best I could come up with:
This can’t easily be simplified, so a user-supplied type annotation would be needed in situations like this to produce a readable type:
The notation for recursive types could be spelled differently, too, like
rec self
; orself
could implicitly refer to the nearest containing{…}
type. And type aliases could be introduced for convenience:Record union captures the notion of forwarding messages in the inheritance example:
The “sum types” example works out pretty neatly too:
You could always add “real” sum & product types if you want, rather than encoding them like this—it might be better for producing good error messages and checking whether pattern matching is exhaustive.
Extensible Records with Scoped Labels is a good starting point for implementing extensible records—it’s the basis of the record system in PureScript and the effect system in Kitten. If you want to incorporate higher-rank types, lexi-lambda did a reference implementation of Complete and Easy Bidirectional Typechecking for Higher-rank Polymorphism; even if you don’t need higher-rank polymorphism, bidirectional typechecking is a good technique that tends to produce better error messages than vanilla Hindley–Milner.
That’s all I’ve got for now—let me know your thoughts. :)