Ahnfelt / funk

PROTOTYPE: A minimal scripting language - OOP via lambda functions and pattern matching
53 stars 2 forks source link

Type System #3

Open evincarofautumn opened 6 years ago

evincarofautumn commented 6 years ago

Here are some thoughts on a type system for Funk—I just took the examples from the slides and wrote some ideas about what their inferred types ought to be (and why). I’m just focusing on static types for now; incorporating dynamic/gradual typing would be a good next step after these details are figured out.

{|x y| x + y} : forall a b r. { "+" : a -> b | r } -> a -> b

y is inferred as a fully polymorphic type a; x is also polymorphic, but constrained to accept a message + which takes an argument of type a and returns a result of type b. The lambda just passes this message along to x, so it also returns a b.

The row polymorphism here (forall r. … { … | r } …) indicates a record that may contain more fields (resp. an object that may accept more messages) than those that are explicitly indicated. This allows for a kind of subtyping: a “larger” type that accepts more messages than just + can be passed for the x parameter. In following examples, I’ll make this polymorphism implicit.

{|x| x + 1} : forall a. { "+" : number -> a } -> a

Here, x is passed the message + with an argument of type number, so its type is constrained accordingly. The result type remains unconstrained. A more concrete type such as number can be passed for the x parameter, in which case the inferred type of this lambda would be constrained further, to number -> number. I use lowercase number instead of Number to differentiate it from Funk’s treatment of uppercase barewords as string symbols, for reasons that will be clear in a moment. It could also be called num or something if you prefer.

This type is also a generic instance of the more concrete type number -> number, and the user might supply that as a type annotation.

{1} : forall a. a -> number

This function does nothing with its argument, so its input type is unconstrained, and it returns a number.

fib := {
  |0| 0
  |1| 1
  |n| fib(n - 1) + fib(n - 2)
} : number -> number

The patterns in fib constrain its input to be of type number; its results, 0 and 1, also constrain its result to number. This could be given a more general type if number literals were polymorphic (as in Haskell) but I’m skipping over that for the sake of simplicity.

color := {
  |Red| "#ff0000"
  |Green| "#00ff00"
  |Blue| "#0000ff"
} : { Red : string, Green : string, Blue : string }
; : (Red -> string) /\ (Green -> string) /\ (Blue -> string)
; : (Red \/ Green \/ Blue) -> string
; : colorName -> string

Here there are a few different options. If Red, Green, and Blue are taken as symbols with singleton types (that is, the type of the value Red is the type Red), then color is inferred to have an “object” type which indicates the result types of the messages it responds to—here, all String. This is essentially an intersection type (Red -> string) /\ (Green -> string) /\ (Blue -> string), denoting an “overloaded” function. (The /\ bikeshed could also be painted & if you prefer.) Because the result types are all the same (string), this can be simplified into a function that takes a union type: (Red \/ Green \/ Blue) -> string or (Red | Green | Blue) -> string. If the type Red \/ Green \/ Blue is known by some other name such as colorName, then it could be further simplified to colorName -> string.

if := {
  |True t _| t()
  |False _ f| f()
} : forall a. { True : forall b. (null -> a) -> b -> a, False : forall c. c -> (null -> a) -> a }
; : forall a. (forall b. True -> (null -> a) -> b -> a) /\ (forall c. False -> c -> (null -> a) -> a)
; : forall a. (True -> (null -> a) -> (null -> a) -> a) /\ (False -> (null -> a) -> (null -> a) -> a)
; : forall a. (True \/ False) -> (null -> a) -> (null -> a) -> a
; : forall a. boolean -> (null -> a) -> (null -> a) -> a

if is similar to color, in that we can infer its type based on the messages it accepts, then simplify it to a more readable type. In the most general case, I think the unused branches could be given fully polymorphic types, but that would leave you with higher-rank quantifiers, which might be a pain to deal with. So instead, the intersection of the two function types can be simplified by unification, making both branches null -> a in both the True and False cases. Then the intersection can be simplified into the union True \/ False, a.k.a. boolean (or bool).

max := {|x y|
  if (x > y) {
    x
  } {
    y
  }
} : forall a. { ">" : a -> boolean } -> a -> boolean

This should be self-explanatory based on previous examples.

when := {
  |True body| body()
  |False _|
} : { True : forall a. (null -> a) -> a, False : forall b. b -> null }
; : (forall a. True -> (null -> a) -> a) /\ (forall b. False -> b -> null }
; : (True -> (null -> null) -> null) /\ (False -> (null -> null) -> null }
; : (True \/ False) -> (null -> null) -> null
; : boolean -> (null -> null) -> null

Because the False branch does nothing, its argument’s type is unconstrained, and (I presume) it returns null. So when the intersection is simplified, the a from the True branch is unified with null, and the b from the False branch is unified with null -> a, giving the type boolean -> (null -> null) -> null.

while := {|condition body|
  when (condition()) {
    body()
    while(condition, body)
  }
} : forall a. (null -> boolean) -> (null -> a) -> null

Here the type of the body’s result is left unconstrained, since it’s never used. The return type of while is inferred as null because that’s the return type of when. (As an aside, for optimisation purposes, the compiler needs to recognise that the call to while is in tail position.)

Now we get into the interesting stuff: objects!

vector := {|x y| {
  |X| x
  |Y| y
  |Add v| vector(x + v.X, y + v.Y)
}}

The inferred type for this would be quite gnarly if allowed to be fully general. This is the best I could come up with:

forall a b addA addB.
  ; The types “a” and “b” of the fields must accept “+” messages…
  (a <: { "+" : addA -> a }, b <: { "+" : addB -> b })
  => mu self. a -> b
  ; …that can accept the types of the fields “X” and “Y” of the value passed to “Add”.
  -> { X : a, Y : b, Add : { X : addA, Y : addB } -> self }

This can’t easily be simplified, so a user-supplied type annotation would be needed in situations like this to produce a readable type:

number -> number -> mu self. { X : number, Y : number, Add : { X : number, Y : number } -> self }

The notation for recursive types could be spelled differently, too, like rec self; or self could implicitly refer to the nearest containing {…} type. And type aliases could be introduced for convenience:

type vector = mu self. {
  X : number  ; NB. newline could be accepted where comma is expected
  Y : number
  Add : { X : number, Y : number } -> self
}

newVector : number -> number -> vector

Record union captures the notion of forwarding messages in the inheritance example:

type monster := {
  Name : string
  Hurt : number -> null
}

newMonster := {|name hitpoints|
  |Name| name
  |Hurt damage| hitpoints -= damage
}} : string -> number -> monster

type creeper := { Explode : number -> null } + monster

newCreeper := {
  super := newMonster("Creeper", 80)
  {
    |Explode area|
      area.NearbyMonsters.Each {|monster|
        monster.Hurt(50)
      }
    |otherMethod|
      super(otherMethod)
  }
} : null -> creeper

The “sum types” example works out pretty neatly too:

getOrElse := {|option default|
  option {
    |None| default
    |Some value| value
  }
} : forall a. ({ None : a, Some : a -> a } -> a) -> a -> a

You could always add “real” sum & product types if you want, rather than encoding them like this—it might be better for producing good error messages and checking whether pattern matching is exhaustive.

Extensible Records with Scoped Labels is a good starting point for implementing extensible records—it’s the basis of the record system in PureScript and the effect system in Kitten. If you want to incorporate higher-rank types, lexi-lambda did a reference implementation of Complete and Easy Bidirectional Typechecking for Higher-rank Polymorphism; even if you don’t need higher-rank polymorphism, bidirectional typechecking is a good technique that tends to produce better error messages than vanilla Hindley–Milner.

That’s all I’ve got for now—let me know your thoughts. :)

Ahnfelt commented 6 years ago

It looks great - thank you! I'll post a longer reply as soon as I have some time to sit down and study it a bit more (tonight or early tomorrow).

Ahnfelt commented 6 years ago

Thank you so much for the long and detailed proposal. It seems like a powerful type system and one that would be a great fit for Funk.

I'm a little concerned with the power of the type system; I've implemented simpler type systems before, but never with equirecursive types and higher ranked types.

To me, it seems like quite the task!

Well formed types

To test if I understand the type system correctly, here's what I gather that the type grammar looks like:

t   = "String" 
    | "Number" 
    | "{" STRING ":" t "}" 
    | t "->" t 
    | t "&" t 
    | t "|" t 
    | t "+" t 
    | "mu" LOWER "." t
    | "forall" LOWER "." t
    | LOWER

Withforall a b. t as syntactic sugar for forall a. forall b. t. STRING are strings like Foo or "Bar" and UPPER and LOWER are alphanumeric identifiers starting with an upper or lower case letter respectively. I'm using &, | here for intersection and union for compatibility with typescript, flow, etc.

Record types

Thank you for the pointer on implementing extensible records. It seems like a natural fit.

Do I understand it correctly if I say that {Foo : t1, Bar : t2 | t3} is syntactic sugar for {Foo : t1} + {Bar : t2} + t3?

Why is there both union and record union in the type system? Is it because we need a left biased union for records, so that it matches the "first matching case wins" semantics for pattern matching?

Recursive types

I gather that the following example is an equirecursive type?

type vector = mu self. {
  X : number  ; NB. newline could be accepted where comma is expected
  Y : number
  Add : { X : number, Y : number } -> self
}

I'm not very familiar with type checking of equirecursive types, but I can see how they make it possible to construct a structural type for self-referential objects:

vector := {|x y| {
  |X| x
  |Y| y
  |Add v| vector(x + v.X, y + v.Y)
}} : number -> number -> mu self. { X : number, Y : number, Add : { X : number, Y : number } -> self}

As far as I recall, checking equivalence of equirecursive types requires rewriting them to a canonical form first. It feels like the sort of thing that's hard to combine with other type system features. Do you know if they pose problems for type inference?

Higher rank polymorphism and bidirectional type checking

Thank you for the pointer on implementing higher rank polymorphism with bidirectional type checking. Bidirectional type checking is my go-to implementation strategy, so that'll definitely come in handy!

Ahnfelt commented 6 years ago

Disregarding sum types for now - are and union and intersection types still relevant? The type gammar is a bit simpler without it:

t   = "String" 
    | "Number" 
    | "{" STRING ":" t ("," STRING ":" t)* ("|" t)? "}" 
    | t "->" t 
    | "mu" LOWER "." t
    | "forall" LOWER "." t
    | LOWER
evincarofautumn commented 6 years ago

I don’t think you actually need unions, intersections, or higher-ranked quantifiers—I was using them as as justification for how to simplify function types. If a union type like Red \/ Green \/ Blue would show up, it could be implemented as an extensible variant (basically the same as a record as far as the type system is concerned) where the keys are Red &c. and the values are unit/null; however, you would need a subtyping rule allowing (bareword) strings to be promoted to variant keys.

As for recursive types, you might be able to get away with isorecursive types with some rules for automatically inserting roll and unroll. I’m not certain about this. The only place they should show up is when a lambda returns the result of calling itself; when a record is in argument position I’ve been assuming it’ll be inferred structurally unless the user gives a type annotation with the nominal type. I’ve also been assuming that nominal types are just synonyms, but they could be treated “opaquely” like newtypes.

You can also make forall quantifiers implicit as in Haskell 98 if you like. Same goes for mu as I mentioned earlier. Ignoring all the syntactic sugar like forall a b.forall a. forall b. and { A: b, C : d }{ A : b | { C : d | r } }, your internal type grammar might be:

type ::= LOWER  -- constructor or variable
       | "{" "}"  -- empty row
       | "{" STRING ":" type "|" type "}"  -- row extension
       | type type  -- type application
       | "mu" LOWER "." type  -- quantified types (internal)
       | "forall" LOWER "." type

(When variants are involved, Leijen uses ⦇…⦈ ((| … |)?) for row types, {…} for record constructors, and ⟨…⟩ (<| … |>?) for variant constructors.)

Function types and built-in types like number and string are often special-cased because they’re so common, but it’s not a requirement—you can represent functions like ((->) a) b. Also, it’s convenient to pair type variables and constructors with their inferred kinds internally.

evincarofautumn commented 6 years ago

I’ve realised that for an object with many methods, type signatures could travel far away from the thing they’re describing:

{
  |A| …
  |B| …
  |C| …
  |D| …
  …
} : {
  A : …
  B : …
  C : …
  D : …
  …
}

So it’d be good to consider some sugar for attaching type annotations directly to methods.

Ahnfelt commented 6 years ago

That seems like the right grammar for types. Perhaps with uppercase identifiers for type constructors and lowercase identifiers for type parameters (and mu-bound types). This would allow leaving out the forall entirely for rank 1 types, as in Haskell.

To prevent the traveling of type annotations, one option would be to allow them after patterns:

vector := {|x y| {
    |X : Int| x
    |Y : Int| y
    |Add v : Vector -> Vector| vector(x + v.X, y + v.Y)
}}

But there's a problem with where to introduce mu vector. syntactically here, which I've conveniently side-stepped by hand-waving a Vector type, whatever it is.

Another option would be to define the type elsewhere. It corresponds to an interface:

Vector := {
    X : Number 
    Y : Number 
    Add : Vector -> Vector
}

vector : Number -> Number -> Vector = {|x y| {
    |X| x
    |Y| y
    |Add v| vector(x + v.X, y + v.Y)
}}

Note that Vector in this example could well be a nominal type, at least for the purpose of isorecursive type equivalence. However, as far as I can see, isorecursive types prevent type inference for vector, since there's no name for the isorecursive type system to compare if you remove the type definition. I don't think it's a big deal not to be able to infer recursive types in this example, but there may be other examples that I haven't thought of.

I should note that the := syntax is chosen to allow for a type between : and =, which is how it's used in the above example.

Ahnfelt commented 6 years ago

The latter approach would then have this grammar:

type ::= LOWER  -- variable
       | UPPER -- constructor
       | "{" "}"
       | "{" STRING ":" type "|" type "}"
       | type type
       | "forall" LOWER "." type

I also think polymorphic components is a good alternative to rank N types. It's pretty much what Java and similar languages do, if you consider methods fields. It restricts forall to occur in rank 1 positions and in field types, eg:

Array := forall a. {
    Map : forall b. (a -> b) -> Array b 
}

Syntactically, the forall can be omitted, since their placement is obvious. I think that yields some very friendly syntax:

Array a := {
    Map : (a -> b) -> Array b 
}

Polymorphic fields can express the same as rank N types, at the cost of packing / unpacking of records for the more complicated types. In Haskell, they are especially neat due to the global field names, which makes type inference complete for them. That's not going to work in Funk, but I personally don't mind if type inference is mostly top-to-bottom.

That leaves this grammar:

type ::= LOWER  -- variable
       | UPPER -- constructor
       | "{" "}"
       | "{" STRING ":" scheme "|" type "}"
       | type type

scheme ::= type 
       | "forall" LOWER "." scheme  -- internal