Ad-hoc polymorphism is a type system feature which allows a single symbol to have multiple interpretations. There are two sub-categories of ad-hoc polymorphism: overloading and coercions. Overloading means a function can have different implementations depending on its parameter and return types. Coercion instead means the function has only a single implementation, but its arguments can be implicitly converted so that they become compatible with that function.
OCaml
Some languages, like OCaml, do not have ad-hoc polymorphism. This means the type system is syntax-directed: every symbol can only mean one thing. For example, + is used for integer addition and +. is used for floating-point addition.
let x = 1 + 2
let y = 1.0 +. 2.0
In my opinion, this is a good choice in OCaml since the language is mainly meant to be used for compiler development, where universal polymorphism (parametric polymorphism and subtyping) is more useful.
Motivation
This RFC is about adding support for overloading (not coercions) in Arc-Script. Overloading might be the most important component of the frontend language. In data science, this feature will, for example, allow the addition operator to work with vectors and matrices. Moreover, we can have different specialised implementations of operators that depend on type information, but perhaps also on other statically inferred information. Interestingly, implementations of overloaded functions can also be imported from outside the language (from Rust), which adds another level of specialisation.
Haskell
Haskell distinguishes itself from OCaml by allowing ad-hoc polymorphism through type classes. A type class is basically a set of functions which can be implemented for a particular type. Haskell requires that there is no ambiguity: multiple classes in the same namespace may not contain functions that share the same name. In other words, given a function name it must always be possible to identify which type class that function belongs to. Haskell's type system also has the principal types property: every well-typed expression has a most general (principal) type.
class MyClass1 a where
  do_thing :: a -> a -> a

class MyClass2 a where
  do_thing :: a -> a -> a

instance MyClass1 Int where
  do_thing a b = a + b

test = do_thing 1 2 -- ERROR: Ambiguous call
Principal types get rid of ambiguity which could confuse the programmer, but they also introduce some restrictions. In general I think they are good to have.
Rust
Rust distinguishes itself from Haskell first by not having global type inference. As far as I know this is a consequence of the subtyping properties of lifetimes in the borrow checker, but it is also a design decision: Rust is a systems language that prefers being explicit over implicit. Rust also does not have principal types, so a well-typed expression can have multiple types.
Rust throws an ambiguity error if it finds multiple solutions in its trait resolution.
Go-lang
Go, unlike OCaml/Haskell/Rust, does not have parametric polymorphism, which means every function can take exactly one type of input and produce only one type of output. Go does however have ad-hoc polymorphism in the form of method syntax. Users can define methods which share the same method name for different types. The neat thing is that methods do not need an interface associated with them. You can instead attach the method directly to the type itself.
package main

import (
    "fmt"
    "math"
)

type Vertex1 struct {
    X, Y float64
}

type Vertex2 struct {
    X, Y float64
}

func (v Vertex1) Abs() float64 {
    return math.Sqrt(v.X*v.X + v.Y*v.Y)
}

func (v Vertex2) Abs() float64 {
    return math.Sqrt(v.X*v.X + v.Y*v.Y)
}

func main() {
    v1 := Vertex1{3, 4}
    v2 := Vertex2{3, 4}
    fmt.Println(v1.Abs())
    fmt.Println(v2.Abs())
}
Scala
Scala takes a slightly different approach compared to Haskell and Rust. In Scala, type classes are implemented using implicits. An implicit is basically an argument to a function which does not need to be explicitly inserted into the parameter list. Instead, it is inferred from its context. In an object-oriented language like Scala, implicits are objects which contain methods, and can therefore be used to implement type classes and traits.
trait MyClass[A] {
  def do_thing(a: A, b: A): A
}

implicit val my_instance: MyClass[Int] = new MyClass[Int] {
  def do_thing(a: Int, b: Int): Int = a + b
}

// A summoning helper lets the bare call below resolve through the implicit:
def do_thing[A](a: A, b: A)(implicit instance: MyClass[A]): A =
  instance.do_thing(a, b)

def test: Int = do_thing(1, 2)
Clean
Clean is a different take on Haskell and allows you to write code like:
class (+) a :: a a -> a
class zero a :: a
then you can do:
sum :: [a] -> a | zero, + a
sum [] = zero
sum [x:xs] = x + sum xs
The type | trait syntax is similar to Rust's:
fn sum<a: zero + (+)>(xs: [a]) -> a {...}
Arc-Script
In Arc-Script I would be interested in exploring whether it is possible to take the path of Clean by combining Haskell's and Go's approaches. In other words: global type inference, parametric polymorphism, principal types, and method syntax (implicit traits). This should make it fairly easy to overload things without having to constantly define new traits.
Below is a proposed syntax. To make it more readable, I changed the fun keyword to def:
def do_thing(Self, Self): Self;

# Overload for i32
def do_thing(a, b) for i32 {
    a + b
}

# Overload for f32
def do_thing(a, b) for f32 {
    a + b
}

# Fully annotated overload for i64
def do_thing(a: i64, b: i64): i64 for i64 {
    a + b
}

# Fully annotated, with Self type-alias, overload for f64
def do_thing(a: Self, b: Self): Self for f64 {
    a + b
}

# Extern overload for str
extern def do_thing(a, b) for str;

# Overload for array type constructor
def do_thing[T: do_thing](a, b) for [T] {
    a.zip(b).map(fun(a, b): a.do_thing(b))
}

def test() {
    val a: i32 = 1;
    val b: i32 = 2;
    a.do_thing(b);
    val c: f32 = 1.0;
    val d: f32 = 2.0;
    c.do_thing(d);
}
To explain, a function can have a Self type which is the type for which the function can be overloaded. If a function has a declaration with a Self type, then definitions of that function can overload the declaration for different types. The declaration of the function must be fully type annotated, but annotating definitions is optional. The declaration may contain type parameters, but those must all be instantiated in the definition. However, the definition could contain more type parameters which may also contain bounds.
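The declaration/definition scheme above could be modelled in a compiler as a table from function names to per-Self-type definitions. Below is a hypothetical sketch in Python (not Arc-Script); the names declare, define and resolve are made up for illustration and stand in for whatever the real resolver would do.

```python
# Hypothetical sketch: one declaration per overloadable function, plus a
# table mapping each Self type to its definition.
overloads = {}

def declare(name):
    """Register an overloadable function name (the declaration)."""
    overloads[name] = {}

def define(name, self_type, impl):
    """Register one definition of `name` for a concrete Self type."""
    overloads[name][self_type] = impl

def resolve(name, self_type):
    """Pick the definition whose Self type matches the call site."""
    table = overloads[name]
    if self_type not in table:
        raise TypeError(f"no overload of {name} for {self_type.__name__}")
    return table[self_type]

# Mirror the do_thing overloads for i32 and f32 from the example above,
# using Python's int and float as stand-ins.
declare("do_thing")
define("do_thing", int, lambda a, b: a + b)     # overload "for i32"
define("do_thing", float, lambda a, b: a + b)   # overload "for f32"

print(resolve("do_thing", int)(1, 2))        # 3
print(resolve("do_thing", float)(1.0, 2.0))  # 3.0
```

In this model the declaration only pins down the shape of the signature, while each definition adds one row to the dispatch table; resolution is then a single lookup on the inferred Self type.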
Note that it is possible to define a function for a generic type; the following two are equivalent:
def identity(Self): Self;
def identity[T](x) for T {
    x
}

# Equivalent to the following
def identity[T](x: T): T {
    x
}
Update: 8 Nov
After some thought, the syntax above can in some cases be restrictive. In particular, I don't see a good way to put a generic type in only the declaration vs. only the definition.
In Rust, you can put generics in both the declaration and the implementation:
trait Foo<A> {
    fn bar(self, qux: A) -> Self;
}

impl<B> Foo<B> for B {
    fn bar(self, qux: B) -> B {
        if zot() { self } else { qux }
    }
}
Generic A is introduced by the declaration, and generic B is introduced by the implementation. Then we instantiate A to B.
Problem
In our case, we would like to have:
def foo[A](Self, A): Self;
def foo[B](self, qux: B) for i32 { self }
But then the problem is that we never instantiate A.
Option 1
One option could be to turn Self into an explicit generic such as:
def foo[A, B](A, B): A;
def foo[C](a: i32, b: C) { a }
To get the function signature, we need to instantiate A and B to fresh type variables '0 and '1, and then solve unify(i32, '0) and unify(C, '1). The caveat might be that we need to annotate function definitions.
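The instantiation step described above can be made concrete with a minimal unification sketch. This is an assumed illustration in Python, not the actual Arc-Script implementation; concrete types are represented as plain strings and holes as Var objects.

```python
# Minimal unification sketch: A and B become fresh holes '0 and '1, which
# unification then fills in from the definition's annotations.
class Var:
    """A unification variable (a 'hole'), e.g. '0 or '1."""
    counter = 0
    def __init__(self):
        self.name = f"'{Var.counter}"
        Var.counter += 1
        self.solution = None

def walk(t):
    """Follow solved holes until we reach a concrete type or an unsolved hole."""
    while isinstance(t, Var) and t.solution is not None:
        t = t.solution
    return t

def unify(t1, t2):
    """Make t1 and t2 equal by filling in holes, or fail."""
    t1, t2 = walk(t1), walk(t2)
    if t1 is t2:
        return
    if isinstance(t1, Var):
        t1.solution = t2
    elif isinstance(t2, Var):
        t2.solution = t1
    elif t1 != t2:
        raise TypeError(f"cannot unify {t1} with {t2}")

# Declaration: def foo[A, B](A, B): A  --  instantiate A, B to fresh '0, '1.
a, b = Var(), Var()
# Definition:  def foo[C](a: i32, b: C) { a }
unify("i32", a)     # unify(i32, '0)
c = Var()
unify(c, b)         # unify(C, '1)
print(walk(a))      # i32 -- the return type A is now known
```

After solving, '0 is pinned to i32 while '1 is linked to the definition's own generic C, which matches the intended reading of Option 1.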
Option 2
Another option is to explicitly instantiate foo such as:
def foo[A, B](A, B): Self;
def foo[C](a, b) for [i32, C] { a }
Option 3
Introduce an impl construct like:
def foo[A, B](A, B): Self;
impl[C] foo[i32, C](a, b) { a }
Option 4
Probably the best option is to support standard Haskell type classes and then provide a desugaring from bare functions into them.
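The desugaring in Option 4 could work by generating one single-method type class per overloadable function, plus one instance per overload. Below is a hypothetical Python sketch that emits Haskell-like declarations as strings; the naming scheme (appending "Class") and the shape of the output are made-up illustrations, not a committed design.

```python
# Hypothetical sketch of Option 4: desugar a bare overloadable function
# into a generated type class plus one instance per overload.
def desugar(name, overloads):
    """overloads maps a Self type to its implementation expression.
    Returns (class declaration, list of instance declarations)."""
    class_name = name.capitalize() + "Class"
    class_decl = f"class {class_name} a where {name} :: a -> a -> a"
    instances = [
        f"instance {class_name} {t} where {name} = {body}"
        for t, body in overloads.items()
    ]
    return class_decl, instances

# The i32/f32 overloads of do_thing would desugar roughly like this:
cls, insts = desugar("do_thing", {"Int": "(+)", "Float": "(+)"})
print(cls)
for inst in insts:
    print(inst)
```

The point is only that the bare-function surface syntax carries exactly the information a class/instance pair needs, so the translation is mechanical.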
Implementation
The way to implement type classes as described by user MBones on Discord:
I assume you're familiar with the general ideas of type inference via unification, where instead of taking two complete types and checking that they're equal, you take two types with holes in them and try to fill in the holes so that they're equal (unification), and hopefully by the time you're done checking a function, all the holes have been filled in
Unification is great at solving equality constraints, but typeclasses add new kinds of constraints that can't be solved right away. So three extra things:
Trait resolution: a function that checks if a type (possibly with holes) satisfies a trait. It will probably be a recursive function, since for example checking Vec<T>: Eq in turn requires checking T: Eq.
Delaying constraints: but what if you need something like ?X: Eq, while ?X is still a unification variable/hole? You need some way of associating holes with the traits they will need to satisfy. For Haskell98-style typeclasses, this can be as simple as attaching a list of pending constraints to each hole and checking them once the hole gets solved.
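The delaying idea from the quote above can be sketched as follows. This is a hedged illustration in Python (the instance table, Hole, require and solve are all invented names): each hole carries the set of classes it must satisfy, and those constraints are discharged against an instance table only when the hole is finally solved.

```python
# Toy instance table: which (class, type) pairs have an instance.
instances = {("Eq", "Int"), ("Eq", "Bool")}

class Hole:
    """A unification variable that remembers its delayed class constraints."""
    def __init__(self):
        self.solution = None
        self.pending = set()   # classes this hole must satisfy, e.g. {"Eq"}

def require(t, cls):
    """Record or check the constraint t : cls."""
    if isinstance(t, Hole):
        if t.solution is None:
            t.pending.add(cls)   # can't check yet: delay it on the hole
            return
        t = t.solution
    if (cls, t) not in instances:
        raise TypeError(f"no instance {cls} {t}")

def solve(hole, t):
    """Fill a hole and discharge its delayed constraints."""
    hole.solution = t
    for cls in hole.pending:
        require(t, cls)

x = Hole()
require(x, "Eq")    # ?X : Eq is delayed; ?X is still unknown
solve(x, "Int")     # now checked: Eq Int holds, so this succeeds
```

Solving the same hole with a type lacking an instance (say "Float" here) would instead raise the "no instance" error, which is the point where a real compiler reports an unsatisfied trait bound.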