This issue proposes a Dart language feature which takes inspiration from the Haskell typeclass feature: typeclasses. It enables polymorphic behavior, not unlike instance method invocation for regular objects, but the concrete behavior is associated with multiple actual type arguments, which makes it somewhat similar to multi-methods.
The feature is also somewhat related to extension methods, because it is concerned with the ability to support invocations of functions using the same syntax as that of regular instance methods, but based on functions that are declared outside the statically known class of the receiver. Another similarity is that the specific implementation of a typeclass which is used for a given call site can be chosen based on a global search for applicable typeclass instances (just like we'd use a global search to find an applicable extension declaration when we're resolving an extension method invocation). However, extension methods are always resolved statically, but typeclass methods can also be resolved at run time.
This issue presents the basic ideas and some motivation. In particular, it does not propose a precise syntax (so don't worry about ambiguities in the grammar or specific choices of keywords, those things can be scrutinized later).
Typeclasses: Motivation and Outline of Feature
In some cases, software will bring together two or more entities with some variations. For instance, we could have several different sources of information, and we might want to send the information to several different receivers (sinks).
In order to write generic code that transfers information from the former to the latter without depending on the exact kind of source or the exact kind of sink, we could insist that every source of information must implement the same interface, and every sink must implement the same interface, and then we can just implement the transfer in terms of those interfaces.
However, this would almost inevitably require the source of information to encode that information in a common format which would then be accepted by the sink. In other words, the information would be wrapped and unwrapped in some sense, and the relationship between the source and the sink would be ignored. So, perhaps the source and the sink are working on the exact same kind of data, but the wrapping/unwrapping will still take place.
The feature proposed here is intended to handle these multi-party, multi-type interactions differently, allowing for specialized behaviors associated with specific combinations of types.
First, let's introduce a couple of helper classes, such that we can model this source/sink scenario using simple types like double and String:
This typeclass specifies that it is possible to invoke operator << on an instance of Sink<X> with an operand of type Y, that is sx << y where sx has type Sink<X> and y has type Y. The intuition is that this will insert y into the sink sx.
In general, a typeclass can contain member declarations which are similar to class instance member declarations, except that they specify the type R of the receiver using R.name where an ordinary class instance member would just have name.
However, it is only possible to do this in the case where an instance of the typeclass has been obtained. Such an instance will have a specific list of type arguments, that is, it will choose a specific value for the X and Y type parameters, and it will contain a concrete declaration of the members of the typeclass.
Here are some instance declarations for the typeclass Insert:
The instance of Insert<X, X> handles the situation where the source uses the same type as the sink. In that case we can simply call add on the sink. This instance declaration can be used to obtain an instance of Insert<T, T> for any type T. Note that it does not encode or decode the information at all; this is possible because it is known inside this instance that we're dealing with the case where the two type parameters of the typeclass are the same type.
The instance of Insert<String, double> and the instance of Insert<double, String> will handle the situation where the source delivers a String and the sink needs a double, respectively the source delivers a double and the sink needs a String. This illustrates that we can have distinct implementations of the typeclass for specific combinations of actual type arguments of the typeclass.
The instance of Insert<X, Iterable<Y>> can only be used when the precondition is satisfied: We must have access to an instance of Insert<X, Y>. In other words, if we already have an implementation of the typeclass that knows how to insert a Y into a sink of Xthen we can build an instance that will insert an Iterable<Y> into a sink of X.
Here are a couple of examples where the typeclass member is invoked:
void doInsert<X, Y, Z>(Sink<X> sx, Y y, Z z, Iterable<Y> ys)
where Insert<X, Y>, Insert<X, Z> {
sx << y << z << ys;
}
void main() {
var ss = SinkString();
var sd = SinkDouble();
ss << 'Hello, world' << 2.0 << ['Here', 'we come!'];
doInsert(sd, 'Hello, world', 2.0, ['Here', 'we come!']);
}
Execution of the example program will produce the following output:
"Hello, world"
"2.0000"
"Here"
"we come!"
NaN
2.00
NaN
NaN
The first four lines are produced by inserting data into the String sink, and the last four lines are using the double sink (where strings that don't parse to a double value yield double.nan). Here's the reason for that behavior:
In main we're creating a Sink<String> named ss and a Sink<double> named sd, and then we're inserting various kinds of information into them. The operator << associates to the left, and it returns the receiver, and this enables sequences of insertions by means of sequences of code of the form << e.
In particular, ss << 'Hello, world' will use the Instance<X, X> with the actual type argument String. Next, we're effectively evaluating ss << 2.0, and in this case we're using the Instance<String, double>. Finally, ss << ['Here', 'we come!'] is using an instance of Instance<X, Iterable<Y>> where X is String and Y is String, and that instance was built on top of an Instance<X, X> with actual type argument String.
This illustrates the case where each invocation of the typeclass operator << relies on the selection of a suitable typeclass instance in a way that resembles extension methods: We search the enclosing scopes to find every instance declaration that declares an operator << (and the search may continue recursively in cases like Insert<X, Iterable<Y>> which needs an Insert<X, Y>), and then we select the most specific typeclass instance (or report an error if none of them is most specific).
The invocation of doInsert illustrates that the choice of typeclass instance can be dynamic: The caller of doInsert must provide a typeclass instance, such that the where Insert<X, Y> clause is satisfied. But in the body of doInsert we will use that typeclass instance in order to evaluate sx << y. Similarly, doInsert also requires an instance of Insert<X, Z> which is used for sx << z. Finally, the given Instance<X, Y> is used to construct an Instance<X, Iterable<Y>>, which is used for sx << ys.
Consider an invocation r.m(a) which turns out to not be a class instance method invocation, and not an extension method invocation, such that it needs to be resolved as a typeclass method invocation (or it could just be an error). In this case, type inference is performed on r without a context type, just like other constructs which have the syntactic shape of an instance method invocation.
Assuming that an instance method invocation and an extension method invocation has been ruled out, resolution of a typeclass member invocation r.m(args) is performed as follows:
Step 1 (use an existing typeclass instance from scope). Let tc1 .. tck be the typeclass instances provided in enclosing scopes by where clauses, such that each typeclass instance has a member whose basename is the basename of m. A compile-time error occurs if there exist i and j, such that tci and tcj are instances of different typeclasses. In other words, a typeclass member from scope must uniquely determine the typeclass by its basename. Use TC(r).m(args) to disambiguate, where TC is the desired typeclass.
Let tc1 .. tcm be the subset of tc1 .. tck where the static type of r is a subtype of the declared receiver type of m in the given typeclass. With S R.m(...), the declared receiver type is R1, where R1 is obtained from R by substituting the actual type arguments of the instance for the formal type parameters of the typeclass. If there exist distinct i and j such that the given invocation r.m(args) is well-typed with both the signature from tci and from tcj a compile-time error occurs. In other words, a typeclass member from scope must uniquely determine its instance by the types of the receiver, and the arguments, and the context type, if any. If there exists exactly one i such that the invocation is well-typed with the signature from tci, then tci is selected. Otherwise no signature is well-typed, and we skip to step 2.
Step 2 (create a suitable typeclass instance). TO DO.
The overall affordance offered by the typeclass feature is that we can add new members to various types (the receiver type of a typeclass method can be any type, including any type that contains one or more type variables declared by the typeclass), and the provision of instances allow us to have different implementations for different combinations of actual type arguments to the typeclass. Moreover, call sites can abstract over typeclass instances: for instance, the body of doInsert uses typeclass methods, but it is not statically known at the call site which typeclass instance it is using.
So we get to extend sets of classes with additional members (extensibility, similarly to extension methods or views) and we get to run the actual implementation polymorphically (using object-oriented dispatch, similarly to regular instance members), and this enhances the expressive power of Dart when it comes to multi-type collaborations.
So how does this work?
The semantics of typeclasses and their instances can be illustrated quite well by a desugaring step (which is a viable implementation technique as well):
All classes mentioned below will have a constant constructor with the name of the class.
A typeclass declaration is desugared into an abstract class declaration, containing the members of the typeclass.
A declaration of a typeclass instance is desugared into a class that implements the desugaring of the typeclass. Each typeclass depended upon (as specified in the where clause) will be stored in a final instance variable v, initialized by a constructor argument this.v.
Each member has its signature "shifted" such that the receiver in the original code becomes the first positional argument, and remaining arguments are shifted accordingly. This implies that a typeclass getter will be desugared into a method taking one argument, and all typeclass operators will be desugared into methods with corresponding names. So R T.operator <<(S s) is desugared into R operator$ShiftLeft(T self, S s), and explicit as well as implicit references to this in the method body, if any, are desugared into self (and some other fresh name is used if self is already defined in the scope).
During analysis/compilation of any given member access, e.g., r.m(a), typeclass methods are available in way that resembles extension methods: If the static type of r does not have a member named m, we search for applicable extension methods and for applicable typeclass methods. A compile-time error occurs if there is no extension method, but there are typeclass methods, and none is most specific. Otherwise, the most specific typeclass method is selected, and the invocation is modified such that the given typeclass instance tci is used as the receiver and the syntactic receiver is passed as the first argument, tci.m(r,a). If m is an operator, the corresponding method name is used.
It would probably be useful to have an explicit form for the invocation of a typeclass method. With typeclass A we could then express r.m(a) as A(r).m(a), which would specify the requirement that we find an applicable instance of the typeclass A. Given that the typeclass name (with a prefix, if needed) is resolved statically, this construct should not give rise to added complexity.
It is even possible to require that an invocation that gives rise to the creation of a new typeclass instance must use the explicit invocation. One benefit obtained by having this rule would be that typeclass members are only invoked (1) when an enclosing scope has a where clause, in which case we can use the associated typeclass instance; or (2) when there is an explicit typeclass member invocation. The obvious drawback would be that the code is more verbose.
A typeclass instance My$Instance<T1, T2> is essentially a container that holds several functions (corresponding to the declared members of the typeclass), and the syntactic receiver type will become the type of the first positional parameter of the desugared method. This means that dynamically checked covariance will occur not just frequently but actually for every single typeclass member. For that reason it would probably be useful to make every typeclass type parameter invariant, even in the case where sound declaration-site variance is not yet supported in general by the language.
Typeclass member invocations are rewritten at the call site, and hence there is essentially no way those invocations could ever be performed dynamically. This is not considered to be a problem. For instance, extension methods have the same property.
As mentioned, this feature shares some properties with extension methods, but it also introduces additional expressive power: Extension methods cannot offer multiple possible implementations for multiple combinations of type variables. Also, extension declarations are parametric in their type parameters: Any value of any given type parameter is supported, as long as the actual type arguments satisfy the declared bounds. In contrast, a typeclass only supports any given actual type argument list in the case where a typeclass instance with a suitable type is available (based on a where clause of some enclosing declaration), or some instance declaration can be used to obtain an instance with the required type arguments (possible recursively, if the instance has a where clause as well). These two mechanisms are so different that it seems unlikely that we would want to unify them.
The typeclass feature has a similar overlap with views: View members are always resolved statically, and hence they differ from typeclass members in much the same way as extension methods differ from typeclass members.
Views support a zero-cost property in the sense that they allow an object o to be treated almost as if it had a different type T, without wrapping o in a new object that implements T. Similarly, extension methods are resolved statically, which is zero-cost in the same sense.
Typeclass invocations could be resolved statically in some cases (like ss << .. << .. in main), but in general (including the invocations in doInsert), a typeclass member invocation is an invocation of a method on an object (namely the object that represents the given instance of the typeclass), and this is just as expensive as any other instance member invocation; also, doInsert illustrates that the typeclass instance objects must in some cases be created based on other objects at run time, which implies that not all typeclass instance objects can be constant. In that case the typeclass invocation would involve the creation of a typeclass instance object (potentially a whole tree of such objects, according to the where dependencies), plus the instance method invocation.
Versions
April 5th 2022, 1.0: First public version.
April 7th 2022, 1.1: Add a sketch of the rules about selection of typeclass instances from scope.
This issue proposes a Dart language feature which takes inspiration from the Haskell typeclass feature: typeclasses. It enables polymorphic behavior, not unlike instance method invocation for regular objects, but the concrete behavior is associated with multiple actual type arguments, which makes it somewhat similar to multi-methods.
The feature is also somewhat related to extension methods, because it is concerned with the ability to support invocations of functions using the same syntax as that of regular instance methods, but based on functions that are declared outside the statically known class of the receiver. Another similarity is that the specific implementation of a typeclass which is used for a given call site can be chosen based on a global search for applicable typeclass instances (just like we'd use a global search to find an applicable extension declaration when we're resolving an extension method invocation). However, extension methods are always resolved statically, but typeclass methods can also be resolved at run time.
This issue presents the basic ideas and some motivation. In particular, it does not propose a precise syntax (so don't worry about ambiguities in the grammar or specific choices of keywords, those things can be scrutinized later).
Typeclasses: Motivation and Outline of Feature
In some cases, software will bring together two or more entities with some variations. For instance, we could have several different sources of information, and we might want to send the information to several different receivers (sinks).
In order to write generic code that transfers information from the former to the latter without depending on the exact kind of source or the exact kind of sink, we could insist that every source of information must implement the same interface, and every sink must implement the same interface, and then we can just implement the transfer in terms of those interfaces.
However, this would almost inevitably require the source of information to encode that information in a common format which would then be accepted by the sink. In other words, the information would be wrapped and unwrapped in some sense, and the relationship between the source and the sink would be ignored. So, perhaps the source and the sink are working on the exact same kind of data, but the wrapping/unwrapping will still take place.
The feature proposed here is intended to handle these multi-party, multi-type interactions differently, allowing for specialized behaviors associated with specific combinations of types.
First, let's introduce a couple of helper classes, such that we can model this source/sink scenario using simple types like
double
andString
:Next, we introduce a typeclass which will enable the transfer of information of type
Y
into a sink ofX
:This typeclass specifies that it is possible to invoke operator
<<
on an instance ofSink<X>
with an operand of typeY
, that issx << y
wheresx
has typeSink<X>
andy
has typeY
. The intuition is that this will inserty
into the sinksx
.In general, a typeclass can contain member declarations which are similar to class instance member declarations, except that they specify the type
R
of the receiver usingR.name
where an ordinary class instance member would just havename
.However, it is only possible to do this in the case where an instance of the typeclass has been obtained. Such an instance will have a specific list of type arguments, that is, it will choose a specific value for the
X
andY
type parameters, and it will contain a concrete declaration of the members of the typeclass.Here are some instance declarations for the typeclass
Insert
:The instance of
Insert<X, X>
handles the situation where the source uses the same type as the sink. In that case we can simply calladd
on the sink. This instance declaration can be used to obtain an instance ofInsert<T, T>
for any typeT
. Note that it does not encode or decode the information at all; this is possible because it is known inside this instance that we're dealing with the case where the two type parameters of the typeclass are the same type.The instance of
Insert<String, double>
and the instance ofInsert<double, String>
will handle the situation where the source delivers aString
and the sink needs adouble
, respectively the source delivers adouble
and the sink needs aString
. This illustrates that we can have distinct implementations of the typeclass for specific combinations of actual type arguments of the typeclass.The instance of
Insert<X, Iterable<Y>>
can only be used when the precondition is satisfied: We must have access to an instance ofInsert<X, Y>
. In other words, if we already have an implementation of the typeclass that knows how to insert aY
into a sink ofX
then we can build an instance that will insert anIterable<Y>
into a sink ofX
.Here are a couple of examples where the typeclass member is invoked:
Execution of the example program will produce the following output:
The first four lines are produced by inserting data into the
String
sink, and the last four lines are using thedouble
sink (where strings that don't parse to adouble
value yielddouble.nan
). Here's the reason for that behavior:In
main
we're creating aSink<String>
namedss
and aSink<double>
namedsd
, and then we're inserting various kinds of information into them. The operator<<
associates to the left, and it returns the receiver, and this enables sequences of insertions by means of sequences of code of the form<< e
.In particular,
ss << 'Hello, world'
will use theInstance<X, X>
with the actual type argumentString
. Next, we're effectively evaluatingss << 2.0
, and in this case we're using theInstance<String, double>
. Finally,ss << ['Here', 'we come!']
is using an instance ofInstance<X, Iterable<Y>>
whereX
isString
andY
isString
, and that instance was built on top of anInstance<X, X>
with actual type argumentString
.This illustrates the case where each invocation of the typeclass operator
<<
relies on the selection of a suitable typeclass instance in a way that resembles extension methods: We search the enclosing scopes to find everyinstance
declaration that declares an operator<<
(and the search may continue recursively in cases likeInsert<X, Iterable<Y>>
which needs anInsert<X, Y>
), and then we select the most specific typeclass instance (or report an error if none of them is most specific).The invocation of
doInsert
illustrates that the choice of typeclass instance can be dynamic: The caller ofdoInsert
must provide a typeclass instance, such that thewhere Insert<X, Y>
clause is satisfied. But in the body ofdoInsert
we will use that typeclass instance in order to evaluatesx << y
. Similarly,doInsert
also requires an instance ofInsert<X, Z>
which is used forsx << z
. Finally, the givenInstance<X, Y>
is used to construct anInstance<X, Iterable<Y>>
, which is used forsx << ys
.Consider an invocation
r.m(a)
which turns out to not be a class instance method invocation, and not an extension method invocation, such that it needs to be resolved as a typeclass method invocation (or it could just be an error). In this case, type inference is performed onr
without a context type, just like other constructs which have the syntactic shape of an instance method invocation.Assuming that an instance method invocation and an extension method invocation has been ruled out, resolution of a typeclass member invocation
r.m(args)
is performed as follows:Step 1 (use an existing typeclass instance from scope). Let tc1 .. tck be the typeclass instances provided in enclosing scopes by
where
clauses, such that each typeclass instance has a member whose basename is the basename ofm
. A compile-time error occurs if there exist i and j, such that tci and tcj are instances of different typeclasses. In other words, a typeclass member from scope must uniquely determine the typeclass by its basename. UseTC(r).m(args)
to disambiguate, whereTC
is the desired typeclass.Let tc1 .. tcm be the subset of tc1 .. tck where the static type of
r
is a subtype of the declared receiver type ofm
in the given typeclass. WithS R.m(...)
, the declared receiver type isR1
, whereR1
is obtained fromR
by substituting the actual type arguments of the instance for the formal type parameters of the typeclass. If there exist distinct i and j such that the given invocationr.m(args)
is well-typed with both the signature from tci and from tcj a compile-time error occurs. In other words, a typeclass member from scope must uniquely determine its instance by the types of the receiver, and the arguments, and the context type, if any. If there exists exactly one i such that the invocation is well-typed with the signature from tci, then tci is selected. Otherwise no signature is well-typed, and we skip to step 2.Step 2 (create a suitable typeclass instance). TO DO.
The overall affordance offered by the typeclass feature is that we can add new members to various types (the receiver type of a typeclass method can be any type, including any type that contains one or more type variables declared by the typeclass), and the provision of instances allow us to have different implementations for different combinations of actual type arguments to the typeclass. Moreover, call sites can abstract over typeclass instances: for instance, the body of
doInsert
uses typeclass methods, but it is not statically known at the call site which typeclass instance it is using.So we get to extend sets of classes with additional members (extensibility, similarly to extension methods or views) and we get to run the actual implementation polymorphically (using object-oriented dispatch, similarly to regular instance members), and this enhances the expressive power of Dart when it comes to multi-type collaborations.
So how does this work?
The semantics of typeclasses and their instances can be illustrated quite well by a desugaring step (which is a viable implementation technique as well):
where
clause) will be stored in a final instance variablev
, initialized by a constructor argumentthis.v
.R T.operator <<(S s)
is desugared intoR operator$ShiftLeft(T self, S s)
, and explicit as well as implicit references tothis
in the method body, if any, are desugared intoself
(and some other fresh name is used ifself
is already defined in the scope).During analysis/compilation of any given member access, e.g.,
r.m(a)
, typeclass methods are available in way that resembles extension methods: If the static type ofr
does not have a member namedm
, we search for applicable extension methods and for applicable typeclass methods. A compile-time error occurs if there is no extension method, but there are typeclass methods, and none is most specific. Otherwise, the most specific typeclass method is selected, and the invocation is modified such that the given typeclass instancetci
is used as the receiver and the syntactic receiver is passed as the first argument,tci.m(r,a)
. Ifm
is an operator, the corresponding method name is used.For the example, we get the following desugaring:
Discussion
It would probably be useful to have an explicit form for the invocation of a typeclass method. With typeclass
A
we could then expressr.m(a)
asA(r).m(a)
, which would specify the requirement that we find an applicable instance of the typeclassA
. Given that the typeclass name (with a prefix, if needed) is resolved statically, this construct should not give rise to added complexity.It is even possible to require that an invocation that gives rise to the creation of a new typeclass instance must use the explicit invocation. One benefit obtained by having this rule would be that typeclass members are only invoked (1) when an enclosing scope has a
where
clause, in which case we can use the associated typeclass instance; or (2) when there is an explicit typeclass member invocation. The obvious drawback would be that the code is more verbose.A typeclass instance
My$Instance<T1, T2>
is essentially a container that holds several functions (corresponding to the declared members of the typeclass), and the syntactic receiver type will become the type of the first positional parameter of the desugared method. This means that dynamically checked covariance will occur not just frequently but actually for every single typeclass member. For that reason it would probably be useful to make every typeclass type parameter invariant, even in the case where sound declaration-site variance is not yet supported in general by the language.Typeclass member invocations are rewritten at the call site, and hence there is essentially no way those invocations could ever be performed dynamically. This is not considered to be a problem. For instance, extension methods have the same property.
As mentioned, this feature shares some properties with extension methods, but it also introduces additional expressive power: Extension methods cannot offer multiple possible implementations for multiple combinations of type variables. Also, extension declarations are parametric in their type parameters: Any value of any given type parameter is supported, as long as the actual type arguments satisfy the declared bounds. In contrast, a typeclass only supports any given actual type argument list in the case where a typeclass instance with a suitable type is available (based on a
where
clause of some enclosing declaration), or someinstance
declaration can be used to obtain an instance with the required type arguments (possible recursively, if the instance has awhere
clause as well). These two mechanisms are so different that it seems unlikely that we would want to unify them.The typeclass feature has a similar overlap with views: View members are always resolved statically, and hence they differ from typeclass members in much the same way as extension methods differ from typeclass members.
Views support a zero-cost property in the sense that they allow an object
o
to be treated almost as if it had a different typeT
, without wrappingo
in a new object that implementsT
. Similarly, extension methods are resolved statically, which is zero-cost in the same sense.Typeclass invocations could be resolved statically in some cases (like
ss << .. << ..
inmain
), but in general (including the invocations indoInsert
), a typeclass member invocation is an invocation of a method on an object (namely the object that represents the given instance of the typeclass), and this is just as expensive as any other instance member invocation; also,doInsert
illustrates that the typeclass instance objects must in some cases be created based on other objects at run time, which implies that not all typeclass instance objects can be constant. In that case the typeclass invocation would involve the creation of a typeclass instance object (potentially a whole tree of such objects, according to thewhere
dependencies), plus the instance method invocation.Versions
April 5th 2022, 1.0: First public version. April 7th 2022, 1.1: Add a sketch of the rules about selection of typeclass instances from scope.