dart-lang / language

Design of the Dart language
Other
2.62k stars 198 forks source link

Dart typeclasses #2194

Open eernstg opened 2 years ago

eernstg commented 2 years ago

This issue proposes a Dart language feature which takes inspiration from the Haskell typeclass feature: typeclasses. It enables polymorphic behavior, not unlike instance method invocation for regular objects, but the concrete behavior is associated with multiple actual type arguments, which makes it somewhat similar to multi-methods.

The feature is also somewhat related to extension methods, because it is concerned with the ability to support invocations of functions using the same syntax as that of regular instance methods, but based on functions that are declared outside the statically known class of the receiver. Another similarity is that the specific implementation of a typeclass which is used for a given call site can be chosen based on a global search for applicable typeclass instances (just like we'd use a global search to find an applicable extension declaration when we're resolving an extension method invocation). However, extension methods are always resolved statically, but typeclass methods can also be resolved at run time.

This issue presents the basic ideas and some motivation. In particular, it does not propose a precise syntax (so don't worry about ambiguities in the grammar or specific choices of keywords, those things can be scrutinized later).

Typeclasses: Motivation and Outline of Feature

In some cases, software will bring together two or more entities with some variations. For instance, we could have several different sources of information, and we might want to send the information to several different receivers (sinks).

In order to write generic code that transfers information from the former to the latter without depending on the exact kind of source or the exact kind of sink, we could insist that every source of information must implement the same interface, and every sink must implement the same interface, and then we can just implement the transfer in terms of those interfaces.

However, this would almost inevitably require the source of information to encode that information in a common format which would then be accepted by the sink. In other words, the information would be wrapped and unwrapped in some sense, and the relationship between the source and the sink would be ignored. So, perhaps the source and the sink are working on the exact same kind of data, but the wrapping/unwrapping will still take place.

The feature proposed here is intended to handle these multi-party, multi-type interactions differently, allowing for specialized behaviors associated with specific combinations of types.

First, let's introduce a couple of helper classes, such that we can model this source/sink scenario using simple types like double and String:

class SinkString implements Sink<String> {
  void add(String s) => print('"$s"');
  void close() {}
}

class SinkDouble implements Sink<double> {
  void add(double d) => print(d.toStringAsFixed(2));
  void close() {}
}

Next, we introduce a typeclass which will enable the transfer of information of type Y into a sink of X:

typeclass Insert<X, Y> {
  Sink<X> Sink<X>.operator <<(Y y);
}

This typeclass specifies that it is possible to invoke operator << on an instance of Sink<X> with an operand of type Y, that is sx << y where sx has type Sink<X> and y has type Y. The intuition is that this will insert y into the sink sx.

In general, a typeclass can contain member declarations which are similar to class instance member declarations, except that they specify the type R of the receiver using R.name where an ordinary class instance member would just have name.

However, it is only possible to do this in the case where an instance of the typeclass has been obtained. Such an instance will have a specific list of type arguments, that is, it will choose a specific value for the X and Y type parameters, and it will contain a concrete declaration of the members of the typeclass.

Here are some instance declarations for the typeclass Insert:

instance<X> Insert<X, X> {
  Sink<X> Sink<X>.operator <<(X x) => this..add(x);
}

instance Insert<String, double> {
  Sink<String> Sink<String>.operator <<(double d) =>
      this..add(d.toStringAsFixed(4));
}

instance Insert<double, String> {
  Sink<double> Sink<double>.operator <<(String s) =>
      this..add(double.tryParse(s) ?? double.nan);
}

instance<X, Y> Insert<X, Iterable<Y>> where Insert<X, Y> {
  Sink<X> Sink<X>.operator <<(Iterable<Y> ys) {
    for (var y in ys) {
      this << y;
    }
    return this;
  }
}

The instance of Insert<X, X> handles the situation where the source uses the same type as the sink. In that case we can simply call add on the sink. This instance declaration can be used to obtain an instance of Insert<T, T> for any type T. Note that it does not encode or decode the information at all; this is possible because it is known inside this instance that we're dealing with the case where the two type parameters of the typeclass are the same type.

The instance of Insert<String, double> and the instance of Insert<double, String> will handle the situation where the source delivers a String and the sink needs a double, respectively the source delivers a double and the sink needs a String. This illustrates that we can have distinct implementations of the typeclass for specific combinations of actual type arguments of the typeclass.

The instance of Insert<X, Iterable<Y>> can only be used when the precondition is satisfied: We must have access to an instance of Insert<X, Y>. In other words, if we already have an implementation of the typeclass that knows how to insert a Y into a sink of X then we can build an instance that will insert an Iterable<Y> into a sink of X.

Here are a couple of examples where the typeclass member is invoked:

void doInsert<X, Y, Z>(Sink<X> sx, Y y, Z z, Iterable<Y> ys)
    where Insert<X, Y>, Insert<X, Z> {
  sx << y << z << ys;
}

void main() {
  var ss = SinkString();
  var sd = SinkDouble();
  ss << 'Hello, world' << 2.0 << ['Here', 'we come!'];
  doInsert(sd, 'Hello, world', 2.0, ['Here', 'we come!']);
}

Execution of the example program will produce the following output:

"Hello, world"
"2.0000"
"Here"
"we come!"
NaN
2.00
NaN
NaN

The first four lines are produced by inserting data into the String sink, and the last four lines are using the double sink (where strings that don't parse to a double value yield double.nan). Here's the reason for that behavior:

In main we're creating a Sink<String> named ss and a Sink<double> named sd, and then we're inserting various kinds of information into them. The operator << associates to the left, and it returns the receiver, and this enables sequences of insertions by means of sequences of code of the form << e.

In particular, ss << 'Hello, world' will use the Instance<X, X> with the actual type argument String. Next, we're effectively evaluating ss << 2.0, and in this case we're using the Instance<String, double>. Finally, ss << ['Here', 'we come!'] is using an instance of Instance<X, Iterable<Y>> where X is String and Y is String, and that instance was built on top of an Instance<X, X> with actual type argument String.

This illustrates the case where each invocation of the typeclass operator << relies on the selection of a suitable typeclass instance in a way that resembles extension methods: We search the enclosing scopes to find every instance declaration that declares an operator << (and the search may continue recursively in cases like Insert<X, Iterable<Y>> which needs an Insert<X, Y>), and then we select the most specific typeclass instance (or report an error if none of them is most specific).

The invocation of doInsert illustrates that the choice of typeclass instance can be dynamic: The caller of doInsert must provide a typeclass instance, such that the where Insert<X, Y> clause is satisfied. But in the body of doInsert we will use that typeclass instance in order to evaluate sx << y. Similarly, doInsert also requires an instance of Insert<X, Z> which is used for sx << z. Finally, the given Instance<X, Y> is used to construct an Instance<X, Iterable<Y>>, which is used for sx << ys.

Consider an invocation r.m(a) which turns out to not be a class instance method invocation, and not an extension method invocation, such that it needs to be resolved as a typeclass method invocation (or it could just be an error). In this case, type inference is performed on r without a context type, just like other constructs which have the syntactic shape of an instance method invocation.

Assuming that an instance method invocation and an extension method invocation has been ruled out, resolution of a typeclass member invocation r.m(args) is performed as follows:

Step 1 (use an existing typeclass instance from scope). Let tc1 .. tck be the typeclass instances provided in enclosing scopes by where clauses, such that each typeclass instance has a member whose basename is the basename of m. A compile-time error occurs if there exist i and j, such that tci and tcj are instances of different typeclasses. In other words, a typeclass member from scope must uniquely determine the typeclass by its basename. Use TC(r).m(args) to disambiguate, where TC is the desired typeclass.

Let tc1 .. tcm be the subset of tc1 .. tck where the static type of r is a subtype of the declared receiver type of m in the given typeclass. With S R.m(...), the declared receiver type is R1, where R1 is obtained from R by substituting the actual type arguments of the instance for the formal type parameters of the typeclass. If there exist distinct i and j such that the given invocation r.m(args) is well-typed with both the signature from tci and from tcj a compile-time error occurs. In other words, a typeclass member from scope must uniquely determine its instance by the types of the receiver, and the arguments, and the context type, if any. If there exists exactly one i such that the invocation is well-typed with the signature from tci, then tci is selected. Otherwise no signature is well-typed, and we skip to step 2.

Step 2 (create a suitable typeclass instance). TO DO.

The overall affordance offered by the typeclass feature is that we can add new members to various types (the receiver type of a typeclass method can be any type, including any type that contains one or more type variables declared by the typeclass), and the provision of instances allow us to have different implementations for different combinations of actual type arguments to the typeclass. Moreover, call sites can abstract over typeclass instances: for instance, the body of doInsert uses typeclass methods, but it is not statically known at the call site which typeclass instance it is using.

So we get to extend sets of classes with additional members (extensibility, similarly to extension methods or views) and we get to run the actual implementation polymorphically (using object-oriented dispatch, similarly to regular instance members), and this enhances the expressive power of Dart when it comes to multi-type collaborations.

So how does this work?

The semantics of typeclasses and their instances can be illustrated quite well by a desugaring step (which is a viable implementation technique as well):

During analysis/compilation of any given member access, e.g., r.m(a), typeclass methods are available in way that resembles extension methods: If the static type of r does not have a member named m, we search for applicable extension methods and for applicable typeclass methods. A compile-time error occurs if there is no extension method, but there are typeclass methods, and none is most specific. Otherwise, the most specific typeclass method is selected, and the invocation is modified such that the given typeclass instance tci is used as the receiver and the syntactic receiver is passed as the first argument, tci.m(r,a). If m is an operator, the corresponding method name is used.

For the example, we get the following desugaring:

// typeclass Insert<X, Y> {
//   Sink<X> Sink<X>.operator <<(Y y);
// }
abstract class Insert$Typeclass<X, Y> {
  const Insert$Typeclass();
  Sink<X> operator$ShiftLeft(Sink<X> sx, Y y);
}

// instance<X> Insert<X, X> {
//   Sink<X> Sink<X>.operator <<(X x) => this..add(x);
// }
class Insert$XX<X> implements Insert$Typeclass<X, X> {
  const Insert$XX();
  Sink<X> operator$ShiftLeft(Sink<X> self, X x) => self..add(x);
}

// instance Insert<String, double> {
//   Sink<String> Sink<String>.operator <<(double d) =>
//       this..add(d.toStringAsFixed(4));
// }
class Insert$StringDouble implements Insert$Typeclass<String, double> {
  const Insert$StringDouble();
  Sink<String> operator$ShiftLeft(Sink<String> self, double d) =>
      self..add(d.toStringAsFixed(4));
}

// instance Insert<double, String> {
//   Sink<double> Sink<double>.operator <<(String s) =>
//       this..add(double.tryParse(s) ?? double.nan);
// }
class Insert$DoubleString implements Insert$Typeclass<double, String> {
  const Insert$DoubleString();
  Sink<double> operator$ShiftLeft(Sink<double> self, String s) =>
      self..add(double.tryParse(s) ?? double.nan);
}

// instance<X, Y> Insert<X, Iterable<Y>> where Insert<X, Y> {
//   Sink<X> Sink<X>.operator <<(Iterable<Y> ys) {
//     for (var y in ys) {
//       this << y;
//     }
//     return this;
//   }
// }
class Insert$XIterableY<X, Y> implements Insert$Typeclass<X, Iterable<Y>> {
  final Insert$Typeclass<X, Y> insert$Typeclass;
  const Insert$XIterableY(this.insert$Typeclass);
  Sink<X> operator$ShiftLeft(Sink<X> self, Iterable<Y> ys) {
    for (var y in ys) {
      insert$Typeclass.operator$ShiftLeft(self, y);
    }
    return self;
  }
}

// void doInsert<X, Y, Z>(Sink<X> sx, Y y, Z z, Iterable<Y> ys)
//     where Insert<X, Y>, Insert<X, Z> {
//   sx << y << z << ys;
// }
void doInsert<X, Y, Z>(
  Insert$Typeclass<X, Y> insert$XY,
  Insert$Typeclass<X, Z> insert$XZ,
  Sink<X> sx,
  Y y,
  Z z,
  Iterable<Y> ys,
) {
  Insert$XIterableY<X, Y>(insert$XY).operator$ShiftLeft(
    insert$XZ.operator$ShiftLeft(
      insert$XY.operator$ShiftLeft(sx, y),
      z,
    ),
    ys,
  );
}

void main() {
  var ss = SinkString();
  var sd = SinkDouble();

  // ss << 'Hello, world' << 2.0 << ['Here', 'we come!'];
  const Insert$XIterableY<String, String>(
    const Insert$XX<String>(),
  ).operator$ShiftLeft(
    const Insert$StringDouble().operator$ShiftLeft(
      const Insert$XX<String>().operator$ShiftLeft(
        ss,
        'Hello, world',
      ),
      2.0,
    ),
    ['Here', 'we come!'],
  );

  // doInsert(sd, 'Hello, world', 2.0, ['Here', 'we come!']);
  doInsert(
    const Insert$DoubleString(),
    const Insert$XX<double>(),
    sd,
    'Hello, world',
    2.0,
    ['Here', 'we come!'],
  );
}

Discussion

It would probably be useful to have an explicit form for the invocation of a typeclass method. With typeclass A we could then express r.m(a) as A(r).m(a), which would specify the requirement that we find an applicable instance of the typeclass A. Given that the typeclass name (with a prefix, if needed) is resolved statically, this construct should not give rise to added complexity.

It is even possible to require that an invocation that gives rise to the creation of a new typeclass instance must use the explicit invocation. One benefit obtained by having this rule would be that typeclass members are only invoked (1) when an enclosing scope has a where clause, in which case we can use the associated typeclass instance; or (2) when there is an explicit typeclass member invocation. The obvious drawback would be that the code is more verbose.

A typeclass instance My$Instance<T1, T2> is essentially a container that holds several functions (corresponding to the declared members of the typeclass), and the syntactic receiver type will become the type of the first positional parameter of the desugared method. This means that dynamically checked covariance will occur not just frequently but actually for every single typeclass member. For that reason it would probably be useful to make every typeclass type parameter invariant, even in the case where sound declaration-site variance is not yet supported in general by the language.

Typeclass member invocations are rewritten at the call site, and hence there is essentially no way those invocations could ever be performed dynamically. This is not considered to be a problem. For instance, extension methods have the same property.

As mentioned, this feature shares some properties with extension methods, but it also introduces additional expressive power: Extension methods cannot offer multiple possible implementations for multiple combinations of type variables. Also, extension declarations are parametric in their type parameters: Any value of any given type parameter is supported, as long as the actual type arguments satisfy the declared bounds. In contrast, a typeclass only supports any given actual type argument list in the case where a typeclass instance with a suitable type is available (based on a where clause of some enclosing declaration), or some instance declaration can be used to obtain an instance with the required type arguments (possible recursively, if the instance has a where clause as well). These two mechanisms are so different that it seems unlikely that we would want to unify them.

The typeclass feature has a similar overlap with views: View members are always resolved statically, and hence they differ from typeclass members in much the same way as extension methods differ from typeclass members.

Views support a zero-cost property in the sense that they allow an object o to be treated almost as if it had a different type T, without wrapping o in a new object that implements T. Similarly, extension methods are resolved statically, which is zero-cost in the same sense.

Typeclass invocations could be resolved statically in some cases (like ss << .. << .. in main), but in general (including the invocations in doInsert), a typeclass member invocation is an invocation of a method on an object (namely the object that represents the given instance of the typeclass), and this is just as expensive as any other instance member invocation; also, doInsert illustrates that the typeclass instance objects must in some cases be created based on other objects at run time, which implies that not all typeclass instance objects can be constant. In that case the typeclass invocation would involve the creation of a typeclass instance object (potentially a whole tree of such objects, according to the where dependencies), plus the instance method invocation.

Versions

April 5th 2022, 1.0: First public version. April 7th 2022, 1.1: Add a sketch of the rules about selection of typeclass instances from scope.

He-Pin commented 2 years ago

FYI: typeclass in scala3: https://docs.scala-lang.org/scala3/book/ca-type-classes.html