dart-lang / language

Design of the Dart language

Class instance extension members #3240

Open eernstg opened 11 months ago

eernstg commented 11 months ago

Kotlin has the ability to declare extension members as instance members of a class, as shown here.

This implies that the member can be invoked using an instance of the declared receiver type as the syntactic receiver (in the example below that's a String), and the context is used to uniquely determine the "current" instance of the enclosing type declaration (in the example below that's the this of the extension type _Base).

So we'd have a this for the extension receiver and another this for the instance member receiver. Obviously, we'd need to have a way to make the distinction. Kotlin uses this@OtherClass.someMethod() to denote the this of OtherClass and invoke its instance method named someMethod. This idea combines well with a rule that says "for m(), invoke this@ClassA.m() if class A has an m, invoke this@ClassB.m() if class B has an m, and raise an error if both or neither of them has it". However, that might be somewhat difficult to reason about when reading the code.

One way to make both this-es available would be to say that we don't get an implicit this for the extension receiver, we just give it a name. We must then use that name explicitly. In the example below I've used the strawman syntax (T self).name(parameterList) {...} to declare a method which is an extension member on T, using the name self to denote the syntactic receiver. The actual reserved word this has the same meaning as always in the member declaration, and we can use it implicitly.

Here is an example where we declare an extension instance member in an extension type, namely the operator ~ on a String named s:

import 'package:http/http.dart' as http;
import 'dart:convert';

typedef JsonMap = Map<String, dynamic>;

extension type _Base(JsonMap json) {
  (String s).operator ~() => json[s]!;
}

extension type PkgInfo(JsonMap json) implements _Base {
  String get name => ~'name';
  PkgVersion get latest => PkgVersion(~'latest');
  String get version => ~'version';
}

extension type PkgVersion(JsonMap json) implements _Base {
  String get archiveUrl => ~'archive_url';
  PkgPubspec get pubspec => PkgPubspec(~'pubspec');
}

extension type PkgPubspec(JsonMap json) implements _Base {
  String get version => ~'version';
  String get name => ~'name';
}

main() async {
  const pubUrl = "https://pub.dartlang.org/api/packages/protobuf";
  var response = await http.get(Uri.parse(Uri.encodeFull(pubUrl)));
  if (response.statusCode == 200) {
    PkgInfo info = PkgInfo(json.decode(response.body));
    print('Package ${info.name}, v ${info.latest.pubspec.version}');
  } else {
    throw Exception('Failed to load package info');
  }
}

The point is that this kind of member allows us to use the syntactic slot reserved for a receiver with some other object (in this case with a string), and we still get to operate on the this of the enclosing declaration. So we just write ~'Hello!', and this allows us to do things that would otherwise be expressed as someMethod('Hello!') (which is again a short form meaning this.someMethod('Hello!')).
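For comparison, here is a minimal sketch of what the same accessors look like in current Dart (extension types, Dart 3.3 or later), where the lookup has to be an ordinary instance member. The method name at, the function describe, the shortened set of types, and the hand-written sample data are assumptions made for illustration; they are not part of the proposal.

```dart
typedef JsonMap = Map<String, dynamic>;

extension type Base(JsonMap json) {
  // Ordinary instance member standing in for the proposed `~'key'`.
  // The name `at` is hypothetical.
  dynamic at(String key) => json[key]!;
}

extension type PkgPubspec(JsonMap json) implements Base {
  String get name => at('name');
  String get version => at('version');
}

extension type PkgInfo(JsonMap json) implements Base {
  String get name => at('name');
  PkgPubspec get pubspec => PkgPubspec(at('pubspec'));
}

// Hypothetical helper that formats a short summary line.
String describe(PkgInfo info) =>
    'Package ${info.name}, v ${info.pubspec.version}';

void main() {
  // Hand-written sample data instead of an HTTP response.
  final info = PkgInfo(<String, dynamic>{
    'name': 'protobuf',
    'pubspec': <String, dynamic>{'name': 'protobuf', 'version': '1.2.3'},
  });
  print(describe(info)); // Package protobuf, v 1.2.3
}
```

Every at('name') here is short for this.at('name'); the proposed form ~'name' would move the string into the receiver position while still operating on the same this.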

The weird part is that these declarations can only be used in a context where there is a suitable value for this. For example, we could write r.someMethod('Hello!') where r is an arbitrary expression whose type has a someMethod, but we can't put that r anywhere if we want to get the same effect as we get with ~'Hello!' inside the class, because we have no way to say that "by the way, during the execution of that operator ~, this should be bound to the value of r."

In short, class instance extension members can be really concise and convenient, but they are effectively instance-protected, in the sense that they can only be invoked in the body of the class / mixin / extension type that declares or inherits the declaration, and they can only be executed such that this is bound to the same object as in the caller.

I think they should be statically resolved, because they are likely to be so similar to extension methods that they will have a run-time representation where the "extension-this" object is passed as an argument. It would be an anomaly (e.g., it wouldn't work with dynamic invocations) if the class instance extension member has a different signature at run-time than it has at compile time.

If anyone really needs OO dispatch and overriding then they'd just write a forwarder (which is basically the same thing as the expected desugaring of the class instance extension member).

So we wouldn't support this:

abstract class A {
  int (String s).operator ~();
  int m(String s) => ~s;           // <-- Used here!
}

class B extends A {
  int count = 0;
  int (String s).operator ~() => count += s.length; // Override not supported!
}

... because that could just as well be written as follows:

abstract class A {
  int (String self).operator ~() => operatorTilde(self);
  int operatorTilde(String s);
  int m(String s) => ~s;
}

class B extends A {
  int count = 0;
  int operatorTilde(String s) => count += s.length;
}
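As a rough check that the forwarder really gives ordinary OO dispatch, the following sketch keeps only the parts that are valid in current Dart. The proposed declaration itself is left out, and m calls operatorTilde directly, which is what ~s is assumed to desugar to.

```dart
abstract class A {
  // The overridable part: a plain abstract instance method.
  int operatorTilde(String s);

  // Under the proposal this body would be written `~s`.
  int m(String s) => operatorTilde(s);
}

class B extends A {
  int count = 0;

  @override
  int operatorTilde(String s) => count += s.length;
}

void main() {
  A a = B();
  print(a.m('abc')); // 3
  print(a.m('de')); // 5
}
```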

[Edit: Changed syntax to put the syntactic receiver declaration before the built-in identifier operator.]

lrhn commented 11 months ago

I think it's a very weird feature. You have to have an instance of type A in order to call a method on type B. It's like inner classes, but for extensions.

My first approach would be to just declare an extension inside the outer class:

class C {
  extension E on B {
    void foo() {}
  }
}

Then anybody can do

myC.E(myB).foo();

to do an explicit extension invocation. And if you're in the scope of an instance member of C, then that C's E can be called implicitly.
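Nested extension declarations do not exist in Dart today, but the shape of the idea can be approximated with a small helper object that pairs an enclosing C with a B. This is only a sketch: the helper class EOnB, the method ext, and the fields greeting and name are hypothetical names introduced here.

```dart
class B {
  final String name;
  B(this.name);
}

// Hypothetical helper pairing an enclosing C with a receiver B.
class EOnB {
  final C outer;
  final B receiver;
  EOnB(this.outer, this.receiver);

  String foo() => '${outer.greeting}, ${receiver.name}';
}

class C {
  final String greeting;
  C(this.greeting);

  // Stand-in for the explicit invocation `myC.E(myB)`.
  EOnB ext(B b) => EOnB(this, b);

  // Inside C, the enclosing instance is supplied implicitly via `this`.
  String greet(B b) => ext(b).foo();
}

void main() {
  final myC = C('Hello');
  final myB = B('world');
  print(myC.ext(myB).foo()); // Hello, world
  print(myC.greet(myB)); // Hello, world
}
```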

Writing an entire extension declaration for just one method may be overkill.

I want to change the syntax for extension declarations to

extension Foo(OnType id) {…}

so what if we allow a syntax for one-off extension members, say:

void extension<T>(OnType id).foo() {}

(Maybe even drop the extension.) Then you can declare that inside another class too.

Or use the C# syntax:

void foo(this OnType id) {}

treating the receiver as a special parameter. Might be too close to initializing formals.

(Feels like we're skipping past a lot of things to get to inner declarations, say just static nested declarations, before moving to inner ones.)

eernstg commented 11 months ago

"I think it's a very weird feature."

Somewhat weird indeed. ;-)

However, the considerations you mention do not include the part that I consider to be the very point of this mechanism: It allows us to use a compact syntactic form for invocations of code that operates on two objects: The syntactic receiver of the expression itself, and the this of the enclosing class.

It's this ability to involve two objects for the price of one that allows us to use expressions like ~'Hi' to do something with the syntactic receiver (the string 'Hi') in collaboration with the current this.

This brevity is at the core of a bunch of situations where Kotlin code is able to specify operations on this in a very concise form. This is often used to express a structure-building operation in a declarative way, that is, such that the structure of the code is similar to the structure of the resulting object graph. I'm just thinking "maybe that could be useful in Dart, too".

Wdestroier commented 11 months ago

I was waiting for this feature in Dart, thanks for creating the proposal.

int operator (String s).~(); looks so awkward. To me, operator and ~ should appear next to each other, such as operator~ / int String.operator~();.

eernstg commented 11 months ago

Thanks! I changed the syntax in the original posting to put the syntactic receiver before the word operator.

We'd still need a name for the syntactic receiver, e.g.: int (String s).operator ~() ... (with a space before the operator symbol ~ because dart format puts a space there).

We could also eliminate the specification of this name and use specialized syntax to denote the syntactic receiver (like this@String or whatever will work in the grammar), or we could introduce a new name (that), but I'd expect the rest of the language team to be unhappy about "magic" names (I'm not too happy about them, either ;-).

Declaring a new name explicitly is a safer bet. For instance, we won't ever have to worry about name clashes, because you can always choose a different name.

@lrhn mentioned the C# syntax:

class A {
  void foo(this OnType id) {}
}

which would yield operator declarations like this: int operator ~(this String s) => ...;. This approach has been used for a long time, so we should be able to learn about any wrinkles as well as some reasonable ways to handle them.

However, it might be considered confusing that the parameter is declared using this String s, whereas this in the body denotes the current instance of the enclosing class/mixin/extension-type/... declaration, and the syntactic receiver is still denoted by the declared name s. Also, we'd need to introduce a parameter for a getter, which may or may not create ambiguity issues in the grammar: int get g(this String s) => s.length + this.something;. Similarly for setters with two parameters (but that should be easy to parse).

In any case, one nice syntactic property of the int (String s).operator ~() ... and int (String s).m(bool b) ... style is that it puts the syntactic receiver in the position in the declaration where it would also go in the invocation: 'Hi'.m(true).

When it comes to tear-offs, we'd presumably capture "both this-es":

class A {
  int x;
  A(this.x);
  int (String s).m(int y) => s.length + x + y;
  int Function(int) tearItOff(String s) => s.m;
}

void main() {
  var a = A(10);
  var f = a.tearItOff('x');
  print(f(100)); // '111'.
}
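In current Dart the same capture can be spelled out by hand, assuming the member is desugared so that the syntactic receiver becomes an ordinary first parameter. This is a sketch of that assumed desugaring, not a statement about how the feature would be implemented.

```dart
class A {
  int x;
  A(this.x);

  // Assumed desugared form: the syntactic receiver `s` is a normal parameter.
  int m(String s, int y) => s.length + x + y;

  // The closure captures `s` explicitly and `this` implicitly (through `m`).
  int Function(int) tearItOff(String s) => (int y) => m(s, y);
}

void main() {
  var a = A(10);
  var f = a.tearItOff('x');
  print(f(100)); // 111
}
```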

eernstg commented 11 months ago

One unusual property of class instance extension members is that they can only be used in a location with access to this. The typical example of a location with access to this is the body of an instance member declaration, and the quick test which can always be performed is to put the expression this or the statement this; in some location, and see if it is a compile-time error or not.

Here are a couple of examples using locations outside the body of the class itself. First, we can use an extension:

class A {
  final String s;
  A(this.s);
  String get (String self).g => '$self, $s';
}

extension on A {
  void foo() => print('Hello'.g);
}

void main() {
  A('world!').foo(); // 'Hello, world!'.
}
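The reason this works is that the body of an extension member has access to this. The sketch below shows the same thing in current Dart, with g rewritten as an ordinary method that takes the string as a parameter (an assumed desugaring) and with foo returning the string so that it is easy to inspect.

```dart
class A {
  final String s;
  A(this.s);

  // Ordinary method standing in for the proposed getter `'Hello'.g`.
  String g(String self) => '$self, $s';
}

extension on A {
  // `g('Hello')` means `this.g('Hello')`: `this` is available here.
  String foo() => g('Hello');
}

void main() {
  print(A('world!').foo()); // Hello, world!
}
```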

Another example is anonymous methods (if they are adopted). First consider this example where we're building a tree using the current language:

class Tree {
  final String value;
  List<Tree> children = [];
  Tree(this.value);
  void add(Tree child) => children.add(child);
  String toString() =>
      '$value(${[for (var c in children) c.toString()].join(', ')})';
}

Tree build({required bool third}) {
  var n1 = Tree('n1');
  var n11 = Tree('n11');
  n1.add(n11);
  var n12 = Tree('n12');
  n1.add(n12);
  var n121 = Tree('n121');
  n12.add(n121);
  var n122 = Tree('n122');
  n12.add(n122);
  if (third) {
    var n13 = Tree('n13');
    n1.add(n13);
  }
  return n1;
}

void main() {
  print(build(third: true)); // 'n1(n11(), n12(n121(), n122()), n13())'.
}

A basic example of an anonymous method is the following:

void main() {
  'Hillo'.{ print('${substring(0, 2)}, world!'); }; // Prints 'Hi, world!'.
}

The point is that the .{ /*code*/ } construct is executed like a function literal, but in the body there is access to this (explicitly and implicitly), and the value of this is the receiver of the anonymous method invocation (here: the string literal 'Hillo'). In other words, the { /*code*/ } is very similar to the body of an instance method.

Here is how we could build the tree again using anonymous methods (again using ~ where Kotlin uses unary +, because Dart doesn't have unary +):

class Tree {
  final String value;
  final List<Tree> children = [];

  Tree(this.value);

  void (Tree t).operator ~() => children.add(t);
  String toString() =>
      '$value(${[for (var c in children) c.toString()].join(', ')})';
}

Tree build({required bool third}) {
  return Tree('n1').{
    ~Tree('n11');
    ~Tree('n12').{
      ~Tree('n121');
      ~Tree('n122');
    };
    if (third) ~Tree('n13');
  };
}

void main() {
  print(build(third: true)); // 'n1(n11(), n12(n121(), n122()), n13())'.
}
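For a sense of how close current Dart can get without either feature, here is a sketch that emulates the anonymous method with a callback. The extension Apply and its method apply are hypothetical helpers introduced for this example. Because the callback has no implicit this, every receiver needs an explicit name (n1, n12) and every insertion is an explicit add call.

```dart
class Tree {
  final String value;
  final List<Tree> children = [];

  Tree(this.value);

  void add(Tree child) => children.add(child);

  @override
  String toString() => '$value(${children.join(', ')})';
}

// Hypothetical helper: runs `body` on the receiver, then returns the receiver.
extension Apply<T> on T {
  T apply(void Function(T self) body) {
    body(this);
    return this;
  }
}

Tree build({required bool third}) {
  return Tree('n1').apply((n1) {
    n1.add(Tree('n11'));
    n1.add(Tree('n12').apply((n12) {
      n12.add(Tree('n121'));
      n12.add(Tree('n122'));
    }));
    if (third) n1.add(Tree('n13'));
  });
}

void main() {
  print(build(third: true)); // n1(n11(), n12(n121(), n122()), n13())
}
```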

The resulting code is rather declarative, in the sense that the shape of the code is similar to the shape of the tree which is being built.

The code could be even more declarative if we had built the tree using a large expression of nested constructor invocations (Flutter style), whereas the code in this example is actually imperative.

However, there may be reasons why the single expression cannot be used. For instance, we might want to have or not have a particular element in the structure, and the expression language isn't sufficiently expressive to handle that. We can indeed use collection elements ([c1, c2, if (b) c3, for (var c in cs) c.parent, ...cs]) to build a list of children; but we wouldn't be able to include or exclude a single constructor element if it's a separate parameter of a constructor. We might also want to use function invocations (possibly recursive) in order to build a complex structure, and collection elements can't express recursion either.
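As a sketch of the single-expression style that does work today, assume Tree takes its children as an optional list parameter (this constructor is an assumption, it differs from the Tree class shown earlier). A collection if can then include or exclude a child:

```dart
class Tree {
  final String value;
  final List<Tree> children;

  // Assumed constructor shape: children are passed as an optional list.
  Tree(this.value, [List<Tree>? children]) : children = children ?? [];

  @override
  String toString() => '$value(${children.join(', ')})';
}

Tree build({required bool third}) => Tree('n1', [
      Tree('n11'),
      Tree('n12', [Tree('n121'), Tree('n122')]),
      if (third) Tree('n13'), // collection element: conditional child
    ]);

void main() {
  print(build(third: true)); // n1(n11(), n12(n121(), n122()), n13())
  print(build(third: false)); // n1(n11(), n12(n121(), n122()))
}
```

This only works because the children form a list. If n13 were a separate constructor parameter rather than a list element, there would be no collection element to make it conditional.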

So we do have a very nice declarative style already, and it might suffice, but an imperative style that looks so declarative will open some extra doors that we might need.