dart-lang / language

Design of the Dart language
Other
2.66k stars 204 forks source link

Case functions #148

Open eernstg opened 5 years ago

eernstg commented 5 years ago

145 states that there is a need to handle invocations with one-of-several types of parameters, such that a union type would match the needs pretty well, but no subtyping relation will specify just the acceptable types (example taken from #145):

void writeLogs(Object stringOrListOfString) {
  if (stringOrListOfString is String) {
    _writeLog(stringOrListOfString);
  } else if (stringOrListOfString is List<String>) {
    stringOrListOfString.forEach(_writeLog);
  } else {
    throw ArgumentError.value(stringOrListOfString, 'Not a String or List<String>');
  }
}

Several problems with this approach are discussed in #145, but lack of type safety is an obvious one.

We have considered a notion of case functions which could be used to handle such situations. Here is an example corresponding to the above:

void writeLogs(Object stringOrListOfString) {
  case(String s) => _writeLog(s);
  case(List<String> l) => l.forEach(_writeLog);
  default: throw ArgumentError.value(stringOrListOfString, 'Not a String or List<String>');
}

The point is that (1) this is a regular function, which means that it can be used in all the ways you'd expect for a function. The execution of the body visits the cases in declaration order and runs the first piece of code whose constraints are satisfied.

But, (2), it is also a set of specialized functions that static analysis can recognize, and if there is a call site where the statically known type of the argument list satisfies a case, then the static analysis will choose the first such case for the given call site, and code generation will generate an invocation of just that case (it's implicitly available as a separate function as well).

Each case may specify a return type, just like normal functions (and they may use async etc., if it is compatible with the signature of the enclosing function):

num foo(num n) {
  int case(int i) { return 2*i; } // Of course, the body can be a block.
  double case(double d) => d + 1;
}

main() {
  int i = foo(42); // No downcast.
  double d = foo(1.0); // No downcast.
  int j = foo(2.0); // Error.
}

If there is no default then statically checked invocations that do not match any case will be rejected (could be a warning or an error, whatever gets more support), and dynamic invocations which don't match a case will throw (ArgumentError is probably fine).

For a tear-off, a context type which is a supertype of a case would tear off that case. A context type which is a supertype of the signature of the whole function would tear off the whole function (again: with no default, and in a typed setting, that tear-off would be an error).

With sealed classes, there is no need to have a default because it is possible to write a complete list of cases, and this list can be checked for exhaustiveness (it's probably a bug to have an incomplete list), but it's of course possible to use non-leaf types in a sealed hierarchy in order to cover all possible types without listing every concrete type.

// Library L1.

// No subtypes of `A` can be declared outside L1, or its package, or whatever. ;-)
sealed class A { num get bar => 2.1; }
class B implements A { int get bar => 22; int baz(int i) => i; }
class C implements B { int get bar => 23; }

// Library L2.
import L1;

num getBar(A a) {
  int case(B b) => b.baz(b.bar);
  num case(A a) => a.bar;
  // No default needed because all of {A, B, C} are handled already.
}

An obvious complaint would be that it is error-prone to rely on the textual order of the cases (so we couldn't swap the case for A and B above, because then the case for B would be dead code), because it introduces the possibility that static analysis will choose a specific case, but then at run time the actual object will be some subtype that implements several types (including some cases which are earlier in the list).

But I think that the safety belts already in place will make that problem manageable. For instance, with a sealed type hierarchy it's easy to enforce that the dynamic and the static behavior are precisely identical; and with near-bottom types like int and String, there cannot be any subtypes so we get the same guarantee for free, and the upcoming value types are likely to offer similar guarantees.

It would probably be rather easy to extend case functions with matching on constants (aka literal types):

void baz(num n) {
  case(42) => print("Woohoo!");
  case(41) => print("Almost there!");
  case(int i) => print("OK, $i will do.");
  // etc.
}

With call sites where the literal type can be resolved, this could give rise to extensive inlining and other opportunities for optimization.

kevmoo commented 5 years ago

Would such functions be entirely case statements? Could I put in "normal" code before? Interleaved between?

If not, I almost want a function modifier (like async) to really call this out.

void baz(num n) switch {
  case(42) => print("Woohoo!");
  case(41) => print("Almost there!");
  case(int i) => print("OK, $i will do.");
  // etc.
}
natebosch commented 5 years ago

How does this scale when there are multiple parameters?

eernstg commented 5 years ago

@kevmoo wrote:

Would such functions be entirely case statements?

Yes, or at least: I don't think it's going to be easy to come up with useful rules for allowing code outside the nested case declarations.

almost want a function modifier

That would certainly make sense: When a modifier like async is added to the body of a function the rules for the contents of the body are different (await is now a keyword), and a similar but even more radical rule applies to a switch body. So that would probably be good for the readability of the code.

@natebosch wrote:

How does this scale when there are multiple parameters?

Let's call the function as a whole the 'main' declaration and each case declaration and default: a 'case' declaration (default: uses the argument list of the main declaration).

The signature of the main declaration determines which invocations are allowed for that function, and the signature of each case declaration must be such that some of these invocations can be handled, but not all of them (because we want to select some invocations for each case, and if one case accepts all of them then the remaining cases are dead code).

So we must be able to connect each case parameter to a corresponding parameter of the function as a whole.

Here are some more details on this, in case you wish to go deeper: We may be able to allow a case declaration to omit some parameters which are optional in the main signature, and we may be able to allow a case declaration to declare some parameters as required which are optional positional in the main signature. But we cannot detect at run time whether a given actual argument is received because it was provided as a default value, or it was explicitly passed with that value. So I'd prefer to require that the main declaration and each case has the same argument list shape. We could allow a case to omit some optional parameters, and let that be syntactic sugar for declaring those optional parameters the same way they are declared in the main signature, but preventing their use in the body of that case. We could also flag invocations that statically resolve to such a case "with missing optionals" if any such optionals are passed, and we could give a statically resolved tear-off of that case a static type which does not include those optionals. But for a dynamic invocation we cannot expect to be able to avoid selecting a specific case because some of those optionals are passed. We might be able to specify a "complement literal type" (containing "all values except this one"), but I haven't looked into any details of these ideas.

The short version is that each case declaration must essentially have the same argument list shape as the main declaration.

The type annotation on a parameter of a case must be a subtype of the one on the corresponding main parameter (and if the type annotation is omitted then it is taken to be the same as on the main parameter).

In summary, the function type of a case declaration needs to be a supertype of the function type of the main declaration, but any differences in the argument list shape will be taken into account during static analysis only.

lrhn commented 5 years ago

The matching here is entirely on types, not on values like a normal switch. That measn that we would then have a type case feature in the language. Instead of forcing users to write a function literal to get that behavior, it should also work as a statement:

var value1 = getSomething(), value2 = getSomethingElse();
switch (value1, value2) {
   case(String x, String y): { return x + y; }
   case(int x, int y): { return x + y; }
   case(_, Null _): 
   case(Null _, _): { return null; }
   default: { return "$value1+$value2"; }
}

(the syntax here is not sufficient to distinguish it from a normal switch, we need something more).

eernstg commented 5 years ago

I think the case function differs from a type case in that we would want to ensure that it is "visible from the outside".

So we don't just get a function with a body that is capable of making a choice based on some dynamic types, we actually get a set of separate functions (one for each case), and tools will be allowed and required to select a case statically if certain requirements are satisfied. A new "type switch" statement wouldn't give us this capability.

lrhn commented 5 years ago

I agree that the switch-function is not just a function with a top-level type case, but I also don't want people writing functions just to get the type-case functionality. The functionality should also be available as a statement.

eernstg commented 5 years ago

type-case .. should also be available

Sure! Wanted that for a long time as well. Of course, we also want type-case methods, but they require some extra rules about overriding. ;-)

Zhuinden commented 5 years ago

Sounds eerily like the

fun baz(n: num) = when(n) {
    42 -> println("Woohoo!")
    else -> println("OK, $n will do")
}

from Kotlin.

Based on what I know about Dart (which is not much), could the following make sense?

int baz(num n) => switch(n) {
  case(42) => 21
  case(41) => 10
  else => 0
}

?

munificent commented 5 years ago

Very interesting!

I really love the idea of runtime-dispatched overloading, and exposed in a way that's visible in the static signature of the API. I'm not super thrilled by the syntax proposed here, but that's not a big concern. I'm sure we can come up with something we all like. A few questions/thoughts:

One nice feature is that this gives the language a way to express how + on numbers actually works without having to special case the arithmetic operators in the language spec.

it introduces the possibility that static analysis will choose a specific case, but then at run time the actual object will be some subtype that implements several types (including some cases which are earlier in the list).

Yeah, this is a little weird. It's arguably no weirder than what happens in Java/C# when you have overloads that take parameters of related types. Users are very often surprised that they get static and not runtime dispatch on the type of the arguments.

How does this work with overriding? I assume if I override a case method, I completely shadow all of the base cases. If I want to only override some, I would just have those cases in my override and then the default would call super()? Would that do the right thing?

Is it possible to define overloads with different arities or disjoint parameter types? Does the latter always force you to use object or dynamic? If so, what would that mean if we consider moving value types out of the object hierarchy?

eernstg commented 5 years ago

How does this work with overriding?

We would have different concepts of case functions and case instance methods, and I didn't go into the latter (so there was no overriding, and hence no need for rules). We can start to flesh it out:

The fact that static analysis can "look into" a case function almost certainly implies that the method signature would have to include the complete set of cases for an instance method, probably including the ordering. This means that we would specify correct overriding in terms of that set of cases, and in particular it should be possible for a call site to select a specific case and rely on the run-time type of the receiver to have an implementation of that case (which would then have to accept the given arguments and return a result of the expected type, which means that the function type of the actual case must be a subtype of the function type of the statically known type).

I believe that it is never a problem to allow regular (non-case) method to be overridden by a case method: Invocations of a case method as a non-case method is supported anyway (e.g., that's what we do for dynamic invocations, and for a tear-off where the target type does not justify committing to any particular case).

One approach that seems promising for case-case overriding is to keep the signature of each case fixed:

We would then say that a declaration D of a case method which overrides method signature S from a superinterface must declare cases with exactly the same function types as S, in the same order (such that the potential differences between static and dynamic selection would be the same for the actual receiver type as it is for the statically known receiver type). It would then be an error if multiple superinterfaces have a method signature with the same name, but they disagree about the set of function types of cases, or about their ordering.

A case method can be overridden by a non-case method, which implicitly means that it will have the overridden set of cases, all with the given non-case method body. Those cases would then be part of the method signature, and if it's overridden again by another case method then that declaration would have to be consistent with the given cases (so it wouldn't start from scratch, even though it seems to override a non-case method, because static analysis "knows" that it is a case method, albeit implicitly).

You could argue that it would be tempting to allow many other situations, but we would have to consider them carefully:

A method declaration D could be a correct override of several method signatures S1 .. Sk when it has a list of cases L which is a superlist of the list of cases of each Sj (so it's a superset if we ignore the order; and it preserves the order for each overridden method signature, in the sense that we can obtain the list of cases for Sj by deleting some cases from L).

The reason why we'd want to be careful before we commit to this kind of rule is that this could make it more difficult to understand the relationship between static case selection and dynamic case selection (concretely, the relationship between the meaning of e.foo(42) when the static type of e is A and when it is C where C implements A).

abstract class A {
  num foo(num n) {
    int case(int i);
    default;
  }
}

class B {
  num foo(num n) {
    double case(double d);
    default;
  }
}

class C implements A, B {
  num foo(num n) {
    int case(int i) => 2*i;
    double case(double d) => d + 1;
    default;
  }
}

One issue is that the default case can be omitted in C. But it may be a non-trivial exercise to detect the cases where this is known, and allow an overriding case method to omit the default when its case list is exhaustive (as a special exception to the rule saying "overrider must have all cases that it overrides").

Another issue is that C could just as well declare its non-default cases in the opposite order. I'm pretty sure that this freedom to reorder the cases will give rise to subtleties (say, if different implementations of a certain set of superinterfaces order the same set of cases differently), especially when we start considering longer argument lists (say, ordering num, int before or after double, num).

Alternatively, we could insist that a case-case override declares exactly the same number of cases in the same order, but allow each case to use different types. As mentioned, we would then have to require that a case declaration cj,1 which overrides a case declaration cj,0 is such that the function type of the former is a subtype of the function type of the latter (because this ensures that an invocation of the former at run-time is safe, if it is safe according to static analysis based on the latter).

This, however, seems to disrupt the selection even more, so we would again need to put constraints on the allowed overridings in order to keep programs comprehensible:

abstract class A {
  num foo(num n1, num n2) {
    int case(int i1, int i2);
    double case(double d1, double d2);
    default;
  }
}

class C implements A {
  num foo(num n1, num n2) {
    int case(num n, int i) => 42;
    double case(double d, num n) => d + 1;
    default: return n1 + n2;
  }
}

We do have the required subtype relationship: int Function(num, int) is a subtype of int Function(int, int), etc, but an invocation with receiver type C could choose the first case statically (as well as dynamically) in situations where the receiver type A would not choose the first case, and this means that a developer cannot rely on having the cases paired up.

I suspect that variations of the parameter list shape would have the same properties: We would have a simple and strict approach which is obviously comprehensible for the developer who declares an overriding method or calls it, and then we'd have a lot of extensions from there which would introduce a lot of subtleties in return for some amount of flexibility.

In summary, I think we can do lots of fancy things here, but we should get started with a rather restricted approach (like "same cases, period!"), and then we can explore the more permissive/expressive approaches, possibly later on.

In general, I'd really like to give developers the ability to enforce that static and dynamic dispatch will give exactly the same results. This would be rather easy to achieve with sealed types (and, presumably, value types), and I think we should consider other parts of the design of this feature with that goal in mind. This would allow developers to use a strict style of coding where this source of

weird

ness is simply absent.

With respect to

disjoint parameter types

That is no problem! There is nothing special about having a number of cases which are mutually exclusive (e.g., int in one case and double in another). We currently only have very few types in Dart which are disjoint, because we can almost always write a new class class C implements A, B .. in order to make A and B non-disjoint (we have exceptions like "A has a foo getter and B has a foo method", etc), but I expect to have many more of these disjointness relationships with sealed classes and value classes.

Ing-Brayan-Martinez commented 5 years ago

Me parece muí interesante incorporar esta característica a este lenguaje me gusta mas usando la sentencia switch al estilo de java de esta manera:

int baz(num n) => switch(n) {
  case(42) => 21
  case(41) => 10
  default => 0
}

Esto me hace recordar las switch expression incorporada en java 12 esto seria un gran avance para el lenguaje Dart mas informacion aqui switch expression java 12

shinayser commented 4 years ago

Isn't the when block of Kotlin language much more powerfull and flexible than the proposal on the top post?

Like, today we have to do something like that:

switch(number) {
    case 0:
        System.out.println("Invalid number");
        break;
    case 1:
        System.out.println("Number too low");
        break;
    case 2:
        System.out.println("Number too low");
        break;
    case 3:
        System.out.println("Number correct");
        break;
    case 4:
        System.out.println("Number too high, but acceptable");
        break;
    default:
        System.out.println("Number too high");
        break;
}

Then 'when' is like:

when(number) {
    0 -> println("Invalid number")
    1, 2 -> println("Number too low")
    3 -> println("Number correct")
    4 -> println("Number too high, but acceptable")
    else -> println("Number too high")

And you can also return the value directly:


fun baz(num n) => when(n) {
    0 -> "Invalid number"
    1, 2 -> "Number too low"
    3 -> "Number correct"
    4 -> "Number too high, but acceptable"
    else -> "Number too high"
}

In dart we could have something similar, just like @Ing-Brayan-Martinez pointed on the post above mine.

eernstg commented 4 years ago

@shinayser wrote (and I forgot to respond, sorry about that ;-):

Isn't the when block of Kotlin language much more powerfull and flexible

Those two language constructs are not comparable. The when block of Kotlin is an expression which may occur anywhere an expression is required, and it will not affect the type of an invocation of the enclosing function.

The proposal in this issue is aimed at allowing one function declaration to have several distinct types, covered by a common supertype (which is the signature of the whole function declaration). The function type implied by each case is available to callers, and when a specific case has been selected it is determined at compile time that we are calling a function of that type, and it will indeed "call that case" at run time.

So we may get a better static analysis at each call site, and we may get better performance because each case may allow special optimizations, and we still maintain that there is a function which represents all cases collectively (such that even a tear-off with context type dynamic can be performed, and so on).

ds84182 commented 4 years ago

Question, how does this work with type parameterization?

e.g.

Object foo<A, B>(Object value) {
  A case(A) {
    print('A: $A');
    return value;
  }
  B case(B) {
    print('B: $B');
    return value;
  }
}

foo<int, double>(foo<int, int>(123));

Would this print A: int; A: int or B: int; A: int? Assuming the former because of how switch statements are currently handled.

Second, would you be able to use case functions to specialize on type parameters?

T default<T>() {
  case<int>() => 0
  case<double>() => 0.0
  case<String>() => ''
}

This would allow developers to promote a type parameter instead of a value.

eernstg commented 4 years ago

Question, how does this work with type parameterization?

Object foo<A, B>(Object value) {
  A case(A a) {
    print('A: $A');
    return a;
  }
  B case(B b) {
    print('B: $B');
    return b;
  }
}

void main() {
  int i = foo<int, double>(foo<int, int>(123));
}

Each case has a signature which makes that case a supertype (as a function type) of the signature of the function, that is, each case can stand in for the function for a certain subset of the actual argument lists accepted by the function. So with a generic function each case is also a generic function, with the same type parameters, with the same bounds. It would be noisy and not helpful to redeclare those type parameters in each case (they can't change, essentially, because then the function types would be unrelated).

The reason why we choose the first case even when the second case would also work (when passing <int, int>) is that this is the basic disambiguation mechanism in case functions: Static analysis will consider the list of cases in order and commit to the first one that is guaranteed to apply; if there is no default case and no case is guaranteed to apply then we have a compile-time error for that call site; and if no static case selection can be made (say, when the function is called dynamically, or it is assigned to a variable such that it has a regular function type like Object Function(Object)), the cases are checked in order based on the dynamic types of the arguments.

There's a lot of room for detailed design in the area of type inference, but let's start with the simple model where type inference is based on the function (that is, the set of cases is ignored during inference). With the given example the type arguments are provided explicitly, and that's of course covered by any approach to type inference.

The invocation of foo<int, int>(123) will then statically select the first case because it's called with an argument of type int which is a subtype of the value of A, which is again int, and the type of the expression is int. With foo<int, double>(...) we statically select the first case again, and the type of the whole expression is again int.

would you be able to use case functions to specialize on type parameters?

That's an interesting idea! In general, a relationship that exists between the function as a whole and each of the cases is that the case represents a function of type Tc and the function has type Tf, and Tf <: Tc. So every correct invocation of a case is also a correct invocation of the function (which makes sense, because the case is supposed to receive a subset of the invocations of the function).

But it is actually not necessary to maintain this relationship strictly, because there will never be a situation where such a subtype relationship exists at run time: We always "tear off a specific case" based on static information.

In particular, we can use bounds to enable a subset of invocations of a generic function, even though differences in bounds make function types unrelated (e.g., Function<X extends T1>() and Function<X extends T2>() have no subtype relationship), because it may be determined statically that the requirements for applying a specific case is satisfied.

X default<X>() {
  case<X extends int>() => 0
  case<X extends double>() => 0.0
  case<X extends String>() => ''
}

void main() {
  int i = default(); // OK, statically selects the first case.
  String d = default(); // OK, statically selects the third case.
  List<double> xs = default(); // Compile-time error.
}

I'm not quite sure how useful this kind of abstraction is, but it is an interesting fact that there is no need to insist on the subtype relationship. This is, by the way, also true for the cases where a match is made on a value (like case(1) ...).