dart-lang / language

Design of the Dart language
Other
2.65k stars 201 forks source link

Require the static type of all expressions to be a subtype of their context; push coercions down to satisfy this. #3471

Open stereotype441 opened 10 months ago

stereotype441 commented 10 months ago

Background

Usually, an expression's context type schema describes what the static type of the expression must be (or be coercible to) in order to avoid a compile-time error. For example, in the declaration int x = f();, if the static type of f() is not either int or a type coercible to int (i.e., dynamic), then a compile-time error will occur; therefore, f() is analyzed with a context of int.

However, there are two notable exceptions:

As a result of these two exceptions, we've implemented coercions in a somewhat counter-intuitive way: instead of coercing the type of each expression as necessary to ensure that it's a subtype of its context (or reporting a compile time error if no suitable coercion exists), we've implemented some coercions so that they take effect only at the point of an assignability check. Notably, we do this for the dynamic downcast coercion, and this leads to an unfortunate footgun. Consider:

String intOrDynamic(bool b, int i, dynamic d) => b ? i : d;

This code is accepted without any static errors or warnings. The return type String is used as the context type for analyzing the conditional expression b ? i : d; this is pushed down into the two branches of the conditional, i and d. Although i has type int, which is not assignable to String, no error is reported, because there is no assignability check inside a conditional expression. By the same reasoning, analysis of d with context String does not (yet) produce a dynamic downcast. So the static type of i is int and the static type of d is dynamic. Now, the static type of the whole conditional expression is LUB(int, dynamic), which is dynamic. At this point, an assignability check is performed, to see if => b ? i : d can be used as the body of a function with return type String. Since a coercion exists from dynamic to String (namely, dynamic downcast), there is no static error.

But it would be really nice if this code had a static error. Because at runtime, it's guaranteed to throw an exception whenever b is true.

We have often talked about redefining the behavior of conditional expressions so that if either branch of the conditional expression isn't assignable to the context, a compile-time error would be issued. But if we did that, then things would get weird because of the exceptions noted above. For example, this is a valid and reasonable thing to do:

f(Object o, bool b, double d, String s) {
  if (o is int) { // promotes `o` to `int`
    ...
    if (b) {
      o = d; // demotes `o` back to `Object`
    } else {
      o = s; // demotes `o` back to `Object`
    }
  }
}

And it also seems reasonable that we could refactor the "if/else" into a conditional expression, like so:

f(Object o, bool b, double d, String ) {
  if (o is int) { // promotes `o` to `int`
    ....
    o = b ? d : s; // demotes `o` back to `Object`
  }
}

But if we changed the behavior of conditional expressions so that it was an error if either branch didn't satisfy the context, then this innocent refactor would lead to a surprising compile-time error, because the context for b ? d : s is int, and neither double nor String is assignable to int.

Similar problems occur with switch expressions.

Proposal

Change the treatment of assignments to promoted local variables, so that the context for the right hand side of the assignment is the the static (unpromoted) type of the local variable, not its promoted type.

Change the way we implement coercions so that for each expression in the program, a provisional static type is initally computed, using the rules for expression static types that we currently have, and then:

(Note that these bullet points establish the invariant that the static type of any expression is a subtype of the greatest closure if its context type schema with respect to ``.)_

Consequences

Different type inference for the right hand side of an assignment to a promoted local variable

Today, when an assignment is made to a promoted local variable, the right hand side is analyzed using the promoted type of the variable as its context. So, for example:

f(Object o) {
  if (o is List<int>) {
    ...
    o = [];
  }
}

Since o is promoted to List<int> at the time of the assignment, [] is analyzed using a context of List<int>, and therefore [] is interpreted as <int>[].

With the change, [] will be analyzed using a context of Object (the unpromoted type of o). So [] will be interpreted as <dynamic>[].

I'm not sure whether this is better or worse. But I suspect that it will be very rare for it to make a difference in real-world code.

Better handling of dynamic in conditional and switch expressions

Recall the intOrDynamic function:

String intOrDynamic(bool b, int i, dynamic d) => b ? i : d;

With the change, the context type String is still pushed down into the subexpression i. A provisional static type of int is computed for i. But now, following the bullet points above, we see that:

Therefore a compile-time error is issued. The footgun has been fixed!

Better handling of conditional and switch expressions where LUB produces too big a type.

Consider this code from https://github.com/dart-lang/language/issues/1618:

PSW foo(bool bar) {
  final PSW result = bar ? CNB() : AB();
  return result;
}

class CNB extends SW implements PSW { }

class AB extends SW implements PSW { }

class PSW implements W { }

class SW extends W { }

class W { }

Today, the conditional expression bar ? CNB() : AB() had a static type of LUB(CNB, AB), which is W, leading to a compile-time error because W is not assignable to PSW.

With the change, the type W is merely the provisional type of the conditional expression. Then, following the bullet points above:

Therefore, the static type of bar ? CNB() : AB() is R (i.e., PSW).

This fully addresses https://github.com/dart-lang/language/issues/1618.

If-null becomes slightly more restrictive

Consider the following code (which is allowed today):

List<int>? maybeList = ...;
var somethingToIterate = maybeList ?? Iterable.empty();
for (var i in somethingToIterate) {
  ...
}

With the change, this becomes a compile-time error, because the context type for Iterable.empty() is List<int>?, and Iterable is not a subtype of List.

This could definitely be a source of breakage for users, and we should investigate how often it arises in practice.

However, even if it does come up a non-trivial number of times, there are at least three easy fixes:

Coercions are pushed down into if-null expressions, conditional expressions, and switch expressions.

Consider this code:

abstract class C {
  void call();
}
void Function() f(bool b, C c, void Function() g) => b ? c : g;

Today, this produces a compile-time error, because the static type of b ? c : g is LUB(C, void Function()), which is Object, and Object is not assignable to void Function().

With this change, the coercion is pushed down to the branches of the conditional. Since the provisional static type of c is C, which is not a subtype of void Function(), c is coerced using a .call tearoff, and the static type of c is void Function(). Which means the code is accepted, and interpreted as:

void Function() f(bool b, C c, void Function() g) => b ? c.call : g;

This should fully address https://github.com/dart-lang/language/issues/3363.

Coercions are pushed down into cascade targets.

This has advantages and disadvantages.

On the advantage side, consider this code:

int f(dynamic d) => d..foo();

Today, the implicit downcast is interpreted as happening outside the cascade, i.e.:

int f(dynamic d) => (d..foo()) as int;

This code is statically accepted, because it is permissible to call any method on a dynamic type. But it's guaranteed to throw an exception at runtime, because the only possible values for d which will survive the dynamic downcast are integers, and integers don't have a foo method.

With the change, the implicit downcast is interpreted as happening inside the cascade, i.e.:

int f(dynamic d) => (d as int)..foo();

And therefore there will be a compile-time error.

On the disadvantage side, consider this code:

abstract class C {
  void call();
  void m();
}
void Function() f(C c) => c..m();

Today this is permitted, because the call to m happens before the implicit tearoff of .call. With the change, it will become a compile-time error.

If the user wants to keep the old behavior, they can always do the .call tearoff explicitly:

void Function() f(C c) => (c..m()).call;

Soundness

As promised, here's why it's sound to treat the static type of a conditional expression, switch expression, or if-null expression as R (the greatest closure of the context type schema), even if the provisional static type is not a subtype of R.

Recall that the bullet points listed in the "proposal" section establish the invariant that the static type of every expression is a subtype of the greatest closure of its context type schema with respect to _.

For conditional expressions and switch expressions, the context type schema is passed down into each branch without alteration. Therefore, thanks to the invariant, the static type of each branch is a subtype of R. Since the runtime value of a conditional expression or switch expression is always obtained from one of its branches, if we assume soundness of the subexpressions, then the runtime type of the whole conditional exprssion or switch expression must also be a subtype of R. So it does not violate soundness to assign the whole conditional expression or switch expression a static type of R.

For an if-null expression e1 ?? e2, things are slightly more complicated. First, if the if-null expression has no context, or the context of the if-null expresison is either _ or dynamic, then R is a top type, and therefore it does not violate soundness to assign the whole if-null expression a static type of R, because every runtime type is a subtype of a top type.

If, on the other hand, the context of the if-null expression is not either _ or dynamic, then:

stereotype441 commented 10 months ago

@dart-lang/language-team As discussed in today's language team meeting. Let me know what you think!

lrhn commented 10 months ago

The one thing this will not allow, that I've wanted, is an aspirational context for the expression of an as cast.

It would be nice if 1 as double didn't throw, because it knows that you want a double. That can only work if we have aspirational contexts. (But it might just be trying to mash two different functionalities into one operator. The "completely unlimited unrelated cast" that it does today, and a more static kind of "I want this type" signal.)

Otherwise I think it's a good simplification.

Both promoted variable assignment and ?? changes are likely to be breaking.

On the other hand, using the unpromoted variable type should get rid of intersection types as context types, which is an added bonus. The difficulty is that the breakage of that won't be at the assignment itself, that'll likely still work, it'll just demote the variable and cause a later failure. That makes it harder to find and fix automatically.

The ?? issue is real, but I expect most occurrences of someList ?? Iterable.empty() to have a context type, likely occurring directly on a for/in with a context type of Iterable<_>. (Which means that it becomes Iterable<dynamic> today, and nobody notices the implicit dynamic?)

I think we should try to implement this, to see how much it breaks.

eernstg commented 10 months ago

This is amazing! I'd certainly support trying out how breaking this is, and doing it even in the case where there is some breakage.

It's a highly desirable property of this proposal that it simplifies so many things, and that it includes #1618 and #3363.

One thing came to mind, in bullet 2 of the second rule of the proposal:

Let R be the greatest closure of the context type schema with respect to _. Or, if the expression has no type schema, let R be dynamic.

IIUC then it makes no difference at all that we use dynamic here rather than any other top type: The next bullet says that "if T <: R" then the type of the expression is T, and that's always true for any top type including dynamic. So there's no need to worry about having a larger number of expressions of type dynamic if we adopt these rules. On the other hand, it seems likely that we could just as well say "..., let R be Object?", just to practice a good habit. ;-)

(OK, we'd probably need to say "_ or Object?" near the end in the section about ?? as well.)

The main reason to hesitate would be that this proposal will cause promoted variables to be demoted more often. You might even say that a promoted variable appears to wish to return to its unpromoted status because of the preference given to context types during type inference. That sounds like changing the user experience from the current level, and moving it down a notch. That could be a source of serious disappointment.

It would be nice if we could rely on the following migration scenario: We introduce the new rules, some promoted variables get demoted, and then there are errors about those variables later on, and developers will undo all the changes that matter because they'll change things like v = [] to v = <T>[] to avoid the demotion. This is a good scenario because the breakage occurs at compile time.

However, it's perfectly possible that there are no errors when v is used later on, if v is a List<dynamic> after the change. So we can't really hope for compile-time-only breakage.

Maybe we should link this change (at least in terms of migration advice) with some 'strict-...' options, to catch locations where the type dynamic suddenly pops up? Perhaps it would be useful to have a lint that flags variable demotions?

(As usual, it might be useful to have differential analysis, flagging locations where we have variable demotions that didn't occur with the pre-change rules, but that's probably much more work.)

stereotype441 commented 9 months ago

I'm starting work on this. I plan to do a prototype first, then see how much breakage it causes inside google3. Once I have that information we can re-assess whether we want to change course.

leafpetersen commented 8 months ago

I filed an issue here to explore a corner of this that I believe is not covered above.