How does implicit variable typing interact with variable type inconsistencies?

dart-lang / language

Design of the Dart language

Other

2.67k stars 205 forks source link

How does implicit variable typing interact with variable type inconsistencies? #2424

Open stereotype441 opened 2 years ago

stereotype441 commented 2 years ago

The patterns spec has this to say about logical-or patterns:

It is a compile-time error if one branch contains a variable pattern whose name or type does not exactly match a corresponding variable pattern in the other branch. and this to say about switch statements in which multiple case patterns share a body: However, it is a compile-time error if all switch case patterns that share a body do not all define the exact same variables with the exact same types.

How should we interpret these requirements under the circumstance where the type is implicit? For example, is this an error?

(int, String) x = ...;
switch (x) {
  case (y, ''): // y has implicit type `int`
  case (0, y): // y has implicit type `String`
    print(y);
}

Or how about this equivalent case using a logical-or pattern?

(int, String) x = ...;
switch (x) {
  case (y, '') | (0, y):
    print(y);
}

And what if the types match, but some of them are implicit and others are explict, e.g.:

(int, int) x = ...;
switch (x) {
  case (y, 0): // y has implicit type `int`
  case (0, int y): // y has explicit type `int`
    print(y);
}

stereotype441 commented 2 years ago

CC @munificent @lrhn @leafpetersen @kallentu @mit-mit @jakemac53 @natebosch @eernstg

FWIW, my preference is to call all of the above examples compile time errors, just to keep the interactions with flow analysis simple.

(I want to consider y and int y to be in conflict because we have subtly different promotion behaviors between explicitly and implicitly typed variables; so if we consider y and int y to match, then we'll have to explore which of those behaviors we want, and honestly I'd prefer to just call it a compile time error rather than try to figure out which behavior is more useful to the user, and implement it, and test it.)

lrhn commented 2 years ago

There are three possible options when different branches provide bindings of different types:

Compile time error. The bindings of y are incompatible, so you should fix them to be compatible.
y is not bound. There are no compatible declarations that combine to provide a y variable for the body. (The y of each binding may be used in a when clause, though.)
y is bound and has type Object (LUB of String and int). This is compatible with the values provided by all patterns.

For the case where one pattern binds an identifier, and another does not, we're down to options 1 and 2.

I think I'd be happy with any choice in practice, but as a Dart programmer I'd probably expect option 3, or option 2 if one pattern doesn't have a y identifier at all. The binding is dead code, but not wrong. Each pattern works by itself, combining them should work too, but only provide access to variables at types that are provided by all the patterns. I'd only make it an error if we cannot find a suitable LUB type. (Which is suspicious, because I usually don't trust LUB to introduce new types, and prefer users to write the type themselves. Which in this case would mean upping one of the patter's type to Object, so one of the available types is a supertype of all the types.)

eernstg commented 2 years ago

As usual, I'd prefer to emphasize code readability and comprehensibility, and in this case I think we can do that by avoiding too many "automatic" solutions to apparent inconsistencies.

This is basically @munificent's original proposal: Require all occurrences of each variable to have exactly the same type (and following @stereotype441, we'd strengthen this to say: all of them inferred, or all of them explicit), and require every pattern to bind the same set of variables.

Anything other than that is simply a compile-time error.

We could do so much more to "make it work" in many other situations, but I think it would be counterproductive in real life.

munificent commented 2 years ago

FWIW, my preference is to call all of the above examples compile time errors, just to keep the interactions with flow analysis simple.

I could be wrong, but I suspect pretty strongly that users would expect all three examples to work. That's what I would expect, and I think I would be somewhat sad if it didn't.

I can imagine cases where this does come into play in practice. For example, something like:

abstract class Widget {
  String get tooltipText;
}
class RadioButton extends Widget {
  final Widget content;
}
class RadioGroup extends Widget {
  final List<RadioButton> buttons;
}

showTooltip(Widget widget) {
  // Recurse into some widgets:
  switch (widget) {
    case RadioButton(content: inner):
    case RadioGroup(buttons: [Widget inner, ...]):
      widget = inner;
  }
}

Here, there's a type annotation on the second variable because the thing we're destructuring happens to be a subtype and we want to upcast it to match the less precise type of the other case. I think I ran into similar code in the test refactoring of dart_style to use patterns.

My mental model is that the check for whether you bind the same variables in | branches or across cases that share a body is that it's a simple compile-error reporting pass that happens after all other type inference is done.

(I want to consider y and int y to be in conflict because we have subtly different promotion behaviors between explicitly and implicitly typed variables; so if we consider y and int y to match, then we'll have to explore which of those behaviors we want, and honestly I'd prefer to just call it a compile time error rather than try to figure out which behavior is more useful to the user, and implement it, and test it.)

Ooh, that, uh, is interesting. Are those behavioral differences spelled out anywhere? This is a new one for me.

stereotype441 commented 2 years ago

There's actually just one difference: if a variable is implicitly typed, and its initializer expression is a promoted type variable (e.g. T&int), then the inferred type of the variable is just the type variable by itself (no promotion, i.e. T), but the variable is immediately promoted.

For example, in this code:

f<T>(T x, T y) {
  if (x is num) {
    print(x + 1);
    var z = x;
    print(z + 1);
    z = y;
  }
}

if (x is num) promotes x from type T to type T&num. This is a really important promotion, because it allows print(x + 1). Then, var z = x gives z an inferred type of T, but z is immediately promoted back to T&num. You can see that it's been promoted because print(z + 1); is allowed. And you can see that the inferred type of z is just T because z = y is allowed.

Whereas if you change the line var z = x; to T z = x;, then the promotion doesn't happen, and print(z + 1); becomes a compile-time error.

I can't find this behavior documented anywhere, but it's clearly deliberate--there's a line in flow analysis that explicitly does it. And I don't remember why we only do it for implicitly type variables--that seems bonkers to me in retrospect. If I were doing it all over again I would probably just make it apply to all variables whether they're implicitly typed or not, so that you could change var z = x; to T z = x; and the program would still work.

I suppose we could make that change as part of the patterns release, and then we wouldn't have to worry about this silly difference between typed and untyped variables anymore.

stereotype441 commented 2 years ago

You know, the more I think about this, the more I want to be permissive here (along the lines of what @lrhn and @munificent are saying in the comments). The thing that pushed me over the edge is thinking about when clauses. I could totally imagine a user wanting to do this:

switch (shape) {
  case Line(color: c):
  case Circle(color: c, attributes: a) when a.hasFuzzyEdges:
  case Rect(color: c, area: a) when a > 100:
    print('color is $c');
}

This should work, even though a gets assigned two different types in the two when clauses, and doesn't get assigned at all in the Line case, because all the places where a is used are safe.

And, I think it shouldn't be too hard for the compiler to understand that this all should work, because it can lean on the flow analysis machinery we've already built. Basically, we assign a type to each variables based on the LUB of all the variable patterns that mention it, but within a when clause, we promote each variable to a more specific type based on just the case pattern that immediately precedes it. We also mark variables as definitely assigned in each when clause based on whether they are mentioned in all branches of the corresponding case pattern. So in the example above, even though a has inferred type Object and is not mentioned in all patterns, it gets promoted and marked as definitely assigned in the appropriate places, so that a.hasFuzzyEdges and a > 100 will both work. Then, before analyzing the body, we join the flow analysis states, which un-does the promotions and marks a as no longer definitely assigned.

Integrating tightly with flow analysis probably will allow other things to "just work". Here's a somewhat contrived example:

abstract class DefinitelyInteger { int get value; }
abstract class MaybeInteger { num get value; }
switch(json) {
  case DefinitelyInteger(:value):
  case MaybeInteger(:value) when value is int:
    print(value.isEven);
}

Here, value receives an inferred type of num (the LUB of int and num), but it gets promoted in both cases (the first because the type of DefinitelyInteger.value is int; the second because of the when value is int clause. So when we join those two control flow paths, it stays promoted, and therefore value.isEven is allowed.

lrhn commented 2 years ago

If we can avoid a complicated LUB, perhaps by saying that one of the types must be a supertype of all the types, otherwise we use Object?, which will encourage putting at least one type into one of the patters, then I'd be happy with otherwise being maximally permissive.

If a name is not bound in at least one case, it's not introduced into the body. (label: and default: count as binding no names).
If a name is bound in all cases, it's introduced into the body with a type that is the stupid-LUB (max type or Object?) of the types.

eernstg commented 2 years ago

How about saying that it is a compile-time error if the set of occurrences of the variable y bound by a set of patterns is such that

There are multiple declared types, and they are not the same type (using a rather strict kind of "same"), OR
All types are inferred, and they are not the same (we'd need strict "same" here, too), OR
Some types are declared, and they are all T, and some types are inferred, S1 .. Sk, and there is a j such that Sj is not a subtype of T.

The variable y would then be treated as having an explicitly declared type if there is at least one such declared type in the patterns. (I hope it would not matter which one we choose when they are pretty strictly the same type. Otherwise we'd have to choose one of them, say, the first one.) The variable would not be promoted.

The interesting case is when there is one or more declared types and one or more inferred types. In that case we would know what the declared type is (because they are all the same) and we would then simply check that each inferred type is a subtype of the declared type. This means that a developer can rely on inferred types if they all agree. If that doesn't work then it's a compile-time error, and they know they have to do something. So they do the following: Find one location where it's convenient to write an explicit type, choose a suitable supertype of all the other inferred types, and write it there.

Look ma, no LUBs! :smile:

stereotype441 commented 2 years ago

@eernstg I have a concern with your proposal, which is that I believe sometimes people are going to bind a variable just for the purpose of using it in a when clause, and they're not going to use it in the body. Considering my first example again:

switch (shape) {
  case Line(color: c):
  case Circle(color: c, attributes: a) when a.hasFuzzyEdges:
  case Rect(color: c, area: a) when a > 100:
    print('color is $c');
}

This would have a compile error because a is bound to two different types. The user can fix the compile error by renaming one of the as to some other variable name. I think that's going to be annoying because the user is going to think of a as having a scope that's limited just to the when clause (because that's their intent). I don't think we should punish them for having different types for a if they're not going to access a in the code paths where the two types conflict.

(As a side note, I admit I don't really understand the desire to avoid using LUB in @eernstg and @lrhn's comments above. Yes, I know that our LUB algorithm isn't great and operates counterintuitively sometimes, and I would love for us to improve it if/when we have time, but until then, "LUB that's kinda smart but sometimes chooses a type that's a little too general" still seems better to me than "stupid LUB that just gives up and uses Object? in all but the simplest cases")

eernstg commented 2 years ago

bind a variable just for the purpose of using it in a when clause

In that case I think it's better for overall code readability if one of them uses a different name.

I think there are two difficulties with LUB: (1) It yields surprising and non-optimal results for some upper bounds involving distinct interface types, and (2) it's a non-trivial algorithm anyway.

We might be able to get rid of problem (1) above, but it would still be a rather complex computation, and this makes it hard to reason about the behavior of the code ("so why is y.foo() an error, I'm pretty sure the type of y is MyWidget, but it says that OtherWidget doesn't have a foo, so maybe I have to introduce an explicit type on the third case, or maybe ...").

munificent commented 2 years ago

I can't find this behavior documented anywhere, but it's clearly deliberate--there's a line in flow analysis that explicitly does it. And I don't remember why we only do it for implicitly type variables--that seems bonkers to me in retrospect.

I think the rationale (I can't tell if I'm remembering or rationalizing post hoc) is that if the user bothered to write a type, they probably want that type. And this would give them a way to not promote to the initializer's promoted type if that wasn't what they wanted.

But I don't think that rationale is compelling enough to justify having the behavior differ. I wonder if we could eliminate this special case. If not, we could at least choose to not propagate it to pattern matching. So there are two properties we need to determine:

The static type of the variable as it appears in the body, which may be shared by multiple cases.
The promoted type of that variable inside the body.

I still like the current specification that says all pattern variables with a given name must have the exact same static type (inferred or not) and then the variable inside the body has that static type.

For the promoted type, I think we could just do whatever this does:

test(Object obj) {
  if (obj is A || obj is B) {
    // What is obj promoted to here?
  }
}

Being consistent with how || works with type promotion is probably what users will expect.

Basically, we assign a type to each variables based on the LUB of all the variable patterns that mention it, but within a when clause, we promote each variable to a more specific type based on just the case pattern that immediately precedes it. We also mark variables as definitely assigned in each when clause based on whether they are mentioned in all branches of the corresponding case pattern. So in the example above, even though a has inferred type Object and is not mentioned in all patterns, it gets promoted and marked as definitely assigned in the appropriate places, so that a.hasFuzzyEdges and a > 100 will both work. Then, before analyzing the body, we join the flow analysis states, which un-does the promotions and marks a as no longer definitely assigned.

This is very clever, and by that I mean both all of the good and bad connotations of that word. :)

I think using both LUB and type promotion inside cases that share a body for overlapping variables is likely to cause more trouble than it's worth. I don't ever want to see bug reports where a user is wondering why the variable they destructured in a case has type EfficientLength. :)

bind a variable just for the purpose of using it in a when clause

In that case I think it's better for overall code readability if one of them uses a different name.

I agree with this. If we really want to allow this use case, I think a simpler rule is to say:

"When multiple cases share a body (and likewise for | patterns), any variables used in the body must have the exact same type in all cases or branches of the |."

That solves the important problem of needing a single meaningful type for that variable in the body. But it allows cases to reuse the same name for different types if the name is only mentioned in the guard clause and not used in the body. But, honestly, I think the best thing to do is to stick with the current rule and tell people to use different names if they want to have variables used in different guard clauses with different types.