dart-lang / language

Design of the Dart language

`??=` has an oddity with inference. #3031

Open water-mizuu opened 1 year ago

water-mizuu commented 1 year ago

I honestly don't know what's happening here, but somehow, when applying ??= to a nullable variable with a generic method whose type argument is inferred (in this case, .fold), the type is inferred to be nullable when it obviously is not. See below:


int? value;

int sum(List<int> values) {
  // A value of type 'int?' can't be returned from the function 'sum' because it has a return type of 'int'
  return value ??= values.fold(0, (a, b) => a + b);
}

/// These two are valid fixes.
int sumFix1(List<int> values) {
  return value ??= values.fold<int>(0, (a, b) => a + b);
}

int sumFix2(List<int> values) {
  int sum = values.fold(0, (a, b) => a + b);

  return value ??= sum;
}

void main() {
  List<int> values = <int>[1, 2, 3, 4, 5];

  print(sum(values));
}
lrhn commented 1 year ago

It's probably not wrong, but it is slightly annoying.

The reason value ??= values.fold(0, (a, b) => a + b) has type int? is that the context type of values.fold(0, (a, b) => a + b) is int?.

The context type of the second operand of ??= is not specifically specified (that I can find), but it's related to the ?? operator and the other composite assignment operators.

If you write int? i; i = ... then the context type of ... is int?, because that's the type of the variable being assigned to (if it was a setter, it is the parameter type of the setter).

The context type of e2 in e1 ?? e2 is the context type of e1 ?? e2, because that's where the value of e2 will go, into the surrounding context.

If you write int? i; i = i ?? ...; then the context type of ... is the context type of i ?? ..., which is int?, the type of the variable i being assigned to. And i ??= ... is basically the same as i = i ?? ..., so it's not that surprising that the context type of ... becomes int? there as well.
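To make that concrete, here is a sketch using the same fold call as the original example; spelling the ??= out as an explicit assignment should hit the same error, because the fold call still gets the context type int? from value:

int? value;

// The compound form from the issue: the second operand gets context type
// `int?`, so `fold`'s type argument is inferred as `int?`.
int sumCompound(List<int> values) {
  return value ??= values.fold(0, (a, b) => a + b); // Error: type is `int?`.
}

// The explicit spelling: the context type of the `??` expression is `int?`
// (the type of `value`), and that context flows into the second operand too.
int sumExplicit(List<int> values) {
  return value = value ?? values.fold(0, (a, b) => a + b); // Same error.
}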

That comes back to bite us with return value ??= ...;. There are two contexts for the ... here, because the value is used for two things:

- it is assigned to value, whose type is int?, and
- it is returned from sum, whose return type is int.

So, generally, for assignments (composite or not) in a context where the value is also used, we use the type of the assigned variable as the context type, and then worry later about whether that type can be used in the outer context.

Is it something we can improve? There are two (or more) contexts. That happens every time we have an assignment whose value is also used. Which means any time an assignment expression has a non-_ context type.

We could use a greatest(ish) lower bound of the context types. I'd probably only do that if one type is a subtype of the other. But then the result should be valid for all the contexts. (A kind of context promotion?)

@stereotype441

(I'm a little surprised that the context type doesn't seem to be the parameter type of the setter, for a getter/setter pair, instead it's just the type of the getter. Even when the getter has type int and the setter parameter int?.)

skylon07 commented 1 year ago

Just happened to browse here and thought I'd pitch in my two cents. lrhn gave a fantastic explanation of the compiler's reasoning behind how the "context type" of an expression is determined and how types are inferred in their relevant "contexts". I'd like to take all of that and wrap it in one simple example that illustrates this topic very succinctly.

Imagine your code,

int? value;

int sum(List<int> values) {
  return value ??= values.fold(0, (a, b) => a + b);
}

rewritten as such,

int? value;

int sum(List<int> values) {
  value ??= values.fold(0, (a, b) => a + b);
  return value;
}

This makes it a bit clearer how Dart is interpreting your code. What is/should be the type of value at the point of return? Well, in the context of the function, you know it's being assigned an int, but you're storing it in an int? variable, so there are two ways the type could be inferred here. This is ultimately the problem, and I guess as of right now the compiler goes with inferring from your declared type int? because it is the most explicit inference it's able to make so far.

I have a hunch this is similar to the problem with accessing nullable class members. Imagine a scenario like this:

class StringPrinter {
  String? s;

  StringPrinter(this.s);

  void printLength() {
    if (s != null) {
      print(s.length); // errors
    }
  }
}

The compiler will complain that "The property 'length' can't be unconditionally accessed because the receiver can be 'null'. Try making the access conditional (using '?.') or adding a null check to the target ('!')." Basically, this means that even though you've checked that s is not null, it's possible (because s is technically a "property getter") that s won't refer to something non-null next time you access it. Perhaps some reason similar to this is why int? value isn't being promoted to an int type in the context of sum(). Totally just a guess on my part though.
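A common workaround for that class-member case is to copy the field into a local variable first, since locals can be promoted; a minimal sketch:

class StringPrinter {
  String? s;

  StringPrinter(this.s);

  void printLength() {
    final s = this.s; // Copy the field into a local variable.
    if (s != null) {
      print(s.length); // OK: the local `s` is promoted to `String`.
    }
  }
}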

Hopefully this explanation makes sense to you (and that it's accurate, heh).

eernstg commented 1 year ago

I think the treatment of the context type of an assignment is relevant here. For example:

X showContextType<X>(Object? o) {
  print(X);
  return o as X;
}

void main() {
  dynamic d;
  int i = d = showContextType(0); // Prints 'dynamic'.
}

As we can see, the right hand side of the assignment d = showContextType(0) gets the context type dynamic, in spite of the fact that the type of the assignment as a whole is the type of its right hand side, and the assignment as a whole has context type int.

So we're getting a less precise static typing because we allow d to turn showContextType(0) into an expression of type dynamic, in spite of the fact that we are also going to use that expression to initialize i, and it could have had type int.

You could say that this is unnecessary, we should just reorder the code and use int i; dynamic d = i = showContextType(0);. However, this is not always possible. For instance, return e; cannot be refactored to return a value first and then perform an assignment, and hence this kind of refactoring wouldn't help us in the original sum example.
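For comparison, here is the reordered version spelled out, reusing the showContextType helper from above; under the current rules the inner right-hand side now gets its context type from i:

void main() {
  int i;
  dynamic d = i = showContextType(0); // Prints 'int': the context type of
                                      // showContextType(0) now comes from `i`.
  print(d); // 0
}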

We could allow the context type schema of the assignment (here: int) to play a role in the construction of the context type schema for its right hand side, especially in the case where it is a more specific type schema than the context type schema obtained from the left hand side of the assignment. In the example, we would give showContextType(0) the context type int rather than dynamic because int <: dynamic.

In general, we could consider using a context type schema for e in a = e which is the greatest lower bound of the context type schemas obtained from the assignable expression a and from the context of the assignment as a whole.

@skylon07, you mention the fact that value in the original example is a top-level variable, and hence it is not subject to promotion. That's true, and it is usually a very useful observation. But it doesn't actually make a difference in this particular case because the semantics of a ??= e already requires the assignable expression to be evaluated just once.

So let's consider an example where we use a local variable to ensure that we faithfully reproduce the semantics of ??= in the sum example, and then use a helper function to get the context type that I proposed above:

int intId(int i) => i; // Used to give `fold` the context type `int`.

int sum(List<int> values) {
  var v = value;
  return v != null ? v : value = intId(values.fold(0, (a, b) => a + b));
}

This illustrates that we could use the improved context type schema for assignments (here: GLB(int?, int) == int), and we could update the context type schema used with ??= accordingly, and the sum example would then work.

@stereotype441, WDYT? This would be a breaking change. Also, I mentioned some operations on type schemas that may only be well defined for types (for instance, taking the greatest lower bound). Do you see that as a significant issue?

On the other hand, we're talking about changing a context type schema to a subtype schema, and doing that only in the case where an assignment occurs as a subterm of another expression. Those scenarios might not be very breaking in practice. If it isn't too breaking then it seems to be doing "the right thing", and the sum example shows that it can occur in practice.

stereotype441 commented 1 year ago

@lrhn said:

We could use a greatest(ish) lower bound of the context types. I'd probably only do that if one type is a subtype of the other. But then the result should be valid for all the contexts. (A kind of context promotion?)

@eernstg made a similar suggestion:

In general, we could consider using a context type schema for e in a = e which is the greatest lower bound of the context type schemas obtained from the assignable expression a and from the context of the assignment as a whole.

...

@stereotype441, WDYT? This would be a breaking change. Also, I mentioned some operations on type schemas that may only be well defined for types (for instance, taking the greatest lower bound). Do you see that as a significant issue?

Greatest lower bound is fully defined for type schemas. The definition is here: https://github.com/dart-lang/language/blob/main/resources/type-system/inference.md#upper-bound. So no worries about this change not being well-defined. (Thanks @leafpetersen for pointing this out to me when I couldn't find it back in November 😃)

I don't see any issue with these proposals. Personally I would probably go with Erik's suggestion of always using the greatest lower bound (rather than Lasse's more conservative suggestion of only doing it when one context type is a subtype of the other), just because it feels more principled to me. I'm guessing the reason for Lasse's more conservative suggestion is that our GLB algorithm isn't perfect and doesn't actually produce the greatest bound. But I suspect that it won't come up very often, and besides, since context types are only a suggestion, even if the context is occasionally more specific than it should be, it will probably be extremely rare for the error to actually create a problem.

It certainly would be worth prototyping the change and running it through the internal codebase to see if it breaks anything!

lrhn commented 1 year ago

Yes, I'm very worried about what GLB can come up with. More than LUB.

If it becomes Never, then we're worse off than today. It definitely won't compile, which it would today if we lucked on choosing the most specific type, and the error message will be useless. ('A value of type `int` cannot be assigned to `Never`', but there is no `Never` type anywhere in the code. Users are baffled.)

Just consider what would happen with List<int>? l1; List<num> l2 = l1 ??= <int>[1];. If UP is anything to go by, it won't succeed in combining List<num> and List<int>, and will choose, well, the only known lower bound is Never.

stereotype441 commented 1 year ago

Yes, I'm very worried about what GLB can come up with. More than LUB.

If it becomes Never, then we're worse off than today. It definitely won't compile, which it would today if we lucked on choosing the most specific type, and the error message will be useless. ('A value of type `int` cannot be assigned to `Never`', but there is no `Never` type anywhere in the code. Users are baffled.)

Just consider what would happen with List<int>? l1; List<num> l2 = l1 ??= <int>[1];. If UP is anything to go by, it won't succeed in combining List<num> and List<int>, and will choose, well, the only known lower bound is Never.

Right, but since type schemas are only suggestions, not required types, this would not be a problem at all in your example, because there are no types to be inferred. The type of the list literal <int>[1] is List<int> regardless of context, and there would be no error.
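Spelling the example out as a complete program, it compiles as-is, precisely because the list literal's type is fixed by its explicit type argument:

void main() {
  List<int>? l1;
  // No inference is needed for `<int>[1]`, so the context type of the
  // second operand (whatever it ends up being) cannot cause an error here.
  List<num> l2 = l1 ??= <int>[1];
  print(l2); // [1]
}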

Or am I missing something?

lrhn commented 1 year ago

No, that example does work. We need inference to use the context type for something. Something like:

List<num> x;
List<int> y;
x = y = [1, 2, 3].fold([], (x, y) => x..add(y));

A little harder to get to, but still doable.

It type-checks today because we infer fold<List<int>> which returns List<int> which is assignable to List<num>.

If we tried to GLB List<int> and List<num>, the two contexts that the result flows into, we'd likely get Never, and because context types take precedence during top-down inference, we'd infer fold<Never>([], (Never x, int y) => x..add(y)) and fail completely.

Context type schemas are only suggestions during bottom-up inference: we don't complain if the static type doesn't match the context type (in most cases). But they are dictates during top-down inference.
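A minimal sketch of that distinction (the runtimeType output assumes the VM):

void main() {
  // Top-down: the context type dictates the inferred element type of the
  // list literal, so this is a List<num>, not a List<int>.
  List<num> xs = [1, 2, 3];
  print(xs.runtimeType); // List<num>

  // Bottom-up: the context type is only a suggestion; the expression keeps
  // its own static type (int), which merely has to be assignable to num.
  num n = 1 + 2;
  print(n.runtimeType); // int
}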

eernstg commented 1 year ago

I think we have two separate choices on the table here:

- whether the context type of the assignment as a whole should contribute to the context type of its right-hand side, and
- how to combine the two context types (using GLB, or something more conservative).

The main point I was making was that we should focus on plain assignments. That is, we should allow the context type schema of e to depend on the local-variable-type-or-setter-parameter-type of x and y in x = y = e, and similarly for foo(y = e), and any other syntactic context where an assignment expression has a context type schema. It just seems unnecessarily lossy to ignore the parameter type of foo when foo(y = e) is analyzed and e is equipped with a context type schema. The point is that y = e yields the value of e anyway, and hence e should bend to the wishes of the context just as much as it would for foo(e).
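As an illustration of today's behavior, reusing the showContextType helper from my earlier comment (takesInt is just a stand-in function with an int parameter):

void takesInt(int i) {}

void main() {
  dynamic y;
  takesInt(y = showContextType(0)); // Prints 'dynamic': the parameter type
  // of takesInt plays no role in the context type of the right-hand side.
}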

If we have that then the treatment of x ??= e would change accordingly because that's essentially syntactic sugar for let v = x in v == null ? x = e : v. (OK, it's not specified as syntactic sugar, we just spell out the same thing in natural language.) Based on the desugaring perspective we would then have a context type schema S for x ??= e, and that would be the context type schema for the let, then for the first branch x = e, and finally it would contribute to the context type schema for e, just based on the updated treatment of the plain assignment.

Other forms like e1?.v ??= e2 and e1[e2] op= e3 would get a similar enhancement based on similar reasoning.

So that was the first question. The second one is about using GLB or something more conservative. I tend to prefer using GLB, but it would of course be nice to have some concrete data about the breakage, if any.

Note that this example would continue to work, because GLB(List<int>, List<num>) == List<int>:

List<num> x;
List<int> y;
x = y = [1, 2, 3].fold([], (x, y) => x..add(y));

but this one would fail (because GLB(List<num>, Iterable<int>) == Never, even though it could be List<int>):

List<num> x;
Iterable<int> y;
x = y = [1, 2, 3].fold([], (x, y) => x..add(y));

However, this example fails already today because fold gets the type argument Iterable<int>, which is not assignable to the type of x.