dart-lang / language

Design of the Dart language

Downward type inference does not consider possible nullable solution when matching `X?` #3269

Open · eernstg opened this issue 1 year ago

eernstg commented 1 year ago

Consider the following program:

X? f<X>(X x) => x;

void main() {
  int? i;
  i = f(i); // Compile-time error.
  i = f<int?>(i); // OK.
}

This program has a compile-time error at the first invocation of f: the context type is int?, int? is matched against X?, and it is concluded that X == int. Consequently, it is a compile-time error to pass i (of type int?) as the actual argument.

However, the second invocation of f passes the type argument int? explicitly, and everything now type checks correctly.

The point is that in the situation where a context type of the form T? is matched against a type of the form X?, where X is a type variable which is subject to type inference, we could choose to say that X must be a subtype of T, or that X must be a subtype of T?. Currently we only do the former. Presumably we can't do both, but X <: T? includes all solutions to X <: T, so maybe we could just say that X must be a subtype of T? in all cases. Inconveniently, this might imply that some types which are currently inferred as non-nullable will be inferred as nullable after the change.
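To make the inclusion concrete: every solution of X <: T also satisfies X <: T?, so the weaker constraint admits both of the following instantiations (a sketch using explicit type arguments to stand in for what inference would be permitted to choose):

X? f<X>(X x) => x;

void main() {
  int? i;
  i = f<int?>(i); // `X == int?` satisfies `X <: int?` and accepts `i`.
  i = f<int>(0);  // `X == int` satisfies `X <: int?` as well.
}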

@dart-lang/language-team, WDYT?

lrhn commented 1 year ago

The problem goes in both directions.

X? apply<X>(X? Function(X) f, X v) => f(v);

void main() {
  int? maybeInt = 2 as dynamic;
  int? i = apply((int? x) => x, maybeInt);
  int? j = apply((int x) => x, 1);
}

If we let downwards inference choose X as int, like today, the i line fails and the j line works. If we let it choose X as int?, the i line works, and the j line fails.

The problem is that the type argument variable can occur both covariantly and contravariantly in the arguments, but we choose the type during downwards inference after only looking at the return type, and we optimize for the most precise type possible, which favors contravariant occurrences in the parameters. While we then know that we satisfy the return type, we don't actually know whether we'll satisfy the arguments, and at that point, we're afraid to ask.

The only change that will make this categorically better is to not lock down the type parameter during downwards inference, but only use it to make a guess, and use that guess as a hint for the arguments, providing the typing context for the argument expressions. (And horizontal inference can even improve the hint for later arguments.)

Then, after every argument expression has been typed, we try to solve for the actual type argument. At that point we can still favor the downwards inferred type, if it's valid for the arguments too, but we can choose something else if it isn't. Then we make sure that the argument types are valid for our choice, and hope that the return type will be valid (but if the return type typing context was also a hint, maybe it's OK that it isn't).
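As a walk-through of what that scheme would mean for the apply example above (hypothetical inference steps written as comments; the explicit <int?> is what makes the call compile today):

X? apply<X>(X? Function(X) f, X v) => f(v);

void main() {
  int? maybeInt = 2 as dynamic;
  // For `apply((int? x) => x, maybeInt)`, the proposed scheme would go:
  //   1. Downwards: context `int?` against `X?` gives the *hint* `X == int`.
  //   2. The hint gives the closure the context `int? Function(int)`,
  //      and the closure is typed as `int? Function(int?)`.
  //   3. The argument `maybeInt` contributes the constraint `int? <: X`.
  //   4. Solving after the arguments are typed, the hint conflicts with
  //      step 3, so `X == int?` is chosen and the call is accepted.
  int? i = apply<int?>((int? x) => x, maybeInt);
  print(i);
}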

eernstg commented 1 year ago

Here is one concrete change that we might consider in response to this discussion. We have this rule:

  • If Q is Q0? the match holds under constraint set C:
    • If P is P0? and P0 is a subtype match for Q0 under constraint set C.
    • (other subcases)

We could then update the first subcase as follows, in the case where the type variables that we're solving for occur in P0:

  • If Q is Q0? the match holds under constraint set C:
    • If P is P0? and P0 is a subtype match for Q0? under constraint set C.

The point is that if C ensures that P0 <: Q0? then it is also true that P0? <: Q0?, and this allows us to find a solution in the examples mentioned above.
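Spelled out for the original example, where P is the return type X? and Q is the context type int? (a sketch of the derivation under each rule):

  • P0 == X and Q0 == int.
  • Current rule: X is a subtype match for Q0 == int, yielding X <: int; the solution X == int then rejects the argument i of type int?.
  • Proposed rule: X is a subtype match for Q0? == int?, yielding X <: int?; the solution X == int? accepts the argument.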

However, it might well have pragmatically worse behavior, because it could find the value T? for a type variable which plays the role of P0, and the current rule would make it T, and that might be a preferable typing (because the program then has fewer nullable types overall).

So the trade-off seems to be that we infer "better" types (with fewer occurrences of Null) with the current rule, but the current rule rejects some programs where there is a solution; and the proposed new rule has the opposite trade-off.

fishythefish commented 1 year ago

If there's no issue with covariant vs. contravariant occurrences and the tradeoff is simply between "better" types and accepting valid programs, then IMO we should prefer to accept valid programs. Users who want tighter types can and should provide annotations. In my experience, the "most general type" rule works well for other type systems, e.g. Hindley-Milner (although such type systems can infer polymorphic types, so the meaning of "general" is a bit different).

Besides, dynamic is arguably as "bad" as a type can get, but in a vacuum, we infer (dynamic) => dynamic for the function (x) => x.
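For instance (a small runnable sketch of closure inference with and without a context):

void main() {
  // With no typing context, the untyped closure is inferred as
  // `dynamic Function(dynamic)`:
  var id = (x) => x;
  print(id.runtimeType); // e.g. (dynamic) => dynamic
  // With a context, the parameter gets a useful type instead:
  int Function(int) inc = (x) => x + 1;
  print(inc(1)); // 2
}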

eernstg commented 1 year ago

If ... the tradeoff is simply between "better" types and accepting valid programs, then IMO we should prefer to accept valid programs

Interesting! I think it's important to try to come up with design rules like that, and I'll keep this one in mind.

However, there's a built-in trade-off with precision: if we always infer dynamic then we have probably optimized for accepting the largest possible set of programs, including a lot of valid ones. ;-)

So is it fair to say that choosing X == int? when matching X? with int? lowers the precision? You could actually say that it improves the precision of the typing of f in the invocation f(i), because it reveals that f is capable of accepting an int? argument and still returning an int?. But, arguably, it lowers the precision of g in the following example, in the sense that it gives rise to the creation of an <int?>[1] rather than an <int>[1], and the int? type argument could make the work on that list more cumbersome later on, elsewhere:

X? g<X>(List<X> xs) => xs.isEmpty ? null : xs.first;

void main() {
  int? i;
  i = g([1]); // Infers `[1]` as `<int>[1]` today, would infer `<int?>[1]` if `X == int?`.
}

Also note that the maintainer of the function has the opportunity to influence the situation:

X? f1<X>(X x) => x; // Original example. Just pretend that the body could return null.
X? f2<X extends Object>(X x) => x; // "We must know that null came from `f2`, not from the argument!"
X? f3<X>(X? x) => x; // "Null is allowed everywhere, we don't care!"
X f4<X>(X x) => x; // `X` can be nullable, but "No garbage in means no garbage out!"
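For example, f2 moves the problem to the call site, where the non-null requirement becomes visible (a sketch of the resulting errors and the promotion-based fix):

X? f2<X extends Object>(X x) => x;

void main() {
  int? i;
  // i = f2(i);       // Error: inference can't satisfy `X extends Object`.
  // i = f2<int?>(i); // Error: `int?` violates the bound as well.
  if (i != null) {
    i = f2(i); // OK: `i` promotes to `int`, so `X == int`.
  }
}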

Perhaps it's just bad style to have a function with the type X? Function<X>(X) in the first place, and we should use one of the others?

stereotype441 commented 1 year ago

Perhaps it's just bad style to have a function with the type X? Function<X>(X) in the first place, and we should use one of the others?

Yeah, maybe. My personal experience is that whenever I write X? in a signature (where X refers to a generic type variable), I'm claiming the null sentinel value for my own purposes, and I don't want a caller to run into confusion by substituting a nullable type for X (in a sense, claiming the null sentinel value for themselves). So I tend to write <X extends Object> as a way of enforcing that.

But I feel like the choice of signature is about more than just style, because signatures have to match program behaviour, and they have a huge influence on what clients can do; they're really about function more than form. Sometimes, you have no choice but to share the sentinel value with the caller, but you find a way to be careful so that you get away with it.

A great example of this is Map.operator[], whose return type is V?. If it returns null, does that mean the map lookup failed to find an entry? If the caller knows that V is non-nullable, then yes! Otherwise, the caller has to do something like call containsKey to figure out what happened. Which is too bad, because it's a bit of a footgun; it's easy to forget to call containsKey and create a subtle bug.

But I would be hard pressed to say that it was "bad style" to give Map.operator[] the signature we did. The signature we gave it is a genuine and true reflection of its behaviour. Was it a bad choice for Map.operator[] to have the behaviour it has? We could have made it throw an exception if no entry is found (as Python does), but that has serious performance and usability implications. We could have designed the language with option types (like the Maybe type in Haskell), but there are huge design tradeoffs there too. Or we could have defined the Map type as Map<K, V extends Object>, but that would have prohibited a lot of really useful maps (e.g. the maps in JSON).
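That footgun is easy to reproduce with the core Map API (a small runnable sketch):

void main() {
  final map = <String, int?>{'present': null};
  // With a nullable `V`, `null` from `operator[]` is ambiguous:
  print(map['present']); // null: the key exists and is mapped to `null`.
  print(map['absent']);  // null: the key does not exist.
  // Disambiguating requires a second lookup:
  print(map.containsKey('present')); // true
  print(map.containsKey('absent'));  // false
}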

I think the unfortunate fact is that sometimes X? Function<X>(X) really is the signature the user wants, because it's an accurate reflection of the program's behaviour. Maybe we could have a lint to warn the user that they're probably creating a footgun, but the existence of Map.operator[] suggests to me that there might be a lot of annoying false positives.

Returning to Erik's original example:

X? f<X>(X x) => x;

void main() {
  int? i;
  i = f(i); // Compile-time error.
  i = f<int?>(i); // OK.
}

I mentioned this at the language meeting, but repeating myself for the benefit of those who weren't there: I actually kind of like the fact that the user gets a compile-time error here. IMHO, one of the most important purposes of the type system is to accept bug-free programs and reject buggy ones, and a user who writes i = f(i) is probably writing buggy code, because they're probably not thinking about the fact that f has claimed the sentinel value of null for its own purposes.

If they instead write:

  if (i == null) {
    // Handle the `null` case
  } else {
    i = f(i);
  }

then the compile error goes away thanks to type promotion. Which is great, because now we don't have any collision of null sentinels.

And I also think it's fine that

  i = f<int?>(i);

doesn't lead to a compile error. Because by explicitly writing the type argument <int?>, the user has made it clear that they've looked at the signature of f and (hopefully) understand the implications of what they're doing.

eernstg commented 1 year ago

Great analysis, @stereotype441!

PS: By 'style' I meant "software design", not "appearance". That's also the reason why I wrote "We must know that null came from f2, not from the argument!" in a comment, emphasizing exactly the semantic property that the function f2 claims the sentinel for itself based on X extends Object.

fishythefish commented 1 year ago

I actually kind of like the fact that the user gets a compile-time error here. IMHO, one of the most important purposes of the type system is to accept bug-free programs and reject buggy ones, and a user who writes i = f(i) is probably writing buggy code, because they're probably not thinking about the fact that f has claimed the sentinel value of null for its own purposes.

Ooh, I like this interpretation. Even though we could solve for the type variable without incurring a compile-time error, that solution is actually invalid because it accepts a buggy program. In which case, the tradeoff discussion is basically moot.

Because by explicitly writing the type argument <int?>, the user has made it clear that they've looked at the signature of f and (hopefully) understand the implications of what they're doing.

I'm pessimistic about this. Maybe a different diagnostic message could make a difference here, but my experience has been that people running into this issue (myself included) essentially view this as a weird quirk of Dart's type inference without any obvious rationale. I got as far as Erik did in the opening comment when I reasoned about it; without this thread, I might not have noticed the latent bug (though I will certainly try to point it out to others now that I know!). I think others are likely to "fix" their code by adding more explicit types and observing that <int?> happens to make the error go away.

Moreover, I think the right fix in a lot of cases is actually X extends Object rather than an explicit nullable type argument, as in this example from @iinozemtsev. Despite the potential false positives, maybe a lint warning on X? when X is potentially nullable would help here; value itself would be flagged.
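Such a lint might flag declarations along these lines (a hypothetical sketch; the names are illustrative and not taken from the linked example):

X? value<X>(X x) => x; // Would be flagged: `X` may itself be nullable,
                       // so `X?` is ambiguous to callers.
X? valueSafe<X extends Object>(X x) => x; // OK: the bound makes `X?` meaningful.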

eernstg commented 1 year ago

that solution is actually invalid because it accepts a buggy program

But how do we know? There is always the possibility that someone comes up with a brilliant design pattern based on an otherwise rarely used combination of features. Another consideration could be that if X? Function<X>(X) is inherently a buggy function type then we should emit a diagnostic message about this problem at the function declaration. The call site can't really be held accountable for the declaration of the callee...

maybe a lint warning on X? when X is potentially nullable

That's an interesting one!