dart-lang / language

Design of the Dart language

Improve context schema rules for `await` #3571

Open stereotype441 opened 7 months ago

stereotype441 commented 7 months ago

Imagine we have an asynchronous function that may or may not produce a result:

// Yields `null` if user name is not known
Future<String?> readUserNameFromDatabase() async {
  UserEntry userEntry = await readUserEntryFromDatabase();
  return userEntry.nameOrNull;
}

And we wrap it in an asynchronous function that supplies a default value:

Future<String> getUserNameOrDefault() async {
  return (await readUserNameFromDatabase()) ?? 'No data found';
}

Now imagine we want to make the original function more generic, e.g.:

Future<T> readUserDataFromDatabase<T>(
    T Function(UserEntry) extractDataFromEntry) async {
  UserEntry userEntry = await readUserEntryFromDatabase();
  return extractDataFromEntry(userEntry);
}

Future<String> getUserNameOrDefault() async {
  return (await readUserDataFromDatabase((userEntry) => userEntry.nameOrNull)) ??
      'Unknown name';
}

This no longer works, and it's not obvious why from the error messages. Here are the error messages from the CFE:

../../tmp/proj/test.dart:8:11: Warning: Operand of null-aware operation '??' has type 'String' which excludes null.
  return (await readUserDataFromDatabase((userEntry) => userEntry.nameOrNull)) ??
          ^
../../tmp/proj/test.dart:8:67: Error: A value of type 'String?' can't be returned from a function with return type 'String' because 'String?' is nullable and 'String' isn't.
  return (await readUserDataFromDatabase((userEntry) => userEntry.nameOrNull)) ??
                                                                  ^
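For reference, the generic example can be made to compile today by supplying the type argument explicitly, so that T is not inferred from the surrounding context. A minimal sketch, with hypothetical glue definitions for UserEntry and readUserEntryFromDatabase:

```dart
import 'dart:async';

// Hypothetical glue code for the example.
class UserEntry {
  String? get nameOrNull => null;
}

Future<UserEntry> readUserEntryFromDatabase() async => UserEntry();

Future<T> readUserDataFromDatabase<T>(
    T Function(UserEntry) extractDataFromEntry) async {
  UserEntry userEntry = await readUserEntryFromDatabase();
  return extractDataFromEntry(userEntry);
}

Future<String> getUserNameOrDefault() async {
  // Writing `<String?>` explicitly avoids inferring `T = String` from the
  // `?? 'Unknown name'` context, so the `??` is no longer dead code.
  return (await readUserDataFromDatabase<String?>(
          (userEntry) => userEntry.nameOrNull)) ??
      'Unknown name';
}
```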

It turns out that what's happening is this:

*Note that the behavior of star types is not really important anymore, because pre-null-safety code is no longer supported; but support for star types hasn't been removed from the CFE yet, so I'm including the full rule for completeness.

I would like to modify the context schema rules for await to this:

Once again I'm including information about star types for completeness.

This would allow the code example above to be accepted, because readUserDataFromDatabase(...) would be analyzed using a context schema of FutureOr<String?>.

Note that the only difference is in the behavior if K is FutureOr<T>? (or FutureOr<T>*). I think the change is justified, because in this situation, the await e expression is in a context that permits null, therefore it makes sense that the context for e should permit a future that completes with null.

As with any change to type inference, this would be a breaking change in theory, because any change to an inferred type could lead to cascading changes to other inferred types, which could in turn lead to a static error in code that would previously have been accepted. However, I've prototyped the change in google3 (in the analyzer only) and found no breakages. So I think the risk of breaking real-world code is low.

An additional reason we might want to consider this change is that if we ever decide to go ahead with https://github.com/dart-lang/language/issues/3471 (require every expression to satisfy its context), then we will need to revise the context schema rules for await in this way, in order to avoid serious breakage.

I do think that if we make this change, it would be reasonable to do it in a language-versioned way, so that after the change, if anyone writes code like the example above, they won't risk publishing it to pub in a form that will break with older versions of Dart.

@dart-lang/language-team what do you think?

eernstg commented 7 months ago

Sounds good!

Here's one perspective that supports the claim that it is a relaxation (allowing more after the change than before, in a narrow sense): The context FutureOr<T>? allows Null, T, and Future<T> (starting with Future<T>?, but the ? is redundant); the context FutureOr<T?> allows Null, T, and Future<T?>.

Soundness: the expression is always the operand of await, which means that we will have the same set of objects after awaiting (we have one more way to get null, but the context was already handling null, so that won't break anything).

Ship it! :smile:

lrhn commented 7 months ago

The

  • If K is FutureOr<T>?, then J is FutureOr<T?>.

does look unsound. The J context type allows evaluating to a Future<T?>, which is not a subtype of FutureOr<T>?. But, as Erik says, since it only happens inside await, it's sound because flatten(FutureOr<T>?) = T? = flatten(FutureOr<T?>), so after awaiting, we can't tell the difference.
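The flatten equality can be checked concretely; a small sketch, using nothing beyond dart:async:

```dart
import 'dart:async';

Future<void> main() async {
  // The two context types admit different operands, but flatten agrees on
  // the result: flatten(FutureOr<String>?) = flatten(FutureOr<String?>) = String?.
  FutureOr<String>? a = Future<String>.value('hi');
  FutureOr<String?> b = Future<String?>.value(null);
  String? ra = await a; // static type String?
  String? rb = await b; // static type String?
  print('$ra $rb'); // hi null
}
```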

I'm still not particularly happy about loosening the typing rules for await.

My preferred rule change would be J = Future<K>. That is, if the context of await e is K, the context of e is Future<K>. Then we would still accept any type T for e where flatten(T) is assignable to K, but we wouldn't encourage it by giving a context type that's not a Future type.

The typing rules around await are incredibly complicated and somewhat error-prone, because of the introduced FutureOr. Just removing that would simplify things a lot. (If the static type of T is dynamic, we'd have to decide whether to downcast to Future<K> first, or await any future first, then downcast to K. The latter is probably the best choice, since nobody should have nested futures. And it's effectively what we do today too.)
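The "await first, then downcast" behavior can be observed in today's semantics; a small sketch with hypothetical values:

```dart
import 'dart:async';

Future<void> main() async {
  dynamic d = Future<int>.value(1);
  // The future is awaited first, and the completed value is then
  // implicitly downcast from dynamic to int:
  int x = await d;
  print(x); // 1

  dynamic notAFuture = 2;
  int y = await notAFuture; // a non-future operand awaits to itself
  print(y); // 2
}
```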

(It wouldn't solve the issue with the original post's code, though.)

leafpetersen commented 7 months ago

I'm a little nervous about this proposal, because it pushes on an area where we are constrained by the limited set of union/intersection types we support. In general, unions and intersections may support a rich algebra (think e.g. along the lines of De Morgan's laws), and they may be defined to "distribute" over terms in rich ways. So for example, given a thing of type (int, int) & (String, String) you might expect to be able to treat it as a thing of type (int & String, int & String) and project out the first component to get something of type int & String. In Dart, however, we only support a very limited set of union and intersection types, and so there are many types which are sensible in an underlying union/intersection type model which we can't express. So for example, the type int | Future<int?> is a sensible union type, but it cannot be faithfully represented via the FutureOr constructor.

With await and FutureOr, we're pushing a bit on this: await is an operation which distributes across unions in certain ways, and we can't always accurately capture the types that are involved using the building blocks available to us. If we analyze this example in terms of the underlying union types, we can see how this plays out.

We are considering the case where we have await e in a context which provides a context type that looks like FutureOr<String>?. If we expand that out in terms of the underlying union types, then what the context type says is that the expression is expected to return something of type Future<String> | String | Null. So now we can ask the question: "what are the types of e such that await e can validly produce one of the above"? Because of reified types, we do not have the property that (e.g.) Future<String> | Future<Null> is equivalent to Future<String | Null>, and so the list is fairly long. Breaking it down into a few subsets, we have:

Note that each of these types is expressible given our current set of combinators. However, the composition of them (which is, in some sense the "correct" context type for e) is not expressible:

Future<FutureOr<String>?> | Future<FutureOr<String>> | Future<Future<String>?> | Future<Future<String>> | Future<String?> | Future<String> | String? | String | Future<Null> | Null

We can re-distribute the union types in the above without losing information by taking the following steps:

so by applying the above rewrites (not unique, unfortunately, but without losing information) we get to:

Future<FutureOr<String>?> | Future<FutureOr<String>> | Future<Future<String>?> | FutureOr<Future<String>> | FutureOr<String?> | Future<Null>

We can further simplify the above by observing that some of the arms of the union are subtypes of others, and hence can be folded in:

So a context type which covers the full set of possible expressions is:

Future<FutureOr<String>?> | FutureOr<String?>

Note that this cannot be simplified further without losing information (e.g. using Object? is a valid supertype of both arms, but we lose information if we use that as the context) since String? is not a subtype of Future<FutureOr<String>?> (so the second arm isn't a subtype of the first); and since Future<String> is not a subtype of String? (and hence the first arm isn't a subtype of the second).

Note also that this is not a unique decomposition: FutureOr<String?> is equivalent to Future<String?> | String?, and Future<String?> is a subtype of Future<FutureOr<String>?>, so we could alternatively decompose this into:

Future<FutureOr<String>?> | String?

I don't immediately see any way to do better here without losing information, and I don't see a good lossy supertype to use either. Based on this analysis then, one could imagine simply choosing one of the different arms of the simplified unions above to use as the context type:

The last seems clearly not useful. The second is what is being proposed here. How do these compare? Well, if we start from the full set of types that we enumerated above as the possible static types for e (which can be written as the union type Future<FutureOr<String>?> | Future<FutureOr<String>> | Future<Future<String>?> | Future<Future<String>> | Future<String?> | Future<String> | String? | String | Future<Null> | Null), we can ask, for each of the possible context types, which of these types is incompatible with that context type. So for example, if we were to use String? as the context type, then when e had type Future<String> it would conflict with the context type. This is relevant to the proposal mentioned above to make context types required, in which case presumably some of these would become static errors. In the current implementation, I think this is mostly relevant for inference: a choice of incompatible context is likely not to be useful for inference.

The current implementation uses FutureOr<String>? as the context type (I don't have a derivation at hand for why that was chosen). Of the list of possible valid types for e above, the following are not subtype compatible with that context type:

That is, FutureOr<String>? x = await e is semantically sensible code when e has any of the above listed types, but if we were to require that e have a type which was a subtype of the context type, then all of the above cases would be rejected.

The proposal here is to use FutureOr<String?> as the context type, which has some motivation in the derivation above. It essentially corresponds to accepting that we can't represent both of the union branches in the simplified union above, and somewhat arbitrarily picking the second. If we make this choice, then the following valid types for e are not subtype compatible with that context type:

If I've done this analysis correctly, then it seems that the proposed context type is strictly better than our current choice based on this criterion. Note that it is not clear to me that this is the only criterion we need to consider (e.g. we should think about how this might interact with inference).

Finally, it's worth looking at the other option suggested by the analysis above, which is to arbitrarily choose the first arm of the simplified union type above for the context type, that is Future<FutureOr<String>?>. If we make this choice, then the following valid types for e are not subtype compatible with that context type:

Interestingly, this choice covers all of the cases where e has a future type, but rejects the cases where e is a non-future, which is arguably compatible with the expressed desire to only allow awaiting futures.

lrhn commented 7 months ago

Thinking more, this feels like a self-inflicted problem.

The reason we get into this issue is that we have a special rule for a context type that is already Future or FutureOr. If we didn't have that, the context type of the await operand would be FutureOr<FutureOr<String>?>, which allows both Future<String> and Future<String?>, and pretty much any combination with only one Future level. But nobody wants nested futures, and not wanting nested futures is why we have that special case to begin with.

So, the special-case rule is why we have a problem, so maybe it's just not good enough. (Leaf's derivation shows it's not.) The FutureOr<String>? is an approximation of the correct result, which is FutureOr<FutureOr<String>?>, to only include one Future level. It fails to cover all single-Future-level subtypes of FutureOr<FutureOr<String>?> by not covering Future<String?>. So it's not a good enough approximation.

If we want to keep trying to find a sufficient approximation then:

Given await e with context scheme C, we want to introduce a context type C2 for e, s.t.,

It's not a given that it is always possible to find such a C2.

It's almost certain that C2 will have the form FutureOr<F> for some F. Let's assume that for now, and work with F (aka flatten(C2)) instead. Then the constraints above become:

And let's look at the four two-level nested Future-types above:

(Where we notice that both are really F = flatten(C).)

How do those compare to what we have today:

  • If K is FutureOr<T>, FutureOr<T>?, or FutureOr<T>*, then J is K.

where K is C and J is C2 = FutureOr<F>. The difference is precisely that an outer context type of FutureOr<T>? will give an inner context type, C2, of FutureOr<T>? with the existing rules, and FutureOr<T?> with the new rule above.

The underlying idea is that if you're awaiting something, that await should not evaluate to a future. So if you're awaiting something with a FutureOr context, you don't actually expect to see the Future branch. For a FutureOr<int>? context, that means we expect to see an int or null. So it makes sense that the context type becomes FutureOr<int?>.

That is, I think there is a valid reason, derived from desired constraints on the special-case rule that we have introduced, to choose FutureOr<T?>. It's not (entirely) ad-hoc. (It is a little, because we only try to avoid FutureOr<FutureOr<..>> types, and not FutureOr<Future<...>> which is just as much a nested future type. It's just that we can't find a sound approximation better than FutureOr<Future<...>>, even if we wanted to.)

Which also means it won't necessarily generalize to other union types, if we had general union types. We have FutureOr<flatten(FutureOr<T>)> <: FutureOr<FutureOr<T>>. Maybe we can generalize to using SomeUnion<flatten(SomeUnion<T>)> as an approximation of FutureOr<SomeUnion<T>> when FutureOr<T> <: SomeUnion<T>. (Completely guessing here.)

(But if we have general union types, we need to apply flatten to those too, so maybe the answer will always be FutureOr<F> where F = flatten(C). Because the one rule we have for flatten is that T <: FutureOr<flatten(T)>.)

eernstg commented 7 months ago

Perhaps we should restart this discussion based on the assumption that we adopt #870. The main point of doing that is that it makes type inference simpler in async function bodies, and that's exactly what we need here! ;-)

// --- Glue code, needed to make the example compilable.

class UserEntry {
  String? get nameOrNull => null;
}

Future<UserEntry> readUserEntryFromDatabase() async => UserEntry();

// --- Example from issue. Emulate #870.

Future<T> readUserDataFromDatabase<T>(
    T Function(UserEntry) extractDataFromEntry) async {
  UserEntry userEntry = await readUserEntryFromDatabase();
  T result = extractDataFromEntry(userEntry);
  return result;
}

Future<String> getUserNameOrDefault() async {
  String result =
      (await readUserDataFromDatabase((userEntry) => userEntry.nameOrNull)) ??
          'Unknown name';
  return result;
}

No nested future types anywhere!

stereotype441 commented 7 months ago

I like the principled nature of @leafpetersen's analysis in https://github.com/dart-lang/language/issues/3571#issuecomment-1899745967. When I was trying to write up the issue in the first place, I think I was trying to mentally reach for something like this, but I kept getting lost in the details and eventually gave up. Reading Leaf's comment makes me want to try again.

But I'm going to throw a monkey wrench into it: I'm going to posit that in practice, for real-world Dart code, developers never intend for an await expression to produce a future, and therefore, since context schemas are all about trying to push type inference into coming up with types that match user intent, I'm going to assume that whatever reasoning we use to come up with a context rule for await, we don't need to allow for the possibility that the type of the await expression is Future<T> for any T.

Repeating Leaf's analysis with that assumption:

We are considering the case where we have await e in a context which provides a context type that looks like FutureOr<String>?. If we expand that out in terms of the underlying union types, then what the context type says is that the expression is expected to return something of type Future<String> | String | Null. So now we can ask the question: "what are the types of e such that await e can validly produce one of the above"? Because of reified types, we do not have the property that (e.g.) Future<String> | Future<Null> is equivalent to Future<String | Null>, and so the list is fairly long. Breaking it down into a few subsets, we have:

  • The types for e such that await e can produce Future<String> (as well as possibly other things):

    • Future<Future<String> | String | Null> aka Future<FutureOr<String>?>
    • Future<Future<String> | String> aka Future<FutureOr<String>>
    • Future<Future<String> | Null> aka Future<Future<String>?>
    • Future<Future<String>>
  • The types for e such that await e can produce String (but not Future<String>):

    • Future<String | Null> aka Future<String?>
    • Future<String>
    • String | Null aka String?
    • String
  • The types for e such that await e can produce Null (but not String or Future<String>):

    • Future<Null>
    • Null

Taking the union of these types, we have Future<String?> | Future<String> | String? | String | Future<Null> | Null.

Applying the same rewrites that Leaf used, we are left with FutureOr<String?> | FutureOr<String> | Future<Null>

Future<Null> is a subtype of FutureOr<String?> so we can drop it. Same with FutureOr<String>. So we are left with FutureOr<String?>. Which matches my initial proposal.
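The end point of this derivation can be checked concretely: each operand type in the reduced union is accepted by a FutureOr<String?> context, and awaiting it yields a String?. A small sketch (awaitInContext is a hypothetical helper, not part of the proposal):

```dart
import 'dart:async';

// Every operand type in the reduced union above fits this parameter type.
Future<String?> awaitInContext(FutureOr<String?> e) async => await e;

Future<void> main() async {
  print(await awaitInContext(Future<String?>.value('a'))); // Future<String?>
  print(await awaitInContext(Future<String>.value('b'))); // Future<String>
  print(await awaitInContext('c')); // String
  print(await awaitInContext(null)); // Null
  print(await awaitInContext(Future<Null>.value(null))); // Future<Null>
}
```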

So if you accept my assumption (that our context for await doesn't need to account for the possibility that await e ever produces a future), I think my initial proposal can be put on a principled footing.

lrhn commented 7 months ago

I agree. The point is to not have a future after await. That means that a context type of FutureOr<T>? for an await can be reduced to T?, which removes the Future part of the FutureOr, leaving just the T, inside the ?. Which happens to always be the flatten of the context type, but that might just be because there are only so many ways to combine T, Future and ?.

But, if we want to be consistent, should we recursively process deeper nestings, so that a context type of FutureOr<FutureOr<T>?> is also reduced to T??

stereotype441 commented 7 months ago

@lrhn:

But, if we want to be consistent, should we recursively process deeper nestings, so that a context type of FutureOr<FutureOr<T>?> is also reduced to T??

Yeah, I suppose we could do that. In fact, perhaps a principled way to think of this is, to perform type inference on an expression of the form await e, in context K, we first throw out any parts of K that are futures (because at runtime await e will never* evaluate to a future), then we apply FutureOr to the result, and perform appropriate simplifications; that gives us the appropriate context for performing type inference on e.

* I'm not 100% sure this is true. It's certainly true for well-behaved code though.

Throwing out any parts of K that are futures can be done using the factor operation (defined in https://github.com/dart-lang/language/blob/main/resources/type-system/flow-analysis.md#promotion), assuming we extend it suitably to apply to type schemas. The relevant rules are:

Then, I think the only simplification we need to do is to reduce FutureOr<T> to _ when T is a top type (or _).

So the complete rule would be: to perform type inference on an expression of the form await e, in context K, we perform type inference on e in context J = makeFutureOrSchema(factor(K, Future<dynamic>)), where makeFutureOrSchema is defined as:

That would give the same results as my initial proposal, except that:

All 3 of these bullets seem like improvements to me.

lrhn commented 5 months ago

(because at runtime await e will never* evaluate to a future), then we apply FutureOr to the result, and perform appropriate simplifications; that gives us the appropriate context for performing type inference on e.

* I'm not 100% sure this is true. It's certainly true for well-behaved code though.

It's not true in general, but is true if the code never contains any Future whose completed value is also a Future. That's why flatten only removes one layer of futures. It must be sound and approximate the value of await, and await only awaits one future.

The context doesn't need to be "sound", as long as we don't strictly enforce it. It just has to be helpful.
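The one-layer behavior of await (and flatten) can be seen directly when the type arguments of the outer and inner future differ; a small sketch:

```dart
import 'dart:async';

Future<void> main() async {
  // A nested future: the outer future completes with another future as its
  // value (possible here because the type arguments differ).
  final Future<Future<int>> nested = Future.value(Future<int>.value(42));
  // `await`, like flatten, removes exactly one layer:
  final Future<int> inner = await nested;
  final int value = await inner;
  print(value); // 42
}
```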

My initial proposal said that if K was some top type other than dynamic or _, J would be FutureOr<K>; now J is _.

Should we generally consider a top type used as a context type to be _? It imposes no constraint. (Not even void does.)

We may want to allow Future<void> foo(FutureOr Function() f) async => (await Future.sync(f)..addToLog());. Currently it risks inferring Future<void>.sync, which makes the cascade invalid, but something being assignable to void doesn't mean it isn't itself a useful value.

My initial proposal said that if K was Future<T> for some T, then J would be FutureOr<Future<T>>; now J is FutureOr<Never>.

Not sure that's an improvement. (Also not sure it isn't. Can't really wrap my head around it.) If the result of the await should definitely be a Future<T>, then a context type of FutureOr<Future<T>> is correct. Maybe Future<Future<T>> is even better.

Come to think of it, maybe the context type for an await's operand expression should be a Future type, not a FutureOr type. If we don't try to be complete (we don't), we don't even need the non-Future part of the FutureOr. What we want is a Future<T> that satisfies flatten(Future<T>) = T <: K.

In that case, remove all FutureOrs from the context type, and wrap it in Future. Then a Future<int> f = await e; would give e a context type of Future<Future<int>>, which is the kind of future required to satisfy the context. A FutureOr<int> f = await e; would get Future<int> as context type. A FutureOr<Future<FutureOr<int>?>>? f = await e; should get a Future<Future<int?>>?. (Retain the outer ?, because someone might reasonably await a "future or null". And being nullable is better integrated with promotion than FutureOr is.)