dart-lang / language

Design of the Dart language
Other
2.64k stars 198 forks source link

Analyzer and front end have different (and dubious) interpretations of type tests in pattern spec #3586

Open stereotype441 opened 6 months ago

stereotype441 commented 6 months ago

The patterns spec contains the following text (emphasis mine):

From the Pattern context type schema section:

The context type schema for a pattern p is:

...

  • List: A context type schema List<E> where:

    1. If p has a type argument, then E is the type argument.

    2. Else if p has no elements then E is _.

    3. Else, infer the type schema from the elements:

      1. Let es be an empty list of type schemas.

      2. For each element e in p:

        1. If e is a matching rest element with subpattern s and the context type schema of s is an Iterable<T> for some type schema T, then add T to es.

...

From the Type checking and pattern required type section:

To type check a pattern p being matched against a value of type M:

...

  • List:

    1. Calculate the value's element type E:

      1. If p has a type argument T, then E is the type T.

      2. Else if M implements List<T> for some T then E is T.

      3. Else if M is dynamic then E is dynamic.

      4. Else E is Object?.

...

  • Map:

    1. Calculate the value's entry key type K and value type V, and key context C:

      1. If p has type arguments <K, V> for some K and V then use those, and C is K.

      2. Else if M implements Map<K, V> for some K and V then use those, and C is K.

      3. Else if M is dynamic then K and V are dynamic and C is _.

      4. Else K and V are Object? and C is _.

...

All of the boldfaced tests are type tests, checking whether a given type (or type schema) matches an expected interface type, and if it does, extracting the type argument(s). As currently implemented, these tests all have the following behavioral subtleties in common:

  1. If the type being matched is a nullable type or a star type, then the analyzer and front end consider it to match. (E.g., List<int>? is considered to match List<T>, where T = int.)
  2. If the type being matched is a FutureOr<T> type, and T is considered to match, then the analyzer and front end do not consider it to match. (E.g., FutureOr<List<int>> is not considered to match List<T> for any T.)
  3. If the type being matched is a type parameter type (or a type parameter type followed by * or ?), and the type parameter's bound would have matched, then the analyzer considers it to match; the front end does not. This is the case both for bounds introduced by an extends clause (at the site of the type parameter's declaration) and bounds introduced by type promotion. (E.g. the analyzer considers U&List<int> to match List<T>, where T = int; the front end does not).

This is not what I would have expected, from a naive reading of the spec. I would have expected the matching to behave like a subtype check, so:

  1. If the type being matched is a nullable type or a star type, then it doesn't match (because e.g., List<int>? is not a subtype of List<T> for any T).
  2. If the type being matched is a FutureOr<T> type, then it doesn't match (because e.g., FutureOr<List<int>> is not a subtype of List<T> for any T).
  3. If the type being matched is a type parameter type (not followed by * or ?), and the type parameter's bound would have matched, then it matches (because e.g., U&List<int> is a subtype of List<T>, where T = int).

I suspect none of these behavioral differences were intentional, and I suspect that none of them matter in practice. As far as I've been able to tell so far, none of them are covered by any test cases.

I don't think we have a clear definition anywhere of what we mean in the spec by phrases like "is a Foo<T> for some T" and "implements Foo<T> for some T", so I would love to hear thoughts from @dart-lang/language-team: do you agree with the behavior I would have expected? If so, I would like to move ahead with a bug fix (and breaking change announcement).

lrhn commented 6 months ago

There are two different cases here, one when we find the context type, and one when we type-check the pattern against a value.

In the context type, we get a type out of a rest pattern, and want to figure out which iterable that is. However, the type of the rest pattern is a pattern which can be matched against a list, which means it must accept a list, but it can safely accept more. Like var _ or FutureOr<List<int>> _.

This matters most in assignment patterns, where we use existing variables:

FutureOr<Iterable<int>?> fol;
if (x is List<int>) {
  int first;
  [first, ...fol] = x; 
} else // ...

Here the context type for the pattern fol is FutureOr<Iterable<int>?>.

In a sublist pattern, we know it will only be applied to a list, what we want to know is which type of list will work. We have a List<T>, for some unknown T, and we want to know which Ts make it acceptable as a FutureOr<Iterable<int>?>.

That's something we do with inference and context types all the time:

List<T> makeList<T>() => <T>[]; 
FutureOr<Iterable<int>?> x = makeList();`.

The context type for makeList is FutureOr<Iterable<int>?>, and we need to solve for List<T> <: FutureOr<Iterable|<int>?>, which, if successful, should be T = int. And if not, T will be _.

The spec here says:

type schema of s is an Iterable<T> for some type schema T,

which is, arguably, phrasing the wrong way around. We want to know is an Iterable<T> for some T is-a "type-scheme-of-s" (in the subtyping meaning of "is-a"). And if so, which T is "best"? (The maximal type accepted, as usual.) And we really want to solve for List<T>, because that's the kind of value the subpattern is matched against. Saying Iterable here is possibly an attempt to be more permissive, which is only needed because the subtying is in the wrong direction.

So here we should really say "if there is a T such that List<T> is a subtype of type schema of s, let T be the greatest such T." (Or just say "Solve List<T> \<: type-scheme-of-s for T", in whatever way we usually say that during type inference.)

In the type checking, we are going in the other direction, having a matched value of the matched value type, and wanting to see if a pattern matches it, and if it does, how. That's two things, but it surprisingly comes down to the same question as before: For which T is List<T> a subtype of M.

That's because we want to find the intersection of List<T> and M, the subset of M which can be matched by the list pattern, and which would therefore flow into the pattern. The parts of M which are not lists can be ignored. The parts of M that are, are also what what we should type-check the pattern against, because they'll actually have a chance to flow into it. Let's make M fun, like FutureOr<Iterable<int>?> again.

So, can List<T> match something of that type at all, or is the intersection empty. That's the case if List<T> \<: FutureOr<Iterable<int>?>. The quick check for that is to check whether List<Never> is a subtype, which it is. That means that at least one T satisfies the check. Then we want to know for which T a FutureOr<Iterable<int>?> can be a List<T>, that is which lists can match and then be processed by the sub-patterns. That's also the List<T>s for Ts where where List<T> \<: FutureOr<Iterable<int>?>.

(It annoys me that I said "intersection" above, and yet this is not symmetric. But that's probably because I'm looking at it from the List<T>'s perspective, and because the variable we need to solve for is in List<T>. If there was a variable in the other side, we'd probably need to reverse the <: to solve for that.)

In general, this is similar to issues we've solved for futures with "derives a future type", and similar to how we (should be) extracting an element type from the return type of a sync* or async* function,

We have a supertype of something that we want to assign an {Iterable, Stream, Future, List, other generic type} to, we just need to know which element type the supertype implies.

We solve that for context types all the time:

List<T> makeList<T>() => <T>[]; 
FutureOr<Iterable<int>?> x = makeList();

The context type for makeList() is FutureOr<Iterable<int>?>, and we need to solve for List<T> \<: FutureOr<Iterable|<int>?>, which is successful and gives us T = int as the maximal solution.

So, maybe, we should recognize this pattern and introduce a name for it, so we can use it in the places where it's precisely what we need, instead of reinventing it every time, sometimes less successfully (or always less successfully, going by the editing history of those parts of the spec).

We can steal the "derives a future type" pattern and say "Does S derive an I type" where S is a type(-scheme) and I is a generic type(-scheme) with n type arguments, and it means "Can we solve I<T1,...,Tn> \<: S, and if so, what is the I<T1,...,Tn> type?"

(It's not necessarily exactly what we need every time. For example we will still need a special "derives a future type", because we want to treat FutureOr<Future<Object>> as deriving Future<Future<Object>>, even if Future<Object> is also a subtype, and a larger one at that. Futures are special because FutureOr exists and we don't intend futures to nest, so there are some heuristics involved too.)

I think it would be what we want for finding the iterable element type of a sync* function, or stream element type of an async* function, from the declared return type. Those are currently defined individually, each inductively on the type structure, down to an actual Iterable or Stream. If we use the completely general "solve for T" approach, we don't need to update those algorithms every time we add a new kind of type, we'll just piggyback on what inference has to do everywhere else.

eernstg commented 6 months ago

We might have a general notion of "a type T derives a G type" where G is a reference to a generic class declaration (for example "a type T derives an Iterable type", which would be true for List<num>? and for the raw type Iterable).

I don't think we can make it technically the same thing as "derives a future type" because we don't have anything that corresponds to FutureOr for any other type than Future (like ListOr<int> which would mean List<int> | int). But that shouldn't be a problem, we simply have two concepts that are similar, but not entirely equivalent.

  • If T is a type introduced by a class, mixin, mixin class, or enum declaration that implements G<T1 .. Tk> then T derives the G type G<T1 .. Tk>.
  • If T derives the G type S then T? derives S.
  • If T derives the G type S then FutureOr<T> derives the G type S.
  • Let T be X & B. If B derives the G type S then T derives the G type S; otherwise, if X derives the G type S then T derives the G type S.
  • If X is a type variable with bound B and B derives the G type S then X derives the G type S.

If no case is applicable then T does not derive a G type.

We would probably have to find slightly different words in order to handle context type schemas, but the structure of these rules should remain the same.

This could be useful when the task is to "extract" a List or Map or similar type from a given context type schema in order to select the type arguments, or finding constraints on some type arguments that are being inferred.

stereotype441 commented 6 months ago

@lrhn, I like your insight of considering the "context type" and "type checking" cases separately.

I think your analysis of the context type case is spot on, and I agree that this suggests that what's in the spec is the wrong way round.

But in your reasoning about the "type checking" case, you say:

That's because we want to find the intersection of List<T> and M, the subset of M which can be matched by the list pattern, and which would therefore flow into the pattern. ... So, can List<T> match something of that type at all, or is the intersection empty. That's the case if List<T> <: FutureOr<Iterable<int>?>. The quick check for that is to check whether List<Never> is a subtype, which it is.

I don't believe it is correct in general that we can do a quick check of whether the intersection between List<T> and M is non-empty by checking whether List<Never> is a subtype of M. Consider:

class MyIntList implements List<int> {
  ...
}

If M is MyIntList, then List<Never> is not a subtype of M, but the intersection of List<T> and M is definitely non-empty, and the appropriate choice of T is int. A similar argument applies if M is a union type that includes MyIntList, such as MyIntList? or FutureOr<MyIntList>.

So I think in the "type checking" case, we probably ought to think about at least five possibilities:

  1. M is exactly List<T> for some T.
  2. M is exactly Iterable<T> for some T.
  3. M is an interface type that extends, implements, or mixes in List<T> or Iterable<T> for some T, e.g., MyIntList.
  4. M is a union, one of whose branches matches 1, 2, or 3, e.g. MyIntList? or FutureOr<Iterable<Object>>.
  5. M is an intersection type, one of whose branches matches 1, 2, 3, or 4, e.g. U&List<int> or U? where U extends FutureOr<Iterable<double>>.

All five of these possibilities seem like plausible cases in which we would want to infer an element type for the list pattern.

I can't think of anywhere else in the spec where we match types in a way that covers all five of these cases, so we might just have to bite the bullet and spell out the exact behavior we want in the pattern spec.

(Most of what I said above applies to map patterns as well, except obviously change List<T> to Map<K, V>, and drop all mentions of Iterable.)

lrhn commented 6 months ago

@stereotype441 Thanks! It irked me that things ended up the same, when they felt like they should be more different, but I couldn't spot the mistake.

The thing we want for type checking is the intersection type, or rather the maximally possible intersection type, if the other side contains type variables, any type of list that can possibly be checked against the list pattern. Do not as simple to find as I thought. Especially with a free type variable involved.

And then we want to find the list element type of that, if one exists.

Let me try to define the intersection computation, by structural induction in the other type, and see if it matches your rules.

In priority order:

  1. FutureOr<S> & List<T> = S & List<T>
  2. S? & List<T> = S & List<T>
  3. X extends S & List<T> = S & List<T>
  4. X & S & List<T> = S & List<T> (because S is an upper bound on the possible lists, and X's bound can't be a tighter upper bound, so we don't need to check it.)
  5. R implements List<S> & List<T> = List<T & S> (meaning R has List<S> as super-interface)
  6. R implements Iterable<S> & List<T> = List<T & S> (which subsumes the previous item)
  7. List<T> otherwise (unrelated type or supertype, but could have a subtype implementing any List type.)

The last case doesn't feel like an intersection rule, it shows that I only care about the list type.

Then we need to find the best choice for T that covers all of the intersection, which is either

Comparing to your rules:

So we do cover the same cases.

And it comes down to stripping away the non-Iterable parts of unions, the type variables from their bounds or promotions, and then seeing which Iterable type the rest implements, if any. So, agree.

(I prefer a fully recursive definition, rather than trying to enumerate possible cases from the top, that always end up missing something. For example FutureOr<(X extends List<int>)>?? may not be covered in your rules, but it's hard to say for sure.)

(That is actually not the true limit of which list types can occur, but it's good enough for what we can reasonably infer. The exception is something like FutureOr<List<int>>, where the value implements Future<List<int>> and List<String>. The inference is for your protection. The inferred type for the pattern will just not match that List<String>, and that's more than ok. We, our compilers, should just not assume that the inferred list pattern element type is unnecessary to check.)

munificent commented 6 months ago

I would have expected the matching to behave like a subtype check, so:

  1. If the type being matched is a nullable type or a star type, then it doesn't match (because e.g., List<int>? is not a subtype of List<T> for any T).
  2. If the type being matched is a FutureOr<T> type, then it doesn't match (because e.g., FutureOr<List<int>> is not a subtype of List<T> for any T).
  3. If the type being matched is a type parameter type (not followed by * or ?), and the type parameter's bound would have matched, then it matches (because e.g., U&List<int> is a subtype of List<T>, where T = int).

That was what I had in mind when I wrote the spec. (Though I agree that being able to unwrap a nullable type and extract list/map type arguments from the underlying type is useful, I hadn't thought about that or intended to specify it.)

I suspect none of these behavioral differences were intentional, and I suspect that none of them matter in practice. As far as I've been able to tell so far, none of them are covered by any test cases.

Yup and yup.

@eernstg, I really like the "derives a type" logic you propose. That sounds good to me.

lrhn commented 6 months ago

How to match also depends on whether it's a refutable or irrefutable pattern.

For an irrefutable pattern, the matched value type must be a subtype of the pattern's required type, so we must infer the type argument to the list pattern as a supertype of all possible list types that may occur. (If we can't, the pattern assignment is unsound and must be an error.)

For a refutable pattern, the matched value type can be a supertype, subtype, unrelated or potentially intersecting type. Here we probably went to infer the type argument to the list pattern to match the matched value type as best possible, trying to infer what the author expects.

That's where we want to see if the matched value type implies a list element type, or Iterable element type which implies the list element type of subtypes. If it does, that's probably the kind of list the user expects to catch, and we can use that as the matched value type for sub-patterns. (But only if we actually check the type. Otherwise we'll want FutureOr<List<int>> to imply an element type of Object?/_, which is what we do today:

import "dart:async";

class FL<T> implements Future<Never>, List<T> {
  final List<T> _list;
  FL(this._list);
  dynamic noSuchMethod(Invocation i) => super.noSuchMethod(i);
  int get length => _list.length;
  T operator [](int i) => _list[i];
  List<T> sublist(int start, [int? end]) => _list.sublist(start, end);
}

class FM<K, V> implements Future<Never>, Map<K, V> {
  final Map<K, V> _map;
  FM(this._map);
  dynamic noSuchMethod(Invocation i) => super.noSuchMethod(i);
  bool containsKey(Object? key) => _map.containsKey(key);
  V? operator [](Object? key) => _map[key];
}

void main() {
  FutureOr<List<int>> fl = FL<String>(<String>["a", "b"]);
  switch (fl) {
    case [var x, ...var y]:
      print("${[x].runtimeType} / ${x.runtimeType} : $x"); // x : dynamic
      print("${[y].runtimeType} / ${y.runtimeType} : $y"); // y : List<Object?>
  }
  FutureOr<Map<String, String>> fm = FM<String, String>({"ak": "av", "bk": "bv"});
  switch (fm) {
    case {"ak": var x}:
      print("${[x].runtimeType} / ${x.runtimeType} : $x"); // x : dynamic
  }
}

)

stereotype441 commented 6 months ago

@lrhn

Let me try to define the intersection computation, by structural induction in the other type, and see if it matches your rules.

...

Yup, I like your definition of intersection computation. It makes sense.

Comparing to your rules:

...

(I prefer a fully recursive definition, rather than trying to enumerate possible cases from the top, that always end up missing something. For example FutureOr<(X extends List<int>)>?? may not be covered in your rules, but it's hard to say for sure.)

Yeah, I prefer a fully recursive definition too. In fact, I didn't think of my 5 bullet points as "rules" so much as "examples we should make sure to think about when we're deciding what the rules should be" 😃.

The next question in my mind now is, should we write a definition of intersection straight into the spec, along the lines of what you outlined above? Or should we use your formulation of intersection to derive a recursive definition of the notion that we need for patterns? I'm thinking something along the lines of:

Define matchClass(T, C), where T is a type and C is a class, as follows:

And then in the pattern spec, instead of saying things like "if M implements List<T> for some T then E is T", we say "if matchClass(M, Iterable) = Iterable<T> for some T then E is T".

I think I personally prefer this formulation over putting a definition of intersection into the spec, because it's easier for me to concretely think about each case and reassure myself that it really does what I want. I also suspect it will be (slightly) easier to implement.

What do you think?

stereotype441 commented 6 months ago

Also, responding to @lrhn's later comment:

How to match also depends on whether it's a refutable or irrefutable pattern.

For an irrefutable pattern, the matched value type must be a subtype of the pattern's required type, so we must infer the type argument to the list pattern as a supertype of all possible list types that may occur. (If we can't, the pattern assignment is unsound and must be an error.)

For a refutable pattern, the matched value type can be a supertype, subtype, unrelated or potentially intersecting type. Here we probably went to infer the type argument to the list pattern to match the matched value type as best possible, trying to infer what the author expects.

I'm not sure I agree that we should be treating refutable and irrefutable patterns differently. I think that for both irrefutable and refutable patterns, we should try to do a good job of inferring what the author probably expects based on the static type of the matched value. If we come up with a reasonable set of inference rules that cover useful real-world use cases, and there's some contrived corner case where that reasonable set of inference rules leads to a compile-time error for an irrefutable pattern, or makes a subtle change to the behavior of a refutable pattern compared to what's implemented today, I think that's ok. Your examples of classes that implement both Future<Never> and List<T>, or Future<Never> and Map<K, V>, are the kind of thing that, IMHO, would never arise in real-world use cases. If the issue we were talking about were a soundness issue, I would definitely want to consider them and make sure they don't break soundness. But we're just talking about guessing what type the user means, and I don't think it really matters too much what we guess in cases like these.

Applying my recursive definition from my last comment to your example, I get:

void main() {
  FutureOr<List<int>> fl = FL<String>(<String>["a", "b"]);
  switch (fl) {
    case [var x, ...var y]: // interpreted as `case <int>[var x, ...var y]`
      // Therefore this case does not match
      ...
  }
  ...
}

And I think that's fine.

lrhn commented 6 months ago

The biggest difference between a refutable pattern and an irrefutable pattern is that in the latter, the matched value type must be assignable to the pattern's required type. It's a compile-time error if not. Which means that the intersection of the two types is the matched value type (if not dynamic).

In a refutable pattern, that requirement is not there, and the intersection gets more complicated.

We should still do the best reasonable inference of a case [...]: when the matched value type can imply a list type. What we do today is probably reasonable, infer cast <int>[...]: when matched against FutureOr<List<int>>. Just need to be sure we actually check that <int> type.