dart-lang / language

Design of the Dart language

[Patterns] pattern context type schema for object pattern is underspecified #2623

Open stereotype441 opened 1 year ago

stereotype441 commented 1 year ago

The patterns proposal currently says this, in the "Pattern context type schema" section:

>   • Object: The type the object name resolves to. This lets inference fill in type arguments in the value based on the object's type arguments, as in:
>
>     var Foo<num>() = Foo();
>     //                  ^-- Infer Foo<num>.

If the type being referred to is generic, but the pattern doesn't specify type arguments, what should the context be?

I can think of two possibilities that might make sense:

  1. fill in all the type arguments with ?
  2. use "instantiate to bounds"

Here's an example where it makes a difference:

class C<T> {
  final List<T> values;
  C(this.values);
}

main() {
  var C(:values) = C([1, 2, 3]);
  print(values.runtimeType);
}

If we do option 1 (fill in all the type arguments with ?), the context C<?> will be used to infer C([1, 2, 3]); consequently the context List<?> will be used to infer [1, 2, 3], so the list that's created will be a List<int>.

If we do option 2 (instantiate to bounds), the context C<dynamic> will be used to infer C([1, 2, 3]); consequently the context List<dynamic> will be used to infer [1, 2, 3], so the list that's created will be a List<dynamic>.
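To make the difference concrete, here is what each option amounts to when the type arguments are written out by hand (an illustrative sketch using the class C above, not an actual desugaring):

class C<T> {
  final List<T> values;
  C(this.values);
}

main() {
  // Option 1: a context of C<?> leaves T unconstrained during downward
  // inference, so the list elements drive T to int.
  print(C<int>([1, 2, 3]).values.runtimeType); // List<int>

  // Option 2: instantiate to bounds yields the context C<dynamic>,
  // fixing the element type before the elements are even visited.
  print(C<dynamic>([1, 2, 3]).values.runtimeType); // List<dynamic>
}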

My gut says we should go with option 1 but I'm not certain.

stereotype441 commented 1 year ago

CC @munificent @leafpetersen @jakemac53 @lrhn @mit-mit @natebosch @eernstg @kallentu

munificent commented 1 year ago

I believe regular variable declarations use option 1 (?):

main() {
  List list = [1, 2, 3];
  print(list.runtimeType);
}

This prints List<dynamic>. I think we probably want to do the same thing for patterns, but I'd like to hear from folks who know the type system better than I do.

stereotype441 commented 1 year ago

> I believe regular variable declarations use option 1 (?):
>
> main() {
>   List list = [1, 2, 3];
>   print(list.runtimeType);
> }
>
> This prints List<dynamic>. I think we probably want to do the same thing for patterns, but I'd like to hear from folks who know the type system better than I do.

If I'm not mistaken, regular variable declarations use option 2 (instantiate to bounds), which is why this prints List<dynamic>. It instantiates List<E> using the bound of E, which is dynamic; then it uses List<dynamic> as the context for analyzing [1, 2, 3]. At downward inference time, we generate the constraint T <: dynamic, which causes dynamic to be chosen for the list element type before the list elements are even visited.

If the context had been List<?>, then downward inference would not have generated any constraint (we don't generate constraints of the form T <: ?), and so the type would remain unknown until upwards inference time. At that point the presence of the 1, 2, and 3 would cause the constraints int <: T to be generated, at which point int would be chosen for the list type.

You can simulate the behaviour for List<?> using this test program:

dynamic f<T>(List<T> x) => x;
main() {
  print(f([1, 2, 3]).runtimeType);
}

This prints List<int>.

lrhn commented 1 year ago

Take:

class Box<T> {
  T value;
  Box(this.value);
}

and

var Box(value: int x) = Box(1);

The type schema for the LHS will be either Box<?> (not inferred yet), Box<dynamic> (instantiated to bound) or Box<int> (inferred from the type schema for value, which constrains T).

Do we get the third option? If not, can we?

If we don't infer Box<int>, I'd prefer to use Box<?> as the context type schema of the RHS. If we use Box<dynamic> as context type, we will make the RHS be Box<dynamic>(1). If we use Box<?>, it becomes Box<int>(1), which is actually matched by the LHS.

Then we come back to the LHS for type checking. If we get Box<int> as matched value type, it will work, otherwise it won't.

So, using dynamic makes that line fail to compile, even if it looks very reasonable. Using ? makes it work. Inferring int works too.

Using int will be much better. Take:

var Box(: double value) = Box(0); 

If we get Box<double> as RHS context type schema, then this works.
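For comparison, here is the fully explicit form of that inference, written out by hand so it is valid as-is (a sketch; the proposal would infer the <double> automatically):

class Box<T> {
  T value;
  Box(this.value);
}

main() {
  // With Box<double> as the context for the RHS, the integer literal 0
  // is interpreted as the double literal 0.0, so the match succeeds.
  var Box<double>(value: double value) = Box<double>(0);
  print(value); // 0.0
}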

So, can we define the schema extraction for an object pattern as:

  • Let C be the declaration that the object name resolves to.
  • If C is generic, and the object pattern does not provide all type arguments (as actual type arguments, or by unwinding type aliases),

    • Find the type schemas for each field pattern of the object pattern.
    • Match them up against the declared types of the getters invoked by the field patterns.
    • Feel free to ignore those whose type does not involve a type variable.
    • Try to "solve for the type variables."
    • If successful, use the inferred type as type argument of C for that type variable.
    • If not, use ?.

  • Then the type schema for the object pattern is the thus instantiated C.

Very handwavy. Does it seem possible?

stereotype441 commented 1 year ago

Thanks @lrhn for your very astute and precise inside-the-Box thinking. Those examples are really compelling.

> So, can we define the schema extraction for an object pattern as:
>
>   • Let C be the declaration that the object name resolves to.
>   • If C is generic, and the object pattern does not provide all type arguments (as actual type arguments, or by unwinding type aliases),
>
>     • Find the type schemas for each field pattern of the object pattern.
>     • Match them up against the declared types of the getters invoked by the field patterns.
>     • Feel free to ignore those whose type does not involve a type variable.
>     • Try to "solve for the type variables."
>     • If successful, use the inferred type as type argument of C for that type variable.
>     • If not, use ?.
>
>   • Then the type schema for the object pattern is the thus instantiated C.
>
> Very handwavy. Does it seem possible?

Yeah, I think it's possible. In fact, I think it's not too hard to get rid of the handwaving. The core operation we use to create type inference constraints is "try to match U as a subtype of V", where either U or V can refer to the type variables being solved for (but not both). (The actual spec is a little wordier than this; see Subtype constraint generation for details.) So the logic would be:

  • Let C be the declaration that the object name resolves to, with type variables T1 .. Tn.
  • For each field pattern, compute its type schema S, and let G be the declared type of the getter the field pattern invokes.
  • Try to match S as a subtype of G, accumulating constraints on T1 .. Tn.
  • Solve the accumulated constraints, and use the solution as the type arguments of C, with ? for any type variable that received no constraints.

Note that the constraint solution for C doesn't pin down all the type variables, just the ones for which constraints were generated, which is the behaviour we want here. It also does all the right things to account for bounds.
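As a concrete illustration of that matching step (a sketch assuming the inference above), consider:

class C<T> {
  final List<T> values;
  C(this.values);
}

main() {
  // The field pattern's type schema is List<int>; the getter's declared
  // type is List<T>. Matching List<int> as a subtype of List<T> generates
  // the constraint int <: T, so T solves to int and the context schema
  // for the RHS becomes C<int>.
  var C(values: List<int> values) = C([1, 2, 3]);
  print(values.runtimeType); // List<int>, under the proposed inference
}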

The logic in the analyzer and CFE for doing all this is rather crufty, so it won't necessarily be easy. (But I've made several attempts to tidy it up over the years, and I'm not afraid to try again!)

I do wonder a little bit if the result would be useful enough to justify the implementation effort, but on the other hand it's so much cheaper to get this sort of thing right the first time rather than fix it later (because you don't have to fix code that's out there in the wild relying on the old inference behavior).

munificent commented 1 year ago

> If I'm not mistaken, regular variable declarations use option 2 (instantiate to bounds), which is why this prints List<dynamic>.

Ha, I am clearly out of my element here.

> The type schema for the LHS will be either Box<?> (not inferred yet), Box<dynamic> (instantiated to bound) or Box<int> (inferred from the type schema for value, which constrains T).
>
> Do we get the third option? If not, can we?

This is an excellent point. I think we should get the third option if possible. In particular, because that is what we do for list and map patterns that don't have explicit type arguments: we infer them from the subpatterns. So there's already an ad hoc bottom-up type schema inference going on for two of the destructuring patterns, and it seems like we should really generalize that to object patterns too. It would be unfortunate if these caused different types of lists to be instantiated:

var [int x, ...] = [1];
var List(first: int x) = [1];

It looks to me essentially like the bottom-up inference we do for generic method invocations based on the argument expressions.
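For comparison, a minimal sketch of that bottom-up inference for a generic invocation (using a hypothetical helper first):

T first<T>(List<T> list) => list.first;

main() {
  // The argument [1] generates the constraint int <: T, so T solves to
  // int, just as subpatterns would constrain an object pattern's
  // type arguments under the proposal.
  print(first([1]).runtimeType); // int
}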

stereotype441 commented 1 year ago

> This is an excellent point. I think we should get the third option if possible. In particular, because that is what we do for list and map patterns that don't have explicit type arguments: we infer them from the subpatterns. So there's already an ad hoc bottom-up type schema inference going on for two of the destructuring patterns, and it seems like we should really generalize that to object patterns too. It would be unfortunate if these caused different types of lists to be instantiated:
>
> var [int x, ...] = [1];
> var List(first: int x) = [1];

Oh yes, good point, I would really like for type inference to treat those two in the same way as well.

It sounds like we have a fairly good consensus, but I'm interested in hearing from @leafpetersen about whether he thinks this amazing awesome functionality is going to be worth the extra implementation effort. I think it's worth it, but I know that I'm prone to biting off more than I can chew, and Leaf is the person who usually talks sense into me when I'm about to take on too much work. Assuming he doesn't try to talk me out of it, I'll plan to write up a PR adding this to the patterns spec, and add some language tests.

leafpetersen commented 1 year ago

Sorry, I dropped the ball here. I'm fine with this, I think it makes sense. If @stereotype441 is still interested in pursuing it, I'm happy to review a PR.

eernstg commented 1 year ago

I think it would be pragmatically very desirable to compute a context type schema for a pattern which is as informative as it can be, based on subpatterns. That is, it would be very nice if we could do as @stereotype441 specifies above.

However, there's a discrepancy with the treatment of "half" types in regular (non-pattern) declarations, as mentioned by @munificent above: in that case we will use instantiation to bound to obtain the missing type arguments.

A third situation where we have considered using inference rather than instantiation to bound is type tests: x is List.
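Today a bare List in a type test is instantiated to bounds, so it means List<dynamic> and matches every list (a small illustration):

main() {
  Object x = <int>[1];
  // `x is List` means `x is List<dynamic>` after instantiation to bounds,
  // and every List<T> is a subtype of List<dynamic>, so this prints true.
  print(x is List); // true
}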

I think it would be great if we could make these situations converge as much as possible. Here are a few potential scenarios:

  1. Use @stereotype441's inference for pattern declarations, and use a similar approach for regular declarations and type tests (this is a breaking change: e.g., List xs = [1]; would give xs the declared type List<int> rather than List<dynamic>).
  2. Use @stereotype441's inference for pattern declarations, and aggressively get rid of "half" types in regular declarations and type tests. (So we'd warn/lint against List xs = [1]; as well as x is List).
  3. Use instantiation to bound to fill in missing type variables when computing the context type schema of a declaration pattern (that is, use instantiation to bound "everywhere").
  4. Accept the discrepancy: Pattern declarations will use @stereotype441's inference, and regular declarations and type tests will use instantiation to bound.

I certainly think that (1) and (2) are much better than (3). Developers have been asking for something like (1) with regular declarations, and they've been surprised that we didn't already do that. But (4) is actually even worse than (3), because developers won't get a suitable hint about the discrepancy, and it really does matter whether your list is List<int> or List<dynamic>.

It also seems useful to be able to say, e.g., that the desired declared type is "some kind of Iterable" by writing Iterable xs = [1];. This would make the declared type Iterable<int>, which may be more appropriate than List<int> in the given situation; today that can't be expressed unless the type Iterable<int> is specified explicitly and in full (that is, including <int>).
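A small sketch of the contrast behind these scenarios: today's behaviour for the first declaration, and the fully explicit form that scenario (1) would make unnecessary:

main() {
  // Today, instantiation to bounds gives the context List<dynamic>:
  List xs = [1];
  print(xs.runtimeType); // List<dynamic>
  // Under scenario (1) the context would be List<?>, and xs would be
  // a List<int> (hypothetical future behaviour).

  // Today, Iterable<int> has to be written explicitly and in full:
  Iterable<int> ys = [1];
  print(ys.runtimeType); // List<int>
}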