Type inference does not solve some constraints involving F-bounds

dart-lang / language

Design of the Dart language

Other

2.61k stars 199 forks source link

Type inference does not solve some constraints involving F-bounds #3009

Open eernstg opened 1 year ago

eernstg commented 1 year ago

Consider the following program:

class A<X extends A<X>> {}
class B extends A<B> {}
class C extends B {}

void f<X extends A<X>>(X x) {}

void main() {
  f(B()); // OK.
  f(C()); // Inference fails, compile-time error.
  f<B>(C()); // OK.
}

Most likely, the inference could succeed in choosing the type argument B even in the case where the actual argument has type C if the bounds (in this case: an F-bound X extends A<X>) are taken into account during constraint solution. In any case, the current behavior is that inference fails as mentioned, and this is a problem that does occur in practice (e.g., https://github.com/dart-lang/sdk/issues/35799).

eernstg commented 3 months ago

[Edit: Eliminated a couple of mistakes.]

Here's an outline of an approach that we could explore:

In section Constraint solution for a set of type variables we have the following step:

Otherwise, let Bi be the bound of Xi. Then, let Bi' be the type schema formed by substituting type schemas {U0, ..., Ui-1, Ti, ..., Tn} in place of the type variables {X0, ..., Xn} in Bi. (That is, we substitute Uj for Xj when j < i and Tj for Xj when j >= i). Then Ui is the constraint solution for the type variable Xi with respect to the constraint set C + (X <: Bi').

We could enhance this step such that it works differently when Bi is an F-bound, that is, it is a parameterized type where Xi occurs. Assume that the bound is Xi extends A<S1 ... Sk>.

In that case we would not find the constraint solution as described. We would instead consider C alone, take the least closure of its lower bound, L, and search the superinterface graph of L to find a parameterized type of the form A<V1 .. Vk>.

For each occurrence of Xi in S1 ... Sk, find the subterm W of V1 ... Vk that corresponds to said occurrence in S1 ... Sk. Then resolve the constraint A<S1 .. Sk> <# A<V1 .. Vk> with respect to {X0 .. Xn}, yielding the candidate value of Xi; call it W.

Check that L <: W, that W <: A<V1 .. Vk>, and that W <: U where U is the greatest closure of the upper bound in C. This procedure has failed if that is not the case.

Otherwise, Ui is W. That is, we choose W as the selected value of Xi.

If any of the steps up to this point fails then we fall back and use the approach which is used today.

The point is that the binding of Xi must be a supertype of L, and it must be a subtype of A<V1 ... Vk>, and I suspect that W is the only candidate to satisfy that constraint. (Note that L does not have multiple distinct superinterfaces of the form A<...> because that's a compile-time error.)

Similarly, in section Grounded constraint solution for a set of type variables we have the following, and it should be enhanced to handle F-bounds similarly:

Otherwise, let Bi be the bound of Xi. Then, let Bi' be the type schema formed by substituting type schemas {U0, ..., Ui-1, Ti, ..., Tn} in place of the type variables {X0, ..., Xn} in Bi. (That is, we substitute Uj for Xj when j < i and Tj for Xj when j >= i). Then Ui is the grounded constraint solution for the type variable Xi with respect to the constraint set C + (X <: Bi').

eernstg commented 3 months ago

@stereotype441, here's the issue I mentioned at breakfast. ;-)

eernstg commented 3 months ago

An extra note on known solutions: Assume that A is a generic class and X is a type variable whose bound is T where X occurs in T.

In this situation, X == Never will always satisfy the bound. In the case where we're inferring a value for X and there is no (non-bottom) lower bound, or the lower bound is _, binding X to Never is a possible solution. It may or may not be typical for that solution to be desirable. I've assumed that it is generally not very useful (and hence it wasn't mentioned above in the proposal that I outlined).

In the case where every occurrence of X in T is covariant, every recursive unfolding of a type that satisfies the bound will again satisfy the bound. Consider X extends A<X>. In this case, X == Never will satisfy the bound, and so will X == A<Never>, X == A<A<Never>> and so on. This might be useful, but it is not obvious how we should go about selecting one solution from that infinite set.

If every occurrence of X in T is contravariant and X == S satisfies the bound then every subtype of S will satisfy the bound as well. In short, from S1 <: S follows [S/X]T <: [S1/X]T, and hence S1 <: S <: [S/X]T <: [S1/X]T. Again, I assume that this will not come up very often in practice.

Based on these considerations, I tend to conclude that the proposed algorithm here can ignore these solutions, using only the nominally declared solutions (like using class B extends A<B> {} to solve X <: A<X>).

eernstg commented 1 week ago

We're currently exploring a solution to this issue, based on a generalization of the type inference algorithm that introduces a few extra subtype constraints on the type variables which are being computed, based on a more intensive use of the type parameter bounds. As a consequence, we will be able to infer the following:

class A<X extends Iterable<Y>, Y> {
  A(X x) {
    print('A<$X, $Y>');
  }
}

void main() {
  A([1]); // 'A<List<int>, dynamic>' today, becomes 'A<List<int>, int>'.
}

The reason for this change is that the new type inference algorithm generates a constraint that Y must be a supertype of int, based on the result that X has been chosen to be List<int> (based on the actual argument). The general approach during type inference is to use lower bounds if they are types (not type schemas, that is "it's a fully known type"), and only using upper bounds if the lower bound can't be used. So we prefer int and forget about the fact that dynamic is also a solution. Woohoo! :tada:

nshahan commented 1 week ago

I want to bring this package to your attention: https://pub.dev/documentation/dart_internal/latest/extract_type_arguments/extract_type_arguments-library.html It requires an out of spec feature from the compilers to support it but it was deemed too breaking to not support.

There are two pub packages that depend on it:

At first glance, it sounds like this change could allow the owners of these packages to rewrite their APIs to remove their dependency on package:dart_internal and unblock backends from eventually dropping support for it. Is that use case on your radar?

FMorschel commented 6 days ago

and only using upper bounds if the lower bound can't be used

Does this mean that https://github.com/dart-lang/language/issues/1674 would be related to this as well?

I'm not sure it would do everything that issue is asking, but it seems pretty close to me.

eernstg commented 6 days ago

Does this mean that https://github.com/dart-lang/language/issues/1674 would be related to this as well?

The addition of support for declared lower bounds on type parameters (that is, #1674) isn't quite the same as this feature. A typical example of #1674 would be that we wish to specify a type at a call site which is a locally known supertype of a type parameter of the receiver:

import 'dart:math';

abstract class A<X> {
  Y or<Y super X>(Y y);
}

class B1<X> implements A<X> {
  final X x;
  B1(this.x);
  Y or<Y super X>(Y y) => x; // <--- This body needs the `super` bound.
}

class B2<X> extends A<X> {
  Y or<Y super X>(Y y) => y;
}

void main() {
  A<num> a = Random().nextBool() ? B1(20) : B2(); 
  num n = a.or(2.5);
}

In order to make or type safe (in B1) we need to require that Y is a supertype of X. We can't achieve this via improved type inference because we don't get to choose the value of X and Y in a single step at compile time, we already have an instance of A whose type argument (the value of X) has been chosen and may not be statically known (in the example it is actually int but statically known as num).

At the call site for or we know that the actual value of X is some unknown type T such that T <: num. But we may then infer the actual type argument for or and get a.or<num>(2.5). This is optimal because we know for sure that num :> T :> TheValueOfXOf_a, and this is needed in order to maintain type safety when B1.or returns x, whose type is known to be a subtype of TheValueOfXOf_a.

eernstg commented 6 days ago

@nshahan wrote:

... the owners of these packages [could] rewrite their APIs to remove their dependency on package:dart_internal ...

Those two features are not quite the same: The improved type inference which is discussed in this issue is a static analysis, and the 'extract_type_arguments' library provides a feature that provides access to the run-time values of type parameters. In particular, we can't do the latter with the former.

On the other hand, there's no need to use external functions (as 'extract_type_arguments' does) when considering the task of extracting the values of type parameters in isolation, it is only done in that way because it would be too much of a breaking change to add a method to List and Map. If we're free to add a method to the given class then it's expressible in Dart without any external magic:

class C<X> {
  X x;
  C(this.x);
  R callWithX<R>(R Function<Y>(C<Y> self) callback) => callback<X>(this);
}

void main() {
  C<Object> c = C<String>('Hello'); // "Forget" the actual type argument.
  c.callWithX(<X>(C<X> _) => print(X)); // 'String'.
}

You may or may not prefer to pass the receiver to the callback, it's just done here in order to ensure that the body of the callback has access to the receiver with the best possible typing. If you don't do that then the body can always do something like final self = c as C<X>; and proceed in the same way. This cast is guaranteed to succeed because of the implementation of callWithX (but the type system can't see it, so we do need to pass the receiver as an argument, or cast the receiver, if we need to access the receiver with the improved typing which is the point of having callWithX in the first place).

nshahan commented 6 days ago

Those two features are not quite the same: The improved type inference which is discussed in this issue is a static analysis, and the 'extract_type_arguments' library provides a feature that provides access to the run-time values of type parameters. In particular, we can't do the latter with the former.

Ah yes thinking about it more now that makes perfect sense, thanks for pointing it out.

FMorschel commented 4 days ago

One more thing that I'd like to ask in relation to your comment.

The folowing works just fine (see that all are Tween<double> because of the outer generics):

TweenSequence<double>([
  TweenSequenceItem(tween: Tween(begin: 0, end: -90), weight: 1),
  TweenSequenceItem(tween: Tween(begin: -90, end: 0), weight: 1),
])

But when I add .chain in the first Tween it resolves to Tween<int> instead of Tween<double> it was solving previously.:

TweenSequence<double>([
  TweenSequenceItem(
    tween: Tween(begin: 0, end: -90).chain(CurveTween(curve: Curves.easeInQuad)), // <--------- here
    weight: 1
  ),
  TweenSequenceItem(tween: Tween(begin: -90, end: 0), weight: 1),
])

The only thing added was: .chain(CurveTween(curve: Curves.easeInQuad)).

I then get: The argument type 'Animatable<int>' can't be assigned to the parameter type 'Animatable<double>'.

In the whole expression: Tween(begin: 0, end: -90).chain(CurveTween(curve: Curves.easeInQuad))

Is this intended behaviour? Would your new approach solve this case? Or even, should I file an issue with this?

eernstg commented 4 days ago

Is this intended behaviour? Would your new approach solve this case?

This is indeed behaving as it is expected to behave. It is a part of type inference which is being explored actively, see https://github.com/dart-lang/language/issues/3527, and we might have a substantial improvement in the future. It will take some time, though, because it is rather complex.

The basic issue here is that the receiver of a member access expression (e.g., a method call) will have the empty context type during type inference, and this means that the receiver expression will obtain the type which is based on its own subexpressions. For example:

extension<X> on X {
  X get id => this;
}

void main() {
  // For an integer literal, the context type can make it an `int` or a `double`.
  double d = 1; // `1` evaluates to the `double` value `1.0` at run time.
  var v = 1; // `1` evaluates to the `int` value `1` at run time.
  d = v; // Compile-time error.

  // As a receiver, `1` always evaluates to an `int`.
  d = 1.id; // Error, `1.id` has the type `int`.
}

One way to say this is that we can't transfer the context type from a member access to the receiver: If we're considering r.toString() with context type String then we don't know anything about the appropriate type for r (any type whatsoever will do), so we have to perform type inference on r using the empty context type _. This is what Dart type inference does today.

Similarly, when we consider Tween(begin: 0, end: -90).chain(CurveTween(curve: Curves.easeInQuad)) with context type Tween<double> (if I got that correct, but it shouldn't actually make a difference) then we could in principle search the entire world to find all the types that have a chain method, and check whether that type could be the type of the receiver.

In this particular case the receiver is Tween(begin: 0, end: -90) which is probably a constructor invocation where a type argument X is inferred to a value T, and, presumably begin and end have declared type X. So if we can make it Tween<double>(begin: 0, end: -90) then 0 and 90 will have context type double and they will be double values, and the resulting Tween instance will have the type Tween<double>.

However, what we're basically considering here is the set of possible output types from type inference (like: "tell me every possible type of the result of performing type inference on Tween(begin: 0, end: -90)). It is not a given that the receiver is a constructor invocation, so this set of types could basically be anything:

X whatever<X>() => throw "Haha!";

void main() {
  SomeContextType x = whatever().chain(CurveTween(curve: Curves.easeInQuad));
}

In this case the type of the receiver could actually be every single denotable type in the universe (or at least the ones that we've imported ;-).

In summary, it isn't a simple matter to "go backwards" through a member invocation and find the corresponding receiver type, because there's no limit on the number of completely unrelated types out there which do have a member with that particular name, and some of them might have a return type (after yet another round of type inference) which is assignable to the original context type.

In any case, @chloestefantsova is able to do something about this, which is obviously pure magic. :smile: But it will take a while before we can use it.