Provide CFE functionality whether is/as checks can omit checking the type arguments

mkustermann commented 9 months ago

For backends it's usually much simpler & faster to check whether an object is an instance of a class than to check whether it's an instance of an interface type with arguments.

Concretely, we have <obj> is/as <type> expressions where <type> is an InterfaceType but with non-defaults-to-bounds type arguments. Doing those checks is expensive.

Now there's a subset of cases where we actually have a static guarantee that the type arguments will match and therefore don't have to be checked, reducing the check to a simple obj instanceof Class check.

Here's a few examples where type argument checks are not needed:

class B0<T> {}
class B1<T, H> extends B0<int> {}
class B2<T> extends B1<T, T> {}

main() {
  final B1<num, num> a
  a is B2<num>

  final B1<int, int> a
  a is B2<num>
  a is B2<int>

  final B1<List<int>, List<int>> a
  a is B2<List<num>>
  a is B2<List<int>>
}

class X<T extends num> {
  void test1() {
    final Iterable<T> a;
    a is List<T>;

    final B1<T, T> a
    a is B2<T>

    final B1<List<T>, List<T>> a
    a is B2<List<T>>
    a is B2<List<num>>
  }
}

The algorithm to determine whether type arguments don't have to be checked for a <obj> is X<A0, ... An> with <obj> having static type Y<B0, ..., Bm> could be something like this:

Take the raw/this-type of X, i.e. X<X::T, X::T2, X::T3>
Find the path from X to Y
Apply the super types from this path using the raw type
Match the resulting Y<Expr-1(X::T1..Tn), ..., Expr-m(X::T1..Tn)> against the static type of operand (Y<B0, ..., Bm>)
- finding values for all type parameters X::T1 ... X::Tn
- if we find consistent assignments Operand(X::Ti) for all X::Ti
- and X<Operand(X::Ti), ..., Operand(X::Tn)> is a subtype of X<A0, ..., An>
- then we can skip checking the type arguments and only check whether obj is an X.

So the concrete request I have is

Could CFE team make such an algorithm available in package:kernel/*.dart
Could the CFE annotate AsExpression and IsExpression with a bit indicating whether the type arguments don't need to be checked

/cc @johnniwinther @chloestefantsova?

chloestefantsova commented 9 months ago

That's a very neat optimization that I believe is quite possible, so my answer to both questions would be "yes". I also think it shouldn't take much time, and I'd be happy to implement it.

A related question is how useful such a bit on AsExpression and IsExpression would be to other backends.

@johnniwinther WDYT?

johnniwinther commented 9 months ago

That seems like a valuable flag to have on AsExpression and IsExpression.

@mkustermann Would it be useful for your use-case to also mark checks against record types as "only structural" (for instance is (Object?, Object?)?

mkustermann commented 9 months ago

I also think it shouldn't take much time, and I'd be happy to implement it.

See simple version in cl/353900 (as part of a bugfix in dart2wasm compiler)

@mkustermann Would it be useful for your use-case to also mark checks against record types as "only structural" (for instance is (Object?, Object?)?

That may be helpful yes, as we can have fast ways to check whether an object has a certain shape.

A related question is how useful such a bit on AsExpression and IsExpression would be to other backends.

Definitely useful for wasm, vm. Very likely also for dart2js /cc @rakudrama

chloestefantsova commented 9 months ago

@mkustermann I'm looking into this optimization now, and I have multiple design alternatives for the flags on IsExpression and AsExpression. Basically, we have two boolean parameters that we can variate the number and the meaning of the flags with: (1) whether we want type-kind specific flags or not, and (2) whether we want the distinction between operand-dependent and operand-independent shape checks or not.

The type-kind specific flags are more or less self explanatory: they indicate what kind of shape that should be checked. We have the following possibilities for non-simple and not-erased types in Kernel: interface types, record types, function types, and FutureOr types.

The flags that make distinction between the operand-dependent and operand-independent checks signal whether the optimizing flag computation took the static type of the operand expression into account. For example, in the check x is List<num> we may conclude that checking for the runtime type of the value stored in x being any List is sufficient because we know that the static type of the expression x is Iterable<int>. In that case we say that the shape-only check is sufficient, and it is operand-dependent. In contrast with that, in the check x is List<dynamic> we conclude that checking for the runtime type of the value stored in x for being any List is sufficient because List<dynamic> is a supertype of any List. In that case we say that the shape-only check is sufficient, and it is operand-independent.

Please let me know which of the alternatives would work the best for you.

Variant 0: We do want the type-specific flags, and we do want to distinguish between the operand-dependent and operand-independent shape checks

We add the following flags to IsExpression and AsExpression AST nodes in Kernel.

isOperandIndependentInterfaceTypeShapeCheckSufficient
isOperandIndependentRecordTypeShapeCheckSufficient
isOperandIndependentFunctionTypeShapeCheckSufficient
isOperandIndependentFutureOrTypeShapeCheckSufficient
isOperandDependentInterfaceTypeShapeCheckSufficient
isOperandDependentRecordTypeShapeCheckSufficient
isOperandDependentFunctionTypeShapeCheckSufficient
isOperandDependentFutureOrTypeShapeCheckSufficient

The following is an example of those flags being set.

class A<X> {}
class B<Y extends num> extends A<Y> {}

test(A<int> a, (int,) r, Function(num) f, FutureOr<int> fo) {
  a as B<int>; // isOperandDependentInterfaceTypeShapeCheckSufficient = true
  a as B<num>; // isOperandIndependentInterfaceTypeShapeCheckSufficient = true

  r as (num,); // isOperandDependentRecordTypeShapeCheckSufficient = true
  r as (Object?,); // isOperandIndependentRecordTypeShapeCheckSufficient = true

  f as Function(int); // isOperandDependentFunctionTypeShapeCheckSufficient = true
  f as Function(Never); // isOperandIndependentFunctionTypeShapeCheckSufficient = true

  fo as FutureOr<num>; // isOperandDependentFutureOrTypeShapeCheckSufficient = true
  fo as FutureOr<dynamic>; // isOperandIndependentFutureOrTypeShapeCheckSufficient = true
}

Variant 1: We do want the type-specific flags, but we don’t want to distinguish between the operand-dependent and operand-independent shape checks

We add the following flags to IsExpression and AsExpression AST nodes in Kernel.

isInterfaceTypeShapeCheckSufficient
isRecordTypeShapeCheckSufficient
isFunctionTypeShapeCheckSufficient
isFutureOrTypeShapeCheckSufficient

The following is an example of those flags being set.

class A<X> {}
class B<Y extends num> extends A<Y> {}

test(A<int> a, (int,) r, Function(num) f, FutureOr<int> fo) {
  a as B<int>; // isInterfaceTypeShapeCheckSufficient = true
  a as B<num>; // isInterfaceTypeShapeCheckSufficient = true

  r as (num,); // isRecordTypeShapeCheckSufficient = true
  r as (Object?,); // isRecordTypeShapeCheckSufficient = true

  f as Function(int); // isFunctionTypeShapeCheckSufficient = true
  f as Function(Never); // isFunctionTypeShapeCheckSufficient = true

  fo as FutureOr<num>; // isFutureOrTypeShapeCheckSufficient = true
  fo as FutureOr<dynamic>; // isFutureOrTypeShapeCheckSufficient = true
}

Variant 2: We don’t want the type-specific flags, but we do want to distinguish between the operand-dependent and operand-independent shape checks

We add the following flags to IsExpression and AsExpression AST nodes in Kernel.

isOperandIndependentShapeCheckSufficient
isOperandDependentShapeCheckSufficient

The following is an example of those flags being set.

class A<X> {}
class B<Y extends num> extends A<Y> {}

test(A<int> a, (int,) r, Function(num) f, FutureOr<int> fo) {
  a as B<int>; // isOperandDependentShapeCheckSufficient = true
  a as B<num>; // isOperandIndependentShapeCheckSufficient = true

  r as (num,); // isOperandDependentShapeCheckSufficient = true
  r as (Object?,); // isOperandIndependentShapeCheckSufficient = true

  f as Function(int); // isOperandDependentShapeCheckSufficient = true
  f as Function(Never); // isOperandIndependentShapeCheckSufficient = true

  fo as FutureOr<num>; // isOperandDependentShapeCheckSufficient = true
  fo as FutureOr<dynamic>; // isOperandIndependentShapeCheckSufficient = true
}

Variant 3: We don’t want the type-specific flags, and we don’t want to distinguish between the operand-dependent and operand-independent shape checks

We add the following flags to IsExpression and AsExpression AST nodes in Kernel.

isShapeCheckSufficient

class A<X> {}
class B<Y extends num> extends A<Y> {}

test(A<int> a, (int,) r, Function(num) f, FutureOr<int> fo) {
  a as B<int>; // isShapeCheckSufficient = true
  a as B<num>; // isShapeCheckSufficient = true

  r as (num,); // isShapeCheckSufficient = true
  r as (Object?,); // isShapeCheckSufficient = true

  f as Function(int); // isShapeCheckSufficient = true
  f as Function(Never); // isShapeCheckSufficient = true

  fo as FutureOr<num>; // isShapeCheckSufficient = true
  fo as FutureOr<dynamic>; // isShapeCheckSufficient = true
}

chloestefantsova commented 9 months ago

See simple version in cl/353900 (as part of a bugfix in dart2wasm compiler)

Thanks! I see it may not handle some of the type combinations, for example, the one in the example below won't be optimized even though it could be, but it gives a good idea.

class A<X> {}
class B<Y> extends A<Y> {}
test(A<Function(num)> a) => a is B<Function(num)>;

mkustermann commented 9 months ago

For backends it's somewhat irrelevant why a check can be simplified. It only matters that the check can be simplified. (So I think it's not important to distinguish whether the optimization can be made because it took the static type into consideration or not)

The kind of simplification depends on the DartType:

on <obj> is/as InterfaceType<...>: It means we can avoid checking the type arguments
on <obj> is (Object?, Object?): It means we can check the record shape only, not the individual types

I think it makes sense to separate the two, as they are different concepts.

Guess this would then be Variant 1?

One additional thing that may(?) make sense to add a flag for:

T? value;
value is T;
value as T;

=> e.g. a IsExpression.isNullCheckSufficient.

chloestefantsova commented 9 months ago

Guess this would then be Variant 1?

Thanks! I'll go with this one then. A related clarification: would it be better to have the set of four flags or an enum instead, where one of the values will signal that no optimizations can be made?

=> e.g. a IsExpression.isNullCheckSufficient.

Good point about nullability check! I'll make sure to include that.

rakudrama commented 9 months ago

I am skeptical that this will be much use to dart2js. I propose an alternative.

dart2js already detects the nullable-removal pattern x as T and compiles this as x == null ? x as T : x. The logic here is single test. It does not seem worth encoding that test in a serialized format. Are there cases that the CFE could detect that are missed here?

Another way to express the 'shape' optimization would be to replace the type expression with one instantiated to bounds, or the structural equivalent. Often the only reason we write x is List<T> is because promotion does not work for x is List, but the front end could generate an IsExpression that uses a maximal type instead of the written type (replacing List<T> with List<Object?> or List<dynamic>). The main problem with using a maximal type for an as check is that it will change the error message, which could be confusing.

In many cases these maximal types are already optimized (obj instanceof Class).

Annotating IsExpression and AsExpression with flags solves only half of the problem.

It only addresses syntactic is and as tests, not other tests, like inlined parametric covariant checks
It is all-or-nothing, whereas the tested type might be partially simplified (for example, it is reasonable to expect is Map<int, dynamic> is faster than is Map<int, List<int>>)
Mutable flags open the door to inconsistencies when a transform rewrites part of the code.

dart2js and the VM both use some kind of compiled type testing stubs. This allows x as T to be fast when the type is simple, including parametric covariance checks. For dart2js, the stubs are somewhat generated at run time. It is not clear how to connect the flag on AsExpression to the lazy runtime generation of the stub.

A preferable scheme is the following

Compute a maximal equivalent type, one of:
- CFE stores a maximal equivalent type on the AsExpression/IsExpression nodes, or
- CFE provides a utility for computing a maximal type from the static type of the value operand and the tested type, or
All compilers may use the maximal type for is-tests.
Optimizing compilers can use the maximal type for as-checks if permissible to change the error message, or have a special kind of test that uses the maximal type for the test, but has at-hand the 'true' type for creating the Error.

chloestefantsova commented 9 months ago

Compute a maximal equivalent type

I really like this! The maximal equivalent types cover a wider range of checks, and generating such helper types can be done with an extension of the algorithm I already had in mind for computing the bits. I think it’s a good idea to store the maximal equivalent types in IsExpressions and AsExpressions and preserve the level of precision of the runtime error messages.

I can see the appeal of the shape-check bits too: even if the maximal type is present, sometimes it can be helpful to know that only the shape of such type needs checking. Consider the following example.

class A<X> {}
class B<X, Y> extends A<X> {}

test(A<int> a) {
  // Maximal equivalent type: B<dynamic, String>.
  // isInterfaceTypeShapeCheckSufficient = false.
  a as B<num, String>;

  // Maximal equivalent type: B<dynamic, dynamic>.
  // isInterfaceTypeShapeCheckSufficient = true.
  a as B<num, dynamic>;
}

CFE stores a maximal equivalent type on the AsExpression/IsExpression nodes, or

CFE provides a utility for computing a maximal type from the static type of the value operand and the tested type, or

I see value in providing both, together with the shape-check bits or a shape-check enum value. @mkustermann, @rakudrama WDYT?

mkustermann commented 9 months ago

My main request is to have the functionality shared inpackage:kernel instead of all backends doing their own (slightly different) implementation. If the functionality is there, then it's easy to compute the bits - so having them on the IsExpression / AsExpression isn't so important.

The bits on the AST nodes would be a convenience - and wouldn't incur additional memory overhead as those nodes already an int flags. Adding more information to the nodes makes them consume more memory. Not sure what impact that would have (and as already mentioned, it wouldn't cover all cases as e.g. inlining would need to re-compute this for the operand type on call site and the implicit as check for parameter type)

In terms of precision: Only performing shape checks vs full type check covers many important cases, but it would of course be nice if it's more general.

Rewriting the existing type of the is/as check may loose information that backends may rely on in their type propagator. Maybe some backends have an advanced intersection type that can infer from Iterable<T> a; a as List<dynamic>; that a must be List<T> but this isn't true for all backends - it may also make getStaticType() on the nodes more expensive or less precise (?).

It is not clear how to connect the flag on AsExpression to the lazy runtime generation of the stub.

Can only speak for the VM: The flag actually tells what to do:

if flag says only nullability has to be checked which we can do inline
if flag says check only record shape, we only compare the shape which we can do inline
if flag says check subclass check ignoring type args, call a stub that checks the defaults-to-bounds type which will only check for subclasses

So maybe we can start out with having the algorithm in package:kernel?

chloestefantsova commented 9 months ago

Rewriting the existing type of the is/as check may loose information that backends may rely on in their type propagator.

Yep, that would be an inconvenience. But I suggest to store the maximal equivalent type alongside the target check type, not to replace one with the other.

So maybe we can start out with having the algorithm in package:kernel?

Yep, we can certainly do that.

chloestefantsova commented 6 months ago

Sorry for the silent on the issue! Some time ago I implemented the boolean predicate @mkustermann asked, and it can be updated into an algorithm that produces the maximal equivalent types for the type checks too. @rakudrama is there still interest in such API?

rakudrama commented 5 months ago

@chloestefantsova Yes, a function to compute a maximal equivalent type would still be used.

dart-lang / sdk

Provide CFE functionality whether is/as checks can omit checking the type arguments #54998