dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.08k stars 1.56k forks source link

[vm] Not enough happens during canonicalization #48414

Open rakudrama opened 2 years ago

rakudrama commented 2 years ago

The JIT reduces some calls to dart2js's CallStructure factory constructor to a constant. AOT fails to do this.

It appears that the reason is that not enough good things happen during canonicalization to make the inlinee look appealing:

  1. _ImmutableList.length is not replaced with LoadField early enough to allow constant-folding to simplify the inlinee . Curiously, [] is replaced with a bounds check and indexed load, but...

  2. GenericCheckBound has no constant-inputs canonicalization, or perhaps it can't be removed but inhibits other reductions.

Static calls to the getters of fields of user-defined constants are replaced by LoadField instructions earlier, making them available to reduction by canonicalization or constant-folding. This leaves constant lists at a disadvantage when compared to user-defined constants.

(I believe the JIT succeeds simply because it is happy to inline larger methods and does the constant-folding later.)

The factory constructor does a lookup in a const table. An example of a call that the JIT inlines and reduces to a constant is here.

Consider this simplified version, using a single level of array:

class CallStructure {
  static const List<CallStructure> _common = [
    CallStructure._(0),
    CallStructure._(1),
    CallStructure._(2),
    CallStructure._(3)
  ];
...
  factory CallStructure.unnamed(int argumentCount,
      [int typeArgumentCount = 0]) {
    // This simple canonicalization of common call structures greatly reduces
    // the number of allocations of CallStructure objects.
    if (typeArgumentCount == 0 && argumentCount < _common.length) {
      return _common[argumentCount];
    }
    return CallStructure._(argumentCount, typeArgumentCount);
  }

When the inlining decision is made, _common.length is still a static call. It has not been reduced to 4, so the path with the constructor call is still in the callee graph:

Callee graph for inlining file:///usr/local/google/home/sra/play1/bug1c4k.dart_CallStructure_CallStructure.unnamed (optimized)
==== file:///usr/local/google/home/sra/play1/bug1c4k.dart_CallStructure_CallStructure.unnamed (Constructor)
B0[graph]:0 {
      v5 <- Constant(#null)
      v6 <- Constant(#<optimized out>)
      v18 <- Constant(#true) T{bool}
      v21 <- Constant(#_ImmutableList len:4) T{_ImmutableList}
      v34 <- Constant(#4) T{_Smi}
}
B8[function entry]:2 {
      v7 <- Constant(#null)
      v9 <- Constant(#2) T{_Smi}
      v11 <- Constant(#0) T{_Smi}
      v13 <- SpecialParameter(ArgDescriptor)
}
    v22 <- StaticCall:38( get:length<0> v21, recognized_kind = ImmutableArrayLength, result_type = T{_Smi}) T{_Smi}
    v24 <- RelationalOp(<, v9, v22) T{bool}
    Branch if StrictCompare:44(===, v24 T{bool}, v18) goto (9, 10)
B9[target]:50
    v30 <- GenericCheckBound:52(v34, v9)
    v32 <- LoadIndexed(v21, v30) T{CallStructure}
    Return:56(v32)
B10[target]:64
    v19 <- AllocateObject:68(cls=CallStructure) T{CallStructure}
    StaticCall:70( CallStructure._@13274965<0> v19, v9, v11)
    Return:74(v19)
     Bailout: heuristics (no small leaf)

Interestingly, I can write a list wrapper :

class LIST<T> {
  final int length;
  final List<T> elements;
  const LIST(this.length, this.elements);
}

class CallStructure {
  static const _common = LIST(4, [
    CallStructure._(0),
    CallStructure._(1),
    CallStructure._(2),
    CallStructure._(3)
  ]);
...
  factory CallStructure.unnamed(int argumentCount,
      [int typeArgumentCount = 0]) {
    // This simple canonicalization of common call structures greatly reduces
    // the number of allocations of CallStructure objects.
    if (typeArgumentCount == 0 && argumentCount < _common.length) {
      return _common.elements[argumentCount];
    }
    return CallStructure._(argumentCount, typeArgumentCount);
  }

This version gets inlined, and the final code for demo1 contains a load of the constant value:

Callee graph for inlining file:///usr/local/google/home/sra/play1/bug1c4k3.dart_CallStructure_CallStructure.unnamed (optimized)
==== file:///usr/local/google/home/sra/play1/bug1c4k3.dart_CallStructure_CallStructure.unnamed (Constructor)
B0[graph]:0 {
      v5 <- Constant(#null)
      v6 <- Constant(#<optimized out>)
      v22 <- Constant(#4) T{_Smi}
      v25 <- Constant(#_ImmutableList len:4) T{_ImmutableList}
}
B9[function entry]:2 {
      v7 <- Constant(#null)
      v9 <- Constant(#2) T{_Smi}
      v11 <- Constant(#0) T{_Smi}
      v13 <- SpecialParameter(ArgDescriptor)
}
    v30 <- GenericCheckBound:54(v22, v9) T{_Smi}
    v32 <- LoadIndexed(v25, v30) T{CallStructure}
    Return:58(v32)
     Success
       with reason need to count first, code size 3, call sites: 0

The two-level version still does not work. This is because the LoadIndexed is not reduced to a constant. I suspect what is blocking this is that GenericCheckBound has a no-op Canonicalize method, or is opaque to the LoadIndexed canonicalization.


This is a full stand-alone test, demo1 should have the inlined constant:

```dart // Copyright (c) 2015, the Dart project authors. Please see the AUTHORS file // for details. All rights reserved. Use of this source code is governed by a // BSD-style license that can be found in the LICENSE file. main() { for (int i = 0; i < 1000000; i++) { demo1(); demo2(); print(CallStructure(1, ['x'], 2)); } } @pragma('vm:never-inline') demo1() { print(CallStructure.unnamed(2)); } @pragma('vm:never-inline') demo2() { print(CallStructure(2)); } class CallStructure { /// Tag used for identifying serialized [CallStructure] objects in a debugging /// data stream. static const String tag = 'call-structure'; static const CallStructure NO_ARGS = CallStructure._(0); static const CallStructure ONE_ARG = CallStructure._(1); static const CallStructure TWO_ARGS = CallStructure._(2); static const List> _common = [ [NO_ARGS, CallStructure._(0, 1), CallStructure._(0, 2)], [ONE_ARG, CallStructure._(1, 1), CallStructure._(1, 2)], [TWO_ARGS, CallStructure._(2, 1), CallStructure._(2, 2)], [CallStructure._(3), CallStructure._(3, 1), CallStructure._(3, 2)], [CallStructure._(4), CallStructure._(4, 1), CallStructure._(4, 2)], [CallStructure._(5), CallStructure._(5, 1), CallStructure._(5, 2)], [CallStructure._(6)], [CallStructure._(7)], [CallStructure._(8)], [CallStructure._(9)], [CallStructure._(10)], ]; /// The number of type arguments of the call. final int typeArgumentCount; /// The numbers of arguments of the call. Includes named arguments. final int argumentCount; /// The number of named arguments of the call. int get namedArgumentCount => 0; /// The number of positional argument of the call. int get positionalArgumentCount => argumentCount; const CallStructure._(this.argumentCount, [this.typeArgumentCount = 0]); factory CallStructure.unnamed(int argumentCount, [int typeArgumentCount = 0]) { // This simple canonicalization of common call structures greatly reduces // the number of allocations of CallStructure objects. if (argumentCount < _common.length) { final row = _common[argumentCount]; if (typeArgumentCount < row.length) { final result = row[typeArgumentCount]; assert(result.argumentCount == argumentCount && result.typeArgumentCount == typeArgumentCount); return result; } } return CallStructure._(argumentCount, typeArgumentCount); } factory CallStructure(int argumentCount, [List? namedArguments, int typeArgumentCount = 0]) { if (namedArguments == null || namedArguments.isEmpty) { return CallStructure.unnamed(argumentCount, typeArgumentCount); } return _NamedCallStructure( argumentCount, namedArguments, typeArgumentCount, null); } /// Returns `true` if this call structure is normalized, that is, its named /// arguments are sorted. bool get isNormalized => true; /// Returns the normalized version of this call structure. /// /// A [CallStructure] is normalized if its named arguments are sorted. CallStructure toNormalized() => this; CallStructure withTypeArgumentCount(int typeArgumentCount) => CallStructure(argumentCount, namedArguments, typeArgumentCount); /// `true` if this call has named arguments. bool get isNamed => false; /// `true` if this call has no named arguments. bool get isUnnamed => true; /// The names of the named arguments in call-site order. List get namedArguments => const []; /// The names of the named arguments in canonicalized order. List getOrderedNamedArguments() => const []; CallStructure get nonGeneric => typeArgumentCount == 0 ? this : CallStructure(argumentCount, namedArguments); /// A description of the argument structure. String structureToString() { StringBuffer sb = StringBuffer(); sb.write('arity=$argumentCount'); if (typeArgumentCount != 0) { sb.write(', types=$typeArgumentCount'); } return sb.toString(); } @override @pragma('vm:never-inline') String toString() => 'CallStructure(${structureToString()})'; bool match(CallStructure other) { if (identical(this, other)) return true; return this.argumentCount == other.argumentCount && this.namedArgumentCount == other.namedArgumentCount && this.typeArgumentCount == other.typeArgumentCount && _sameNames(this.namedArguments, other.namedArguments); } @override int get hashCode { return Object.hash(Object.hashAll(namedArguments), argumentCount, typeArgumentCount); } @override bool operator ==(other) { if (other is! CallStructure) return false; return match(other); } static bool _sameNames(List first, List second) { assert(first.length == second.length); for (int i = 0; i < first.length; i++) { if (first[i] != second[i]) return false; } return true; } } /// Call structure with named arguments. This is an implementation detail of the /// CallStructure interface. class _NamedCallStructure extends CallStructure { @override final List namedArguments; /// The list of ordered named arguments is computed lazily. Initially `null`. List? _orderedNamedArguments; _NamedCallStructure(int argumentCount, this.namedArguments, int typeArgumentCount, this._orderedNamedArguments) : assert(namedArguments.isNotEmpty), super._(argumentCount, typeArgumentCount); @override bool get isNamed => true; @override bool get isUnnamed => false; @override int get namedArgumentCount => namedArguments.length; @override int get positionalArgumentCount => argumentCount - namedArgumentCount; @override bool get isNormalized => identical(namedArguments, getOrderedNamedArguments()); @override CallStructure toNormalized() => isNormalized ? this : _NamedCallStructure(argumentCount, getOrderedNamedArguments(), typeArgumentCount, getOrderedNamedArguments()); @override List getOrderedNamedArguments() { return _orderedNamedArguments ??= _getOrderedNamedArguments(); } List _getOrderedNamedArguments() { List ordered = List.of(namedArguments, growable: false); ordered.sort((String first, String second) => first.compareTo(second)); // Use the same List if [namedArguments] is already ordered to indicate this // _NamedCallStructure is normalized. if (CallStructure._sameNames(ordered, namedArguments)) { return namedArguments; } return ordered; } @override String structureToString() { StringBuffer sb = StringBuffer(); sb.write('arity=$argumentCount, named=[${namedArguments.join(', ')}]'); if (typeArgumentCount != 0) { sb.write(', types=$typeArgumentCount'); } return sb.toString(); } } ```

Full code for simplified version:

```dart main() { for (int i = 0; i < 1000000; i++) { demo1(); demo2(); print(CallStructure(6, 6)); } } @pragma('vm:never-inline') demo1() { print(CallStructure.unnamed(2)); } @pragma('vm:never-inline') demo2() { print(CallStructure(2)); } class CallStructure { static const List _common = [ CallStructure._(0), CallStructure._(1), CallStructure._(2), CallStructure._(3) ]; /// The number of type arguments of the call. final int typeArgumentCount; /// The numbers of arguments of the call. Includes named arguments. final int argumentCount; const CallStructure._(this.argumentCount, [this.typeArgumentCount = 0]); factory CallStructure.unnamed(int argumentCount, [int typeArgumentCount = 0]) { // This simple canonicalization of common call structures greatly reduces // the number of allocations of CallStructure objects. if (typeArgumentCount == 0 && argumentCount < _common.length) { return _common[argumentCount]; } return CallStructure._(argumentCount, typeArgumentCount); } factory CallStructure(int argumentCount, [int typeArgumentCount = 0]) => CallStructure.unnamed(argumentCount, typeArgumentCount); } ```

Simplified version with wrapper:

```dart main() { for (int i = 0; i < 1000000; i++) { demo1(); demo2(); print(CallStructure(6, 6)); } } @pragma('vm:never-inline') demo1() { print(CallStructure.unnamed(2)); } @pragma('vm:never-inline') demo2() { print(CallStructure(2)); } class LIST { final int length; final List elements; const LIST(this.length, this.elements); } class CallStructure { static const _common = LIST(4, [ CallStructure._(0), CallStructure._(1), CallStructure._(2), CallStructure._(3) ]); /// The number of type arguments of the call. final int typeArgumentCount; /// The numbers of arguments of the call. Includes named arguments. final int argumentCount; const CallStructure._(this.argumentCount, [this.typeArgumentCount = 0]); factory CallStructure.unnamed(int argumentCount, [int typeArgumentCount = 0]) { // This simple canonicalization of common call structures greatly reduces // the number of allocations of CallStructure objects. if (typeArgumentCount == 0 && argumentCount < _common.length) { return _common.elements[argumentCount]; } return CallStructure._(argumentCount, typeArgumentCount); } factory CallStructure(int argumentCount, [int typeArgumentCount = 0]) => CallStructure.unnamed(argumentCount, typeArgumentCount); } ```
rakudrama commented 2 years ago

I have been looking into this.

We can compensate for the too-late inlining of _ImmutableList.length in constant propagation:

void ConstantPropagator::VisitStaticCall(StaticCallInstr* instr) {
  ...
    case MethodRecognizer::kImmutableArrayLength:
    case MethodRecognizer::kObjectArrayLength: {
      ASSERT(instr->FirstArgIndex() == 0);
      const Object& o = instr->ArgumentAt(0)->constant_value();
      if (o.IsArray()) {
        const auto& array = Array::Cast(o);
        SetValue(instr, Integer::ZoneHandle(Z, Integer::New(array.Length())));
        return;
      }
      break;
    }
...

It is slightly unsatisfactory to have another place that knows how to reduce a load of the length of a fixed-length list.

The GenericCheckBound is blocking further constant-folding because of this:

void ConstantPropagator::VisitGenericCheckBound(GenericCheckBoundInstr* instr) {
  // Don't propagate constants through check, since it would eliminate
  // the data dependence between the bound check and the load/store.
  // Graph finalization will expose the constant eventually.
  SetValue(instr, non_constant_);
}

I think we can have our cake and eat it too. We don't have to eliminate the data dependence. We can propagate the constant index, but not replace the GenericCheckBoound with a constant. This would enable the subsequent LoadIndexed to be constant-folded, but other uses would still be guarded.

Question: Is this right?

An amusing aside: I tried replacing, in ConstantPropagator::LoadIndexed, the index ->definition()->constant_value() with ->definition->OriginalDefinition()->constant_value(). This broke the dataflow, since the transfer function for an instruction was no longer a function of the inputs of the instruction. Lucky for me this broke the build.

Another approach is to add constant-folding of LoadIndexed to canonicalization, where there is no dataflow invariants to uphold, and it is reasonable to inspect OriginalDefinitions.