dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.24k stars 1.57k forks source link

object.runtimeType is more than 10 times slower when used with objects that have parameterized types #23746

Closed a-siva closed 3 years ago

a-siva commented 9 years ago

The following standalone example exhibits the problem:

verifyTypes completes in 14 milliseconds while verifyGenericTypes takes 190-200 milliseconds.

class Base {
  int a1;
}
class A1 {
  int a1;
}
class A2 {
  int a1;
}
class A3 {
  int a1;
}
class A4 {
  int a1;
}
class A5 {
  int a1;
}
class A6 {
  int a1;
}
class A7 {
  int a1;
}
class A8 {
  int a1;
}
class A9 {
  int a1;
}

verify(v1, v2) {
  if (v1.runtimeType == v2.runtimeType) {
    print("Incorrectly stating that runtime types match");
  }
}

verifyTypes() {
  var base = new Base();
  var a1 = new A1();
  verify(a1, base);
  var a2 = new A2();
  verify(a2, base);
  var a3 = new A3();
  verify(a3, base);
  var a4 = new A4();
  verify(a4, base);
  var a5 = new A5();
  verify(a5, base);
  var a6 = new A6();
  verify(a6, base);
  var a7 = new A7();
  verify(a7, base);
  var a8 = new A8();
  verify(a8, base);
  var a9 = new A9();
  verify(a9, base);
}

verifyGenericTypes() {
  var base = new List<Base>();
  var a1 = new List<A1>();
  verify(a1, base);
  var a2 = new List<A2>();
  verify(a2, base);
  var a3 = new List<A3>();
  verify(a3, base);
  var a4 = new List<A4>();
  verify(a4, base);
  var a5 = new List<A5>();
  verify(a5, base);
  var a6 = new List<A6>();
  verify(a6, base);
  var a7 = new List<A7>();
  verify(a7, base);
  var a8 = new List<A8>();
  verify(a8, base);
  var a9 = new List<A9>();
  verify(a9, base);
}

runtest() {
  var watch = new Stopwatch();
  watch.start();
  for (int i = 0; i < 10000; i++) {
    verifyTypes();
  }
  watch.stop();
  var duration = watch.elapsedMilliseconds;
  print("Time taken = $duration");

  watch.reset();
  watch.start();
  for (int i = 0; i < 10000; i++) {
    verifyGenericTypes();
  }
  watch.stop();
  duration = watch.elapsedMilliseconds;
  print("Time taken with parameterized types = $duration");
}

main() {
  while (true) {
    runtest();
  }
}

@crelier

crelier commented 9 years ago

Since this issue was filed, I think runtimeType has been intrinsified. The numbers are better now: 0.9 ms and 154 ms for ReleaseIA32 on my laptop (no access to my workstation right now). But the gap is even wider.

We have to build each generic type, finalize it, and canonicalize it. On the other hand, there is only one possible non-generic type, so it can be cached.

We only bypass finalization and canonicalization of the receiver type of a generic class at compile time. For List, it would be List, where E is the type parameter declared by class List. So this optimization does not apply in the example of this issue.

The problem here is not compile time, but run time building of a type. We build the generic type from the class of the instance and the canonical type arguments of the instance in order to pass a canonical type to the == method that will take it apart again. One way to optimize this would be to pass the class and the type arguments as a non-canonical pair (not a canonical type) to the == method.

Let me look at this in more details and at least verify the theory with profiling.

crelier commented 9 years ago

My earlier theory seems correct, except that the generic run time type does not need to be finalized, only canonicalized. Type finalization of a generic type consists mainly in building the type argument vector. We can here reuse the canonical type arguments of the instance and bypass finalization.

If I skip canonicalization, I observe a factor 3 in speedup (46 ms down from 154 ms). However, two equivalent but non-canonical types must now test equal. This requires overriding operator == in class _AbstractType and call the native AbstractType::Equals VM implementation, which is able to handle non-canonical types.

I am a bit weary of doing this, because it prohibits using types as key values of maps (keys should not implement ==), unless we treat this as a special case in the VM and suppress the error. But are there unpredictable effects of allowing non-canonical types floating around?

Let me know what you think.

Here is a git diff for illustration: diff --git a/runtime/lib/object.cc b/runtime/lib/object.cc index d8693a1..a777dbd 100644 --- a/runtime/lib/object.cc +++ b/runtime/lib/object.cc @@ -343,6 +343,14 @@ DEFINE_NATIVE_ENTRY(Object_as, 4) { }

+DEFINE_NATIVE_ENTRY(AbstractType_equals, 2) {

diff --git a/runtime/vm/bootstrap_natives.h b/runtime/vm/bootstrap_natives.h index 38a92de..da141c0 100644 --- a/runtime/vm/bootstrap_natives.h +++ b/runtime/vm/bootstrap_natives.h @@ -31,6 +31,7 @@ namespace dart { V(FunctionImpl_equals, 2) \ V(FunctionImpl_hashCode, 1) \ V(FunctionImpl_clone, 1) \

crelier commented 9 years ago

Implementing hashCode for _AbstractType does not make any difference. The error about a constant map key implementing == is still being reported. However, it can safely be suppressed for types, as it is for strings and integers. Then, all tests pass with the exception of a VM test specifically checking that runtime types are canonical (test TypeGetParameterizedTypes expects Dart_IdentityEquals to return true).

a-siva commented 3 years ago

//cc @crelier do we still have something to do in this issue or can it be closed as obsolete ?

crelier commented 3 years ago

@a-siva We do not have any pending work in this area. Generics will never be free. Yes, I would close this issue.

mraleph commented 3 years ago

Here is a slightly updated benchmark:

class Base {
  int a1 = 0;
}

class A1 {
  int a1 = 0;
}

class A2 {
  int a1 = 0;
}

class A3 {
  int a1 = 0;
}

class A4 {
  int a1 = 0;
}

class A5 {
  int a1 = 0;
}

class A6 {
  int a1 = 0;
}

class A7 {
  int a1 = 0;
}

class A8 {
  int a1 = 0;
}

class A9 {
  int a1 = 0;
}

@pragma('vm:never-inline')
verify(v1, v2) {
  if (v1.runtimeType == v2.runtimeType) {
    print("Incorrectly stating that runtime types match");
  }
}

final a0 = new Base();
final a1 = new A1();
final a2 = new A2();
final a3 = new A3();
final a4 = new A4();
final a5 = new A5();
final a6 = new A6();
final a7 = new A7();
final a8 = new A8();
final a9 = new A9();

verifyTypes() {
  verify(a1, a0);
  verify(a2, a0);
  verify(a3, a0);
  verify(a4, a0);
  verify(a5, a0);
  verify(a6, a0);
  verify(a7, a0);
  verify(a8, a0);
  verify(a9, a0);
}

final l0 = new List<Base>.empty();
final l1 = new List<A1>.empty();
final l2 = new List<A2>.empty();
final l3 = new List<A3>.empty();
final l4 = new List<A4>.empty();
final l5 = new List<A5>.empty();
final l6 = new List<A6>.empty();
final l7 = new List<A7>.empty();
final l8 = new List<A8>.empty();
final l9 = new List<A9>.empty();

verifyGenericTypes() {
  verify(l1, l0);
  verify(l2, l0);
  verify(l3, l0);
  verify(l4, l0);
  verify(l5, l0);
  verify(l6, l0);
  verify(l7, l0);
  verify(l8, l0);
  verify(l9, l0);
}

runtest() {
  var watch = new Stopwatch();
  watch.start();
  for (int i = 0; i < 100000; i++) {
    verifyTypes();
  }
  watch.stop();
  var duration = watch.elapsedMilliseconds;
  print("Time taken = $duration");

  watch.reset();
  watch.start();
  for (int i = 0; i < 100000; i++) {
    verifyGenericTypes();
  }
  watch.stop();
  duration = watch.elapsedMilliseconds;
  print("Time taken with parameterized types = $duration");
}

main() {
  while (true) {
    runtest();
  }
}

Note that there is still quite some difference between parameterized and non-parameterized classes, though much less if we are running in AOT mode:

$ out/ReleaseX64/dart_precompiled_runtime /tmp/bench.aot
Time taken = 12
Time taken with parameterized types = 106
Time taken = 4
Time taken with parameterized types = 96
...
$ out/ReleaseX64/dart /tmp/bench.dart
Time taken = 23
Time taken with parameterized types = 919
Time taken = 10
Time taken with parameterized types = 916

AOT is faster because we have added a fast path for a.runtimeType == b.runtimeType in AOT mode (for Flutter) which actually avoids calling runtimeType altogether, however it still falls into runtime to handle type argument vector comparison. We can intrinsify it further (e.g. add fast path for comparison when we compare full type argument vectors - which should be just a pointer check as we expect them to be canonicalised).

crelier commented 3 years ago

Good point! It should not be too complicated to improve the intrinsics to handle type arguments. Also, now that we have proper functions types, comparing the runtime types of closures could be intrinsified to a pointer comparison. Of course, this would increase code size. Is it worth it?

a-siva commented 3 years ago

I might have gotten the code pattern from Ian when they were optimizing the check for whether a widget tree needs to be rebuilt or not.

mraleph commented 3 years ago

Of course, this would increase code size. Is it worth it?

I would not expect too much of an impact on code size, given that we will just improve the intrinsic itself.

Maybe we could take some Flutter apps (e.g. Gallery or some internal app) and see if we ever hit slow path for _haveSameRuntimeType? If we do hit slow path often enough then we can see if a few more checks on the fast path can make it work without falling to the slow path.

If we don't then maybe we can close this.

rakudrama commented 3 years ago

I would be interested in the results of the analysis suggested above. It would indicate whether we should look into something similar for dart2js. Do you know how often one of the arguments to _haveSameRuntimeType is this?

I tried to make the RuntimeType.Widget.canUpdate benchmarks representative of what I saw at the time in Flutter's Widget.canUpdate path, so I expect any improvements will show up on Golem.

crelier commented 3 years ago

Self-assigning this issue so it does not disappear again from the radar for another 6 years. I'll give it a try when I get cycles.

crelier commented 3 years ago

I worked on this and sent a CL for review that handles generic types in intrinsics for type equality and runtimeType comparison.

Generic types with equal class ids and equal type arguments are now considered equal by the intrinsics and a runtime call is avoided.

With the test program above in precompiled mode on my workstation, instead of getting:

Time taken = 4
Time taken with parameterized types = 331

I now get this:

Time taken = 4
Time taken with parameterized types = 5
mraleph commented 3 years ago

I have reverted the landed CL due to a bug https://github.com/flutter/flutter/issues/84813

While investigating the problem I have also noticed that we seem to have another bug in the optimization:

class A<T, U> {
  B<T> get b => B<T>();
}

class B<T> {}

@pragma('vm:never-inline')
bool test(Object a, Object b) {
  print(a.runtimeType);
  print(b.runtimeType);
  return a.runtimeType == b.runtimeType;
}

void main() {
  print(test(B<int>(), A<int, String>().b));
}

Prints false even with CL reverted. I think this is the problem that https://dart-review.googlesource.com/c/sdk/+/45340/ was supposed to fix, but that CL did not actually include a code path that tested a.runtimeType == b.runtimeType pattern, so I think we missed that it did not actually fix it for some reason.

crelier commented 3 years ago

@mraleph Thanks for providing a new test. I added it as a regression test. I also first thought that the issue was in the optimizer, but it is actually in the runtime. The fix is coming with a reland of the CL that you reverted (thanks!).