dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.11k stars 1.57k forks source link

VM: 25% of dart2js run time in MegamorphicLookup #26017

Open rakudrama opened 8 years ago

rakudrama commented 8 years ago

Run dart2js on a large program with this command:

DART_VM_OPTIONS='--observe=8181 --profile-period=5000 --pause-isolates-on-exit --old_gen_growth_rate=9999' sdk/bin/dart2js ~/app/main.dart -v --out=g1.js

At isolate exit, get the cpu profile, sort by Tags: VM > User

75% of time is spend in "Dart" Of this time, 34% is in "Type Inference" Of this time, 31% is in [Stub] MegamorphicLookup

This seems like an excessive overhead.

@a-siva

fsc8000 commented 8 years ago

Apart from making megamorphic calls faster in the VM: Is there any way to avoid some of these megamorphic call sites in dart2js? - They will always be slower than monomorphic calls.

bwilkerson commented 8 years ago

Note that this strongly impacts the dartanalyzer and the analysis server as well. Between those three tools, it impacts the lives of every Dart developer every day in some way or another.

As I understand it, at least one source of these megamorphic call sites is as follows. Design a class hierarchy in which a method defined at the root of the hierarchy is overridden in a large enough number of subclasses, then write a general method that can, and does, take an instance of any class in that hierarchy and invoke this method. (By the way, how many implementations of a method does it take to make it megamorphic?)

If that's true, then, yes, there might be a way of avoiding some of them, but only by writing worse code. Support for polymorphism is near the heart of what it means to be an object-oriented language with inheritance. Anything you can do to improve the performance of these cases would benefit every developer every day.

rakudrama commented 8 years ago

@fsc8000 I can get rid of some of them by copying methods from mixins and base classes to the classes containing definitions of the other members used by the method. I would rather the runtime do that when it can tell it is worthwhile.

Consider:

abstract class ListMixin<E> implements List<E> {
  // int get length;  in List interface.
  bool get isEmpty => length == 0;
  bool get isNotEmpty => !isEmpty;

Once ListMixin is used above some number of times, with some clients defining isEmpty, ListMixin.isNotEmpty contains a megamorphic call to isEmpty which contains a megamorphic call to length. We have the equivalent structure in our Element model.

rakudrama commented 8 years ago

I can help categorize cases of MegamorphicLookup:

[A]. Calls from a method in a base class to a method that is defined on one or more derived classes. [B]. Calls from a method in a mixin class to method provided at or below a mixin application. These are really the same issue. We are failing to capitalize on information inherent in the receiver. It can be very effective to manually clone the caller to the class that defines the callees. The runtime should do this for small methods.

[C]. Overgeneralization. (This is what Brian describes)

class C {
   foo() {...}
}
class C1 extends C {}
...
class C20 extends C3 {}

method(C p) {
   p.foo();
}

None of the dozen or so subclasses of C declares anything called foo. What I see is that method gets optimized after seeing a few classes, gets deoptimized, and then recompiled. The recompiled code contains the MegamorphicLookup. The problem is that there is no intermediate step between a few classes and all classes (megamorphic call). There is a huge hint in the declared type. If the call is monomorphic for the declared type, develop an efficient predicate for the 'is C' check and widen the test to that. In many cases of OO code, 'is C' could be a range check.

Given that p.foo is monomorphic, the initial compilation should have widened the receiver check as much as possible and prevented the deoptimization.

In dart2js, WorkQueue.add has three field accesses that are each a MegamorphicLookup. They should be single x64 load and store instructions. The cid values in the megamorphic cache are {2870-1, 2873-5, 2877-9, 2881-2887, 2889}. The holes are probably abstract classes in the hierarchy.

[D] Calling get:hashCode and ==. 3% of dart2js runtime is in MegamorphicLookup called from hashtable implementations to find get:hashCode. It is much harder to estimate == since it is called from so many places. MegamorphicLookup of toString is 15% of the time in StringBuffer.write Perhaps there ought to be a more vtable-like lookup of these methods defined on Object.


An example that combines [A] and [D] is using small integers as hash table keys.

class Object {
  int get hashCode => _identityHashCode;
}
class _Smi {
  int get _identityHashCode => this;
}
the hashtable calls:
  MegamorphicLooup (get:hashCode)
  Object.hashCode
Object.hashCode calls:
  MegamorphicLookup (get:_identityHashCode)
  _Smi._identityHashCode

If the VM had cloned Object.hashCode into _Smi, Object.hashCode@_Smi could be optimized by inlining _Smi._identityHashCode. Tracking polymorphic call targets might be a reasonable way of deciding which methods should be considered cloning to the target class.

rakudrama commented 8 years ago

dartfmt (third_party/pkg_tested/dart_style/bin/format.dart) has some interesting case. Run it on the 1.8M file sdk/lib/html/dartium/html_dartium.dart

dartfmt spends 28% of Dart time in MegamorphicLookup. @munificent

10% of the time is in MegamorphicLookup for get:hashCode. This is issue [D]. (Another 3% is spent in the hash function which returns a field and would be faster if frameless.)

There is a closure that calls MegamorphicLookup accounting for 5% of total Dart time.

(outer) {
      if (!outer.splitsOnInnerRules) return;
      rule.imply(outer);
    }

There is a 'Rule' hierarchy of 7 classes Both outer and rule only ever are subclasses of Rule. get:splitsOnInnerRules and imply() both use megamorphic caches. There is only one definition of imply(), so this is like case [C] above.

There is a default implementation of get splitsOnInnerRules => true and and one override (2 classes) that returns an adjacently declared field. These two implementations are inlined when outer has only three classes are known, but the code gets deoptimized and five classes causes a megamorphic lookup. There are no type declarations to help here but the class check and dispatch should have been extended to all classes in the same hierarchy that use the definitions in the inline cache.

rakudrama commented 8 years ago

More details about the MegamorphicLookup in dartfmt, at .splitsOnInnerRules.

(outer) {
      if (!outer.splitsOnInnerRules) return;
      rule.imply(outer);
    }

There are 5 classes in the ICData and megamorphic cache (one class is not instantiated, one class is abstract). There are two targets. The class-ids look a little bit irregular:

cid Class In cache? Notes
1572 ::
1573 ArgumentRule abstract
1574 PositionalRule Yes
1575 NamedRule Yes
1576 ::
1577 CombinatorRule Yes
1578 ::
1579 MetadataRule not instantiated in this run
1580 ::
1581 Rule Yes base class, not abstract
1582 ::
1583 TypeArgumentRule Yes
1584 ::

The :: names are the classes that implement top-level library scopes. This is because the class hierarchy is spread over several libraries. What we can learn from this is that class hierarchies do not currently always have nice ranges of class id values.

fsc8000 commented 8 years ago

Thanks. The issue with :: classes should be easily addressable. In this case though there would still be a hole in the range because of MetaDataRule. We'd need to expand the set of cids using CHA to make it a 100% dense range.

btw: My CL (https://github.com/dart-lang/sdk/commit/f6c19ee595ca9495c8b9520259b1cb55b2e152a6) brought time spent megamorphic lookup for dartfmt down from 28% to 22%.

fsc8000 commented 8 years ago

The improvement with https://github.com/dart-lang/sdk/commit/f6c19ee595ca9495c8b9520259b1cb55b2e152a6 on dartfmt is around 5%.

Time before and after my change:

5.235
5.289
5.288
5.230
5.211
5.298
5.228
5.282
5.235
5.242

4.983
4.982
5.009
5.026
5.027
4.962
5.018
4.957
5.037
5.053
rakudrama commented 8 years ago

I have opened separate tracking issues for each example that I have seen

https://github.com/dart-lang/sdk/issues/26217 https://github.com/dart-lang/sdk/issues/26218 https://github.com/dart-lang/sdk/issues/26219 https://github.com/dart-lang/sdk/issues/26220 https://github.com/dart-lang/sdk/issues/26221 https://github.com/dart-lang/sdk/issues/26222 https://github.com/dart-lang/sdk/issues/26223

leafpetersen commented 7 years ago

See also https://github.com/dart-lang/sdk/issues/26858