dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.07k stars 1.56k forks source link

[vm/core] Inlining `_GrowableList.add` bloats program too much #56276

Open rakudrama opened 1 month ago

rakudrama commented 1 month ago

Inlining of _GrowableList.add causes too much code size expansion.

The extra code in the aot-snapshot for dart2js is 2.22%:

inlined? size
@pragma("vm:prefer-inline") 20552376
@pragma("vm:never-inline") 20095448

_GrowableList.add does quite a lot - a covariance check, capacity check (with out-of-line reallocation of backing list), length update, bounds check, array write barrier.

  @pragma("vm:entry-point", "call")
  @pragma("vm:prefer-inline")
  void add(T value) {         // Parametric covariance check
    var len = length;
    if (len == _capacity) {
      _growToNextCapacity();
    }
    _setLength(len + 1);
    this[len] = value;          // Bounds check; write barrier.
  }

One argument for inlining is that after inlining, some of these operations can be eliminated. How effective is inlining at eliminating these operations? Grepping the disassembly allows us to estimate the operations.

inlined as array write barriers bounds checks
yes 3333 4036 10906
no 2515 1991 8660
Δ 818 2045 2245

Most conserved after inlining is the bounds check (that can be eliminated in all cases, see #56099). 9% of the write barriers can be eliminated. Over half of parametric covariance checks can be eliminated, and ~37% remain. So over a third of the calls do not benefit in any special way from inlining. For these calls, the benefit of inlining is limited to the dynamic instruction count, the ~20% of the instructions related to the call - prologue, stack check, epilogue generally vanish after inlining,

For many of the the 60%+ of calls where the parametric covariance check can be eliminated, it might be a better approach to have a specialization of List.add that does not do the covariance check, and call that instead of inlining.

mnordine commented 1 month ago

Was dart2js benchmarked before and after?

a-siva commented 1 month ago

//cc @alexmarkov

a-siva commented 1 month ago

@rakudrama did you get a chance to measure the performance impact of this

rakudrama commented 1 month ago

@a-siva I did some timing locally.

I compiled dart2js with dart compile exe and then used that aot executable to compile Flute.complex. I aggregated the wall-clock time reported by dart2js over 100 runs on my cloudtop instance. I ran the versions alternating A,B,A,B,..., 100 of each

Version executable size mean time (seconds) std dev min max
A - inlined 26322640 21.069 0.910241 20.0 24.6
B - not inlined 25865344 21.09 0.729178 19.9 23.7

They are surprisingly close. Perhaps none of the ~2k calls to add are hot enough to register in this kind of profiling.

The above runs were done when my computer was otherwise quiet.

And earlier run of 60 A,B pairs is noisier. I was using a browser at the same time.

Version executable size mean time (seconds) std dev min max
A - inlined 26322640 22.185 1.39174 19.9 24.7
B - not inlined 25865344 22.3917 1.56549 20.0 27.7

Here all of the values (mean, min, max) are worse, but the variance is high enough that this is not clearly a difference.