dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.19k stars 1.57k forks source link

Proposal: Reuse StackZones #48755

Open gaaclarke opened 2 years ago

gaaclarke commented 2 years ago

I may be off base here but this is an observation I made looking at the profiling results from Flutter.

I was running performance benchmarks for Flutter ( https://github.com/flutter/flutter/blob/master/dev/benchmarks/microbenchmarks/lib/stocks/build_bench.dart ) and I saw that the largest amount of time spent in rebuilding the widget tree is in ~StackZone. Third expensive was creating them.

Screen Shot 2022-04-06 at 5 30 40 PM

The StackZone's that are getting deallocated constantly during the test are getting allocated here:

io.flutter.1.ui (9)#0   0x0000000102e2bcfc in dart::StackResource::StackResource(dart::ThreadState*) [inlined] at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/allocation.h:25
#1  0x0000000102e2bcfc in dart::StackZone::StackZone(dart::ThreadState*) [inlined] at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/zone.cc:323
#2  0x0000000102e2bcfc in dart::StackZone::StackZone(dart::ThreadState*) at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/zone.cc:323
#3  0x0000000102df9174 in dart::DRT_InstantiateType(dart::NativeArguments) at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/runtime_entry.cc:544
#4  0x0000000101c343c8 in Precompiled_Stub_CallToRuntime ()

That correlates to https://github.com/dart-lang/sdk/blob/67f93d3840981ad81b44bceb0b755fbab50f13bd/runtime/vm/runtime_entry.h#L113

Looking at that code it seems like we should be reusing StackZone associated with an isolate instead of creating and destroying them anytime InstantiateType is called. There should be something like:

StackZone::Scope scope = isolate->stackZone.mark();
DoCallToRuntime(isolate, thread, zone.GetZone(), arguments);

Where ~Scope resets the position of the stack instead of deleting the whole region of memory.

mkustermann commented 2 years ago

Runtime calls in in general introduce non-trivial amount of overhead. So we should only enter the runtime in slow paths.

For this particular case it seems like such calls are due to the DRT_InstantiateType() runtime entry.

Those instantiations originate from Dart source code that uses uninstantiated type literals, such as:

class Foo<T> {
  Type foo() => T;
}

and we (surprisingly) have no optimizations / fast paths for this atm. I think we should fix that: https://github.com/dart-lang/sdk/issues/48757

dnfield commented 2 years ago

see https://github.com/dart-lang/sdk/issues/33297

gaaclarke commented 2 years ago

I've also seen this stacktrace causing StackZones to be created and destroyed during rebuild:

#0  0x0000000102c6fcfc in dart::StackResource::StackResource(dart::ThreadState*) [inlined] at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/allocation.h:25
#1  0x0000000102c6fcfc in dart::StackZone::StackZone(dart::ThreadState*) [inlined] at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/zone.cc:323
#2  0x0000000102c6fcfc in dart::StackZone::StackZone(dart::ThreadState*) at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/zone.cc:323
#3  0x0000000102ba5444 in dart::NativeEntry::BootstrapNativeCallWrapper(_Dart_NativeArguments*, void (*)(_Dart_NativeArguments*)) at /Users/aaclarke/dev/engine/src/third_party/dart/runtime/vm/native_entry.cc:136
#4  0x0000000101ab7ef8 in Precompiled_Stub_CallBootstrapNative ()
mraleph commented 2 years ago

@a-siva has landed d856e058ea764fade95fdf76530c741079bd63e1 which brought significant improvement (because we use stack memory to store the initial Zone segment). A number of benchmarks that do runtime calls have seen significant improvement - which indicate that memory allocation from malloc is rather expensive.

Siva, given the improvements --- do you think we could also do something for environments where stack size is limited? It would be nice if all our builds had faster runtime calls.

gaaclarke commented 2 years ago

Here is the drop to our build time showing up in the microbenchmark (10% increase in performance): https://flutter-flutter-perf.skia.org/e/?end=1651961508&queries=sub_result%3Dstock_build_iteration%26sub_result%3Dstock_build_iteration_probability_5pct