dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

VM GC can be slow with large old-space lists #30433

Closed rakudrama closed 5 years ago

rakudrama commented 7 years ago

On my workstation, the following program runs in ~30s. If I reduce the list size by removing the `* 8`, it takes ~8s.

Now, I don't know whether this program is in any way realistic. How would I tell if my app has this behaviour (i.e. the combination of large, long-lived lists and mutation)?

I instrumented the builtin hash map and hash set in runtime/lib to print list allocation sizes of 1M and over. dart2js compiles of a certain large app do create at least two hash maps with lists containing 2M elements. These are likely to be long-lived. (I thought of breaking the hash map into a multi-level hash map, but this is unlikely to help since there will be multiple smaller lists with random access that add up to 2M elements. A segmented approach would work better with lists that have exploitable access patterns, e.g. appending.)

const N = 128 * 1024 * 8;

// A large list, allocated up front, that survives into old space.
var a = new List(N);

main() {
  int ix = 0;

  var root = new Link(null);

  var sw = new Stopwatch()..start();

  for (int i = 0; i < 1000; i++) {
    for (int j = 0; j < 1000; j++) {
      // Store a freshly allocated (new-space) object into the old-space
      // list, which puts the list in the remembered set.
      a[ix] = new Object();
      var p = root;
      for (int k = 0; k < 1000; k++) {
        p = new Link(p); // churn allocations to trigger frequent scavenges
      }
      a[ix] = p.length;
      // Stride through the list so stores touch widely scattered slots.
      ix = (ix + 14999981).remainder(a.length);
    }
  }

  print(sw.elapsed);
}

class Link {
  Link next;
  Link(this.next);

  int get length {
    var i = 0;
    for (Link p = this; p != null; p = p.next) i++;
    return i;
  }
}
rmacnak-google commented 7 years ago

This might be because the remembered set tracks objects instead of slots.
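A minimal sketch (Python, not VM code) of why that granularity matters for this benchmark: if the remembered set records whole objects, one pointer store into the big list forces the scavenger to rescan every slot of the list; if it recorded individual slots, only the written slots would be revisited.

```python
# Sketch (not VM code): remembered-set scan cost at object granularity
# vs. slot granularity. One write into a large old-space array makes an
# object-granularity remembered set rescan the whole array; a
# slot-granularity set would rescan only the written slots.

ARRAY_LEN = 1024 * 1024  # slots in one large old-space backing store
DIRTY_WRITES = 10        # slots actually mutated between scavenges

def slots_scanned_object_granularity(array_len, dirty_writes):
    # The array appears in the remembered set once; the scavenger must
    # visit all of its slots to find any new-space pointers.
    return array_len if dirty_writes > 0 else 0

def slots_scanned_slot_granularity(array_len, dirty_writes):
    # Only the individually written slots are remembered and revisited.
    return dirty_writes

print(slots_scanned_object_granularity(ARRAY_LEN, DIRTY_WRITES))  # 1048576
print(slots_scanned_slot_granularity(ARRAY_LEN, DIRTY_WRITES))    # 10
```

The benchmark above maximizes this gap: it dirties one slot of a multi-megabyte list per scavenge, yet each scavenge pays for the whole list.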

rakudrama commented 6 years ago

vm folks: can you provide instructions on how to determine whether this issue actually affects my app (dart2js on huge programs)? What options do I provide to the VM, and how do I interpret the output?

rmacnak-google commented 6 years ago
--- a/runtime/vm/scavenger.cc
+++ b/runtime/vm/scavenger.cc
@@ -527,6 +527,7 @@ bool Scavenger::ShouldPerformIdleScavenge(int64_t deadline) {

 void Scavenger::IterateStoreBuffers(Isolate* isolate,
                                     ScavengerVisitor* visitor) {
+  TIMELINE_FUNCTION_GC_DURATION(Thread::Current(), "IterateStoreBuffers");
   // Iterating through the store buffers.
   // Grab the deduplication sets out of the isolate's consolidated store buffer.
   StoreBufferBlock* pending = isolate->store_buffer()->Blocks();

Run:

dart --observe --complete-timeline hello.dart

Look for IterateStoreBuffers events in Observatory's timeline and check whether they dominate the time spent in their enclosing CollectNewGeneration events.

rakudrama commented 6 years ago

Ok, I tried that, and in the middle of my 300s dart2js run it shows most GCs are 60-80% IterateStoreBuffers, e.g. 10.5ms out of a 15.1ms GC, with the next GC 195ms later.

From that I can estimate dart2js spends ~5% of its time (10.5/195) doing IterateStoreBuffers. Is there a way to read off the total (rather than estimate it)?

How can I find the large array? Things improve in the last 10% of the compile: either we GC-ed the array or we just stopped touching it.
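One way to read off a total rather than estimate it: export the timeline and sum the event durations offline. The sketch below assumes the export is in Chrome trace-event JSON (the format Observatory's timeline viewer consumes), where complete events ("ph": "X") carry a "dur" field in microseconds; the file name and exact export path are illustrative.

```python
# Sketch: total time spent in a named GC phase, summed from a timeline
# exported in Chrome trace-event JSON format. Complete events
# ("ph": "X") carry their duration in the "dur" field (microseconds).
import json

def total_event_us(trace, name):
    # The export may be a dict with a "traceEvents" list or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    return sum(e.get("dur", 0)
               for e in events
               if e.get("name") == name and e.get("ph") == "X")

# Inline example; a real run would use json.load(open("timeline.json")).
trace = {"traceEvents": [
    {"name": "IterateStoreBuffers", "ph": "X", "dur": 10500},
    {"name": "CollectNewGeneration", "ph": "X", "dur": 15100},
    {"name": "IterateStoreBuffers", "ph": "X", "dur": 9800},
]}
print(total_event_us(trace, "IterateStoreBuffers"))  # 20300
```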

rakudrama commented 6 years ago

@rmacnak-google How can I find the large arrays causing this problem?

rmacnak-google commented 6 years ago

It would be difficult to identify the source of an array from prints in the VM, so I'd do something like:

rakudrama commented 6 years ago

I looked into this again.

What would it take to improve this, e.g. card-marking large _Lists? It looks like it would speed up our large dart2js compilations by ~5%.
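The card-marking idea can be sketched as follows (Python, illustrative only, not the VM's implementation): the backing store is divided into fixed-size "cards"; a pointer store dirties only the containing card, and the scavenger rescans only dirty cards instead of the whole array. The card size of 128 slots here is an assumption for illustration.

```python
# Sketch of card marking for a large array: stores dirty only the card
# containing the written slot, and the scavenger scans only dirty cards
# rather than every slot of the array.
CARD_SLOTS = 128  # illustrative card size

class CardMarkedArray:
    def __init__(self, length):
        self.slots = [None] * length
        self.dirty = [False] * ((length + CARD_SLOTS - 1) // CARD_SLOTS)

    def store(self, index, value):
        # Write barrier: record the write at card granularity.
        self.slots[index] = value
        self.dirty[index // CARD_SLOTS] = True

    def scavenge_scan(self):
        # Visit only slots in dirty cards; return how many were scanned.
        scanned = 0
        for card, is_dirty in enumerate(self.dirty):
            if is_dirty:
                start = card * CARD_SLOTS
                scanned += len(self.slots[start:start + CARD_SLOTS])
                self.dirty[card] = False  # clean the card after scanning
        return scanned

a = CardMarkedArray(1024 * 1024)
a.store(5, object())
a.store(500_000, object())
print(a.scavenge_scan())  # 256: two dirty cards of 128 slots each
```

With this scheme the benchmark above would pay per scavenge for a couple of 128-slot cards instead of all 1M slots.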

/cc @vsmenon

mraleph commented 5 years ago

The change actually seems to regress dart2js compile time by 3%. Any insight @rmacnak-google @rakudrama?

rakudrama commented 5 years ago

I don't notice any improvement on the bigger compiles either, which is where I would expect a win given the huge arrays. acx_gallery is the best golem benchmark to look at.

rmacnak-google commented 5 years ago

Oh... 04941b5507666eabc1ac23027b124939a3794a03 accidentally reverts a fix in 80317f1c096308aca95b2fec98aa424ea08d81df, effectively disabling concurrent marking.

rmacnak-google commented 5 years ago

For CompileAcxGallery, Golem reports a 6.520% improvement with e15e8609aa8610f8c432f1caf2ab89358e2fce50, compared to a 3.533% regression from 04941b5507666eabc1ac23027b124939a3794a03, so card marking was worth the difference.