This might be because the remembered set tracks objects instead of slots.
VM folks: can you provide instructions on how to determine whether this issue actually affects my app (dart2js on huge programs)? What options do I pass to the VM, and how do I interpret the output?
--- a/runtime/vm/scavenger.cc
+++ b/runtime/vm/scavenger.cc
@@ -527,6 +527,7 @@ bool Scavenger::ShouldPerformIdleScavenge(int64_t deadline) {
void Scavenger::IterateStoreBuffers(Isolate* isolate,
ScavengerVisitor* visitor) {
+ TIMELINE_FUNCTION_GC_DURATION(Thread::Current(), "IterateStoreBuffers");
// Iterating through the store buffers.
// Grab the deduplication sets out of the isolate's consolidated store buffer.
StoreBufferBlock* pending = isolate->store_buffer()->Blocks();
dart --observe --complete-timeline hello.dart

Look for IterateStoreBuffers events in Observatory's timeline and see whether they dominate the time spent in their enclosing CollectNewGeneration events.
OK, I tried that. In the middle of my 300s dart2js run, most GCs are 60-80% IterateStoreBuffers, e.g. 10.5ms out of a 15.1ms scavenge, with the next GC 195ms later.

From that I can estimate dart2js spends about 5% of its time (10.5/195) doing IterateStoreBuffers. Is there a way to read off the total (rather than estimate it)?
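One way to read off a total rather than estimating it is to save the timeline from Observatory (it exports Chrome trace-format JSON) and sum the event durations offline. Here is a minimal sketch, assuming the IterateStoreBuffers events are recorded as complete ('X') events with a dur field in microseconds; the exact event shape may vary across SDK versions:

```dart
import 'dart:convert';
import 'dart:io';

void main(List<String> args) {
  // Expects the path to a saved Chrome trace-format timeline as argv[0].
  final json = jsonDecode(File(args[0]).readAsStringSync());
  // The trace is either a bare list of events or wrapped in {"traceEvents": [...]}.
  final events = (json is Map ? json['traceEvents'] : json) as List;
  var totalUs = 0;
  for (final e in events) {
    // Sum complete ('X') events named IterateStoreBuffers; dur is in microseconds.
    if (e['name'] == 'IterateStoreBuffers' && e['ph'] == 'X') {
      totalUs += (e['dur'] as num).toInt();
    }
  }
  print('IterateStoreBuffers total: ${totalUs / 1000} ms');
}
```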
How can I find the large array? Things improve in the last 10% of the compile. Either we GC-ed the array or just stopped touching it.
@rmacnak-google How can I find the large arrays causing this problem?
It would be difficult to identify the source of an array from prints in the VM, so I'd do something like send an inspect service event before the scavenge with the largest object in the store buffer.

I looked into this again.
What would it take to improve this, e.g. card-marking large _Lists? It looks like it would speed up our large dart2js compilations by 5%.
/cc @vsmenon
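To make the card-marking suggestion concrete, here is a toy Dart model of the idea (not the VM's implementation, where the write barrier and card table live in C++): the large array is divided into fixed-size cards, a write marks only its card, and a scavenge visits only the dirty cards instead of every element.

```dart
// Toy model of card marking. In the object-granularity scheme, one store into
// a 2M-element list causes the whole list to be rescanned at the next
// scavenge; with cards, only the written regions are rescanned.
class CardMarkedList {
  static const int cardSize = 128; // slots per card; the real size is a VM detail

  final List<Object?> slots;
  final List<bool> _dirty;

  CardMarkedList(int length)
      : slots = List<Object?>.filled(length, null),
        _dirty = List<bool>.filled((length + cardSize - 1) ~/ cardSize, false);

  void operator []=(int index, Object? value) {
    slots[index] = value;
    _dirty[index ~/ cardSize] = true; // the "write barrier"
  }

  // What a scavenge would visit: only slots on dirty cards, which are then
  // cleared for the next GC cycle.
  Iterable<Object?> slotsToScan() sync* {
    for (var card = 0; card < _dirty.length; card++) {
      if (!_dirty[card]) continue;
      final end = (card + 1) * cardSize;
      for (var i = card * cardSize; i < end && i < slots.length; i++) {
        yield slots[i];
      }
      _dirty[card] = false;
    }
  }
}
```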
The change actually seems to regress dart2js compile time by 3%. Any insight @rmacnak-google @rakudrama?
I don't notice any improvement on the bigger compiles either, which is where I would expect an effect, given the huge arrays. acx_gallery is the best Golem benchmark to look at.
Oh... 04941b5507666eabc1ac23027b124939a3794a03 accidentally reverts a fix in 80317f1c096308aca95b2fec98aa424ea08d81df, effectively disabling concurrent marking.
For CompileAcxGallery, Golem reports a 6.520% improvement with e15e8609aa8610f8c432f1caf2ab89358e2fce50, compared to a 3.533% regression from 04941b5507666eabc1ac23027b124939a3794a03, so card marking was worth the difference.
On my workstation, the following program runs in ~30s. If I reduce the list size by removing the * 8, it takes ~8s.

Now, I don't know whether this program is in any way realistic. How would I tell if my app has this behaviour (i.e. the combination of large lists and mutation)?
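The original program is not preserved in this thread; the following is a hypothetical reconstruction consistent with the description: a very large list (note the * 8) that is continuously mutated while the stores themselves allocate, so every scavenge rescans the entire list via the store buffer. The sizes and iteration count are illustrative guesses, not the original values.

```dart
// Hypothetical reconstruction of the repro described above.
void main() {
  // A large, long-lived backing list; remove the * 8 to shrink it.
  final list = List<Object?>.filled(1024 * 1024 * 8, null);
  for (var n = 0; n < 100 * 1000 * 1000; n++) {
    // Each iteration allocates (eventually triggering a scavenge) and writes
    // an old->new pointer, which puts the whole list in the store buffer.
    list[n % list.length] = Object();
  }
}
```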
I instrumented the built-in hash map and hash set in runtime/lib to print list allocations of 1M elements and over. dart2js compiles of a certain large app do create at least two hash maps whose backing lists contain 2M elements. These are likely to be long-lived. (I thought of breaking the hash map into a multi-level hash map, but this is unlikely to help, since there will be multiple smaller lists with random access that add up to 2M elements. A segmented approach would work better with lists that have exploitable access patterns, e.g. appending.)
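A sketch of the multi-level layout considered (and rejected) above, to make the trade-off concrete; the class and segment size are illustrative, not anything in the SDK. With random access, most segments are dirtied between two scavenges, so roughly the same 2M slots still get rescanned; the split only pays off for access patterns like appending, where few segments are dirty at a time.

```dart
// One logical list backed by many smaller segments, so a write dirties a
// 64K-element segment instead of a 2M-element list.
class SegmentedList {
  static const int segmentSize = 64 * 1024;

  final List<List<Object?>> _segments;

  SegmentedList(int length)
      : _segments = List.generate(
            (length + segmentSize - 1) ~/ segmentSize,
            (_) => List<Object?>.filled(segmentSize, null));

  Object? operator [](int i) => _segments[i ~/ segmentSize][i % segmentSize];

  void operator []=(int i, Object? v) =>
      _segments[i ~/ segmentSize][i % segmentSize] = v;
}
```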