dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.23k stars 1.57k forks source link

[dart2wasm] Less than 50% dispatch table size utilization in real apps #54349

Open osa1 opened 11 months ago

osa1 commented 11 months ago

The following patch prints dispatch table size + 2 stats about the unused space in the table:

--- a/pkg/dart2wasm/lib/dispatch_table.dart
+++ b/pkg/dart2wasm/lib/dispatch_table.dart
@@ -480,14 +480,30 @@ class DispatchTable {
   }

   void output() {
+    int gap = 0;
+    int unused = 0;
+
     for (int i = 0; i < _table.length; i++) {
       Reference? target = _table[i];
-      if (target != null) {
+      if (target == null) {
+        gap += 1;
+      } else {
         w.BaseFunction? fun = translator.functions.getExistingFunction(target);
-        if (fun != null) {
+        if (fun == null) {
+          unused += 1;
+        } else {
           wasmTable.setElement(i, fun);
         }
       }
     }
+
+    final tableSize = _table.length;
+    final gapPercent = (gap.toDouble() / tableSize.toDouble()) * 100.0;
+    final unusedPercent = (unused.toDouble() / tableSize.toDouble()) * 100.0;
+    final totalEmpty = gap + unused;
+    final totalEmptyPercent =
+        (totalEmpty.toDouble() / tableSize.toDouble()) * 100.0;
+    print("Dispatch table size: ${_table.length}, "
+        "gaps: $gap ($gapPercent%), unused: $unused ($unusedPercent%), gap+unused: $totalEmpty ($totalEmptyPercent%)");
   }
 }

Here "gap" is the space in the table after assigning selector IDs. "Unused" is extra space caused by functions that are preserved by the TFA but were never compiled by dart2wasm. Total is gap + unused.

When I compile DevTools and Flute I see these results: (output slightly edited for readability)

In both cases we have less than 50% utilization.

We should experiment with different heuristics/algorithms to assign selector ids to improve table size utilization.

osa1 commented 11 months ago

We've discussed this with @mraleph and @mkustermann. Some thoughts and ideas:

  1. We should build the dispatch table after codegen & patch code afterwards (object allocation code with cids, dispatch table calls with selector offsets). This way we save the "unused" space reported above by just not assigning offsets to selectors that are never called via dispatch table (i.e. maybe they're always inlined, or we had an entry-point pragma but didn't need to call it) and classes that are never allocated

  2. Abstract classes may need a class ID, but they don't need a column (or row?) in the dispatch table as no runtime object will have their class ID in its header.

  3. Similar to functions preserved by the TFA but never compiled, class IDs can also be lazily assigned for the cases where TFA preserves a class (for the same reasons) but we never allocate them.

  4. For methods that are inherited by a lot of classes but are rarely overridden, we can consider doing a range check on class ID, instead of a dispatch table lookup. For example, maybe Object.hashCode is rarely overridden. Instead of adding thousands of entries to Object.hashCode we may consider generating a dispatch function that checks class ID against some ranges of IDs and dispatches to a small number of hashCode implementations.

  5. When generating a virtual call we can compile only the members that can be called based on the static type of the receiver, instead of compiling all overrides of the member. (relevant code)

    (This should also allow more tree-shaking.)

For comparison, @mraleph printed similar stats in VM compiling a large real app and found out that it has 70% utilization. Among the ideas above VM currently does (2) and (3), and it uses a different heuristic when ordering selectors (dart2wasm, VM).

osa1 commented 8 months ago

ecb5fc2228a improved dispatch table utilization but there's still room for improvement. @mkustermann could you summarize what's left here? IIRC we still don't add entries to the table based on the type of the receiver in a call site (instead add all overrides of the called method to the table). Anything else?

mkustermann commented 8 months ago

Update:

We number now all concrete classes in strict depth-first pre-order numbering. That has improved the utilization of the dispatch table to almost 100% - meaning there's very little unused space in there.

We do build the dispatch table layout before doing the actual compilation of the app, but the actual function entries are filled in lazily (based on allocated classes & used selectors in dispatch table call). The fact that we reach almost 100% fill rate implies that all of them could theoretically be reached. The reason for this is that both the dispatch table building code as well as the call sites rely on the same consistent kernel metadata.

It may be that after binaryen runs, it devirtualizes some more call sites, theoretically making the entries in the table unused, but it's unlikely binaryen can prove that, so it would keep the entries.

So to obtain smaller table could