[dart2wasm] Less than 50% dispatch table size utilization in real apps

osa1 commented 11 months ago

The following patch prints dispatch table size + 2 stats about the unused space in the table:

--- a/pkg/dart2wasm/lib/dispatch_table.dart
+++ b/pkg/dart2wasm/lib/dispatch_table.dart
@@ -480,14 +480,30 @@ class DispatchTable {
   }

   void output() {
+    int gap = 0;
+    int unused = 0;
+
     for (int i = 0; i < _table.length; i++) {
       Reference? target = _table[i];
-      if (target != null) {
+      if (target == null) {
+        gap += 1;
+      } else {
         w.BaseFunction? fun = translator.functions.getExistingFunction(target);
-        if (fun != null) {
+        if (fun == null) {
+          unused += 1;
+        } else {
           wasmTable.setElement(i, fun);
         }
       }
     }
+
+    final tableSize = _table.length;
+    final gapPercent = (gap.toDouble() / tableSize.toDouble()) * 100.0;
+    final unusedPercent = (unused.toDouble() / tableSize.toDouble()) * 100.0;
+    final totalEmpty = gap + unused;
+    final totalEmptyPercent =
+        (totalEmpty.toDouble() / tableSize.toDouble()) * 100.0;
+    print("Dispatch table size: ${_table.length}, "
+        "gaps: $gap ($gapPercent%), unused: $unused ($unusedPercent%), gap+unused: $totalEmpty ($totalEmptyPercent%)");
   }
 }

Here "gap" is the space in the table after assigning selector IDs. "Unused" is extra space caused by functions that are preserved by the TFA but were never compiled by dart2wasm. Total is gap + unused.

When I compile DevTools and Flute I see these results: (output slightly edited for readability)

DevTools: Dispatch table size: 115,185, gaps: 62,952 (54.6%), unused: 4,410 (3.8%), gap+unused: 67,362 (58.4%)
Flute (complex): Dispatch table size: 32,393, gaps: 17,161 (52.9%), unused: 1,475 (4.5%), gap+unused: 18,636 (57.5%)

In both cases we have less than 50% utilization.

We should experiment with different heuristics/algorithms to assign selector ids to improve table size utilization.

osa1 commented 11 months ago

We've discussed this with @mraleph and @mkustermann. Some thoughts and ideas:

We should build the dispatch table after codegen & patch code afterwards (object allocation code with cids, dispatch table calls with selector offsets). This way we save the "unused" space reported above by just not assigning offsets to selectors that are never called via dispatch table (i.e. maybe they're always inlined, or we had an entry-point pragma but didn't need to call it) and classes that are never allocated
Abstract classes may need a class ID, but they don't need a column (or row?) in the dispatch table as no runtime object will have their class ID in its header.
Similar to functions preserved by the TFA but never compiled, class IDs can also be lazily assigned for the cases where TFA preserves a class (for the same reasons) but we never allocate them.
For methods that are inherited by a lot of classes but are rarely overridden, we can consider doing a range check on class ID, instead of a dispatch table lookup. For example, maybe Object.hashCode is rarely overridden. Instead of adding thousands of entries to Object.hashCode we may consider generating a dispatch function that checks class ID against some ranges of IDs and dispatches to a small number of hashCode implementations.
When generating a virtual call we can compile only the members that can be called based on the static type of the receiver, instead of compiling all overrides of the member. (relevant code)

(This should also allow more tree-shaking.)

For comparison, @mraleph printed similar stats in VM compiling a large real app and found out that it has 70% utilization. Among the ideas above VM currently does (2) and (3), and it uses a different heuristic when ordering selectors (dart2wasm, VM).

osa1 commented 8 months ago

ecb5fc2228a improved dispatch table utilization but there's still room for improvement. @mkustermann could you summarize what's left here? IIRC we still don't add entries to the table based on the type of the receiver in a call site (instead add all overrides of the called method to the table). Anything else?

mkustermann commented 8 months ago

Update:

We number now all concrete classes in strict depth-first pre-order numbering. That has improved the utilization of the dispatch table to almost 100% - meaning there's very little unused space in there.

We do build the dispatch table layout before doing the actual compilation of the app, but the actual function entries are filled in lazily (based on allocated classes & used selectors in dispatch table call). The fact that we reach almost 100% fill rate implies that all of them could theoretically be reached. The reason for this is that both the dispatch table building code as well as the call sites rely on the same consistent kernel metadata.

It may be that after binaryen runs, it devirtualizes some more call sites, theoretically making the entries in the table unused, but it's unlikely binaryen can prove that, so it would keep the entries.

So to obtain smaller table could

make dart2wasm be an actual optimizing compiler, meaning if we e.g. inline a function, we'd propagate types from the caller context around the inlined body, possibly devirtualizing more calls (than what TFA did), thereby making less dispatch table calls, thereby possibly using less selectors, making more entries in table null => If we did this, we could consider building the table layout after code gen & patch generated code with final selector offsets.
we could take callsite static type into account: Instead of enqueuing all implementations of a selector, only enqueue the ones we can actually hit by call site (e.g. Map.toString(): only toString() methods of concrete, allocated, subtypes of Map need to be enqueued)
Identify very dense selector rows with many duplicated (i.e. inherited) methods. We could split those into several selector ids by having a small trampoline (e.g. imagine hashCode/toString/...: Make cid-based range checks against sub-hierarchies that override the those selelectors, call the split selector instead and otherwise the default one directly)

dart-lang / sdk

[dart2wasm] Less than 50% dispatch table size utilization in real apps #54349