Open osa1 opened 11 months ago
We've discussed this with @mraleph and @mkustermann. Some thoughts and ideas:
We should build the dispatch table after codegen & patch code afterwards (object allocation code with cids, dispatch table calls with selector offsets). This way we save the "unused" space reported above by just not assigning offsets to selectors that are never called via dispatch table (i.e. maybe they're always inlined, or we had an entry-point
pragma but didn't need to call it) and classes that are never allocated
Abstract classes may need a class ID, but they don't need a column (or row?) in the dispatch table as no runtime object will have their class ID in its header.
Similar to functions preserved by the TFA but never compiled, class IDs can also be lazily assigned for the cases where TFA preserves a class (for the same reasons) but we never allocate them.
For methods that are inherited by a lot of classes but are rarely overridden, we can consider doing a range check on class ID, instead of a dispatch table lookup. For example, maybe Object.hashCode
is rarely overridden. Instead of adding thousands of entries to Object.hashCode
we may consider generating a dispatch function that checks class ID against some ranges of IDs and dispatches to a small number of hashCode
implementations.
When generating a virtual call we can compile only the members that can be called based on the static type of the receiver, instead of compiling all overrides of the member. (relevant code)
(This should also allow more tree-shaking.)
For comparison, @mraleph printed similar stats in VM compiling a large real app and found out that it has 70% utilization. Among the ideas above VM currently does (2) and (3), and it uses a different heuristic when ordering selectors (dart2wasm, VM).
ecb5fc2228a improved dispatch table utilization but there's still room for improvement. @mkustermann could you summarize what's left here? IIRC we still don't add entries to the table based on the type of the receiver in a call site (instead add all overrides of the called method to the table). Anything else?
Update:
We number now all concrete classes in strict depth-first pre-order numbering. That has improved the utilization of the dispatch table to almost 100% - meaning there's very little unused space in there.
We do build the dispatch table layout before doing the actual compilation of the app, but the actual function entries are filled in lazily (based on allocated classes & used selectors in dispatch table call). The fact that we reach almost 100% fill rate implies that all of them could theoretically be reached. The reason for this is that both the dispatch table building code as well as the call sites rely on the same consistent kernel metadata.
It may be that after binaryen runs, it devirtualizes some more call sites, theoretically making the entries in the table unused, but it's unlikely binaryen can prove that, so it would keep the entries.
So to obtain smaller table could
make dart2wasm be an actual optimizing compiler, meaning if we e.g. inline a function, we'd propagate types from the caller context around the inlined body, possibly devirtualizing more calls (than what TFA did), thereby making less dispatch table calls, thereby possibly using less selectors, making more entries in table null => If we did this, we could consider building the table layout after code gen & patch generated code with final selector offsets.
we could take callsite static type into account: Instead of enqueuing all implementations of a selector, only enqueue the ones we can actually hit by call site (e.g. Map.toString()
: only toString()
methods of concrete, allocated, subtypes of Map
need to be enqueued)
Identify very dense selector rows with many duplicated (i.e. inherited) methods. We could split those into several selector ids by having a small trampoline (e.g. imagine hashCode
/toString
/...: Make cid-based range checks against sub-hierarchies that override the those selelectors, call the split selector instead and otherwise the default one directly)
The following patch prints dispatch table size + 2 stats about the unused space in the table:
Here "gap" is the space in the table after assigning selector IDs. "Unused" is extra space caused by functions that are preserved by the TFA but were never compiled by dart2wasm. Total is gap + unused.
When I compile DevTools and Flute I see these results: (output slightly edited for readability)
In both cases we have less than 50% utilization.
We should experiment with different heuristics/algorithms to assign selector ids to improve table size utilization.