What are the road blocks to using wasm-split multiple times?

jnickg commented 7 months ago

I have a large web-based document editing application and am interested in splitting it using wasm-split. While I was discussing ways to split up the module with my team, the idea of a kind of "transitive split" came up. Basically it would be like the --merge-profiles option, except instead of taking the union of called functions, it takes the intersection of un-called functions and splits those into a third module, and/or takes the intersection of called functions and keeps those in a primary module (or maybe treats just one profile's "called" functions as the source of truth).

The behavior would look (illustratively) something like this:

- When the primary module $M$ is loaded, its imports section lists ALL deferred functions in ALL deferred modules $D_i$. If any imported function is called, it loads $D_1$.
- For any deferred module $D_i$, its imports section lists ALL functions from ALL subsequent deferred modules. If any of them are called, it loads $D_{i+1}$.

Is such a thing theoretically possible? If so, what are the existing roadblocks to doing so? For example, if doing this through Emscripten, would the JS plumbing need to change?

tlively commented 7 months ago

Yes, this is possible in principle. At the wasm-split level, there are two possible approaches to doing multi-module splits. First, you could just run wasm-split multiple times, passing explicit lists of functions so you have full control over the splitting. That is already possible today. Alternatively, wasm-split could be modified to be able to produce multiple modules in one invocation, which would allow it to automatically determine splits from profiles.

The fun part is encoding the dependence DAG between the split modules and ensuring they are loaded in the correct order, especially if your split ever produces the diamond problem (i.e. the dependence graph is not a tree).

If the dependence graph is a tree, then you can encode the dependence information in the splitting process itself. For example, consider the case where A is the primary module and it calls a function c provided by module C, where A -> B -> C is a path in the tree. If the original module M is first split into A and BC, and BC is then split into B and C, then at runtime the call to c will first load B, since from the point of view of A, B provides c. Then the call to c from B will load C because from the point of view of B, C provides c.
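The A -> B -> C chain can be illustrated with a toy model. This is only a sketch: plain TypeScript objects stand in for wasm instances, and the names `loadB`, `loadC`, `makeA`, and `loadLog` are hypothetical, not part of wasm-split or its loader. Each module exposes `c` as a placeholder that loads the next module on first call and then forwards to it.

```typescript
// Toy model of a nested split: M -> (A, BC), then BC -> (B, C).
// Plain objects stand in for wasm instances; this is not wasm-split's loader.
type Fn = () => string;

// Records the order in which the deferred modules get loaded.
const loadLog: string[] = [];

// Module C is the one that really defines c.
function loadC(): { c: Fn } {
  loadLog.push("C");
  return { c: () => "c result" };
}

// Module B: from B's point of view, C provides c. B's c is a placeholder
// that loads C on the first call and forwards to the real function.
function loadB(): { c: Fn } {
  loadLog.push("B");
  let real: Fn | undefined;
  return {
    c: () => {
      if (real === undefined) real = loadC().c;
      return real();
    },
  };
}

// Module A (primary): from A's point of view, B provides c.
function makeA(): { callC: Fn } {
  let viaB: Fn | undefined;
  return {
    callC: () => {
      if (viaB === undefined) viaB = loadB().c;
      return viaB();
    },
  };
}

const a = makeA();
const first = a.callC();  // loads B, then C
const second = a.callC(); // both already loaded, nothing new is loaded
```

In this model the first call to `callC` loads B and then C in that order, and later calls reuse the already loaded modules.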

If the dependence graph is a non-tree DAG, then there is no nice way I can see of encoding the dependencies directly in the splitting process, so the loader will need extra logic to resolve the dependencies itself.
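One way the "extra logic" in the loader could look is a depth-first walk that instantiates each module only after everything it depends on. The sketch below is an assumption for illustration, not existing Binaryen or Emscripten code: the `DepGraph` shape, the `loadOrder` function, and the reading of an edge as "must be instantiated first" are all hypothetical.

```typescript
// Hypothetical dependence graph: module name -> modules that must be
// instantiated before it. The graph may be a non-tree DAG.
type DepGraph = Record<string, string[]>;

// Returns the modules needed for `target`, each listed after its dependencies.
function loadOrder(graph: DepGraph, target: string): string[] {
  const order: string[] = [];
  const done = new Set<string>();
  const inProgress = new Set<string>();

  const visit = (name: string): void => {
    if (done.has(name)) return; // shared dependency: schedule it only once
    if (inProgress.has(name)) throw new Error(`dependency cycle at ${name}`);
    inProgress.add(name);
    for (const dep of graph[name] ?? []) visit(dep);
    inProgress.delete(name);
    done.add(name);
    order.push(name);
  };

  visit(target);
  return order;
}

// Diamond: D needs B and C, and both of those need A.
const diamond: DepGraph = { A: [], B: ["A"], C: ["A"], D: ["B", "C"] };
const order = loadOrder(diamond, "D");
```

The `done` set is what handles the diamond: A is reachable through both B and C but is scheduled once.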

Even in the case where the DAG is a tree, the loading logic would need to change to handle adding the exports of each loaded module to the cumulative import object passed to subsequent modules.
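The cumulative import object could be handled roughly as follows. This is a minimal sketch under stated assumptions: `Instantiate` is a stand-in for real wasm instantiation, `loadInOrder` is a hypothetical helper, and the single `"primary"` namespace is assumed here for illustration rather than taken from the thread.

```typescript
type Exports = Record<string, unknown>;
type Imports = Record<string, Exports>;
// Stand-in for instantiating a wasm module: takes imports, returns exports.
type Instantiate = (imports: Imports) => Exports;

// Instantiates modules in order, adding each module's exports to the
// import object that is passed to every subsequent module.
function loadInOrder(
  modules: Instantiate[],
  baseImports: Imports,
  namespace = "primary", // assumed namespace name, for illustration only
): Imports {
  const shared: Exports = { ...(baseImports[namespace] ?? {}) };
  const cumulative: Imports = { ...baseImports, [namespace]: shared };
  for (const instantiate of modules) {
    const exports = instantiate(cumulative);
    Object.assign(shared, exports); // later modules can now import these
  }
  return cumulative;
}

// Fake modules: B imports a from the primary, C imports b from B.
const primary: Instantiate = () => ({ a: () => 1 });
const moduleB: Instantiate = (imports) => {
  const a = imports.primary.a as () => number;
  return { b: () => a() + 1 };
};
const moduleC: Instantiate = (imports) => {
  const b = imports.primary.b as () => number;
  return { c: () => b() + 1 };
};

const finalImports = loadInOrder([primary, moduleB, moduleC], {});
const c = finalImports.primary.c as () => number;
```

A real loader would also have to decide what happens when two modules export the same name; this sketch simply lets the later export overwrite the earlier one.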

jbms commented 6 months ago

Not really related to this issue, but your response raised the following question for me: is there a reason that wasm-split links primary functions into secondary modules using imports and exports rather than using the indirect function table?

For my rust wasm splitting prototype I just used the indirect function table in all cases, which makes the JavaScript loading code very simple, but perhaps imports are optimized better?

Does V8 potentially optimize indirect calls with a constant index into direct calls if the table remains unchanged?

tlively commented 6 months ago

Yes, direct calls to imports are faster than indirect calls because the call target never changes and doesn’t need to be loaded. Unfortunately V8 cannot optimize indirect calls to direct calls in general because it is not possible for it to prove that the exported table will not be modified.

jbms commented 6 months ago

> Yes, direct calls to imports are faster than indirect calls because the call target never changes and doesn’t need to be loaded. Unfortunately V8 cannot optimize indirect calls to direct calls in general because it is not possible for it to prove that the exported table will not be modified.

I'll rework my code to use imports when possible, then.

Is calling an import of an exported function just as fast as a direct function call within the same wasm module?

I can imagine that code splitting will become increasingly common if it can be made more convenient, but I don't know how likely it is that the (unavoidable) indirect calls would become performance bottlenecks.

tlively commented 6 months ago

There is probably a tiny amount of overhead to call an import compared to an intra-instance call because the instance pointer has to be updated, but it is far less than the overhead of an indirect call.

WebAssembly / binaryen

What are the road blocks to using wasm-split multiple times? #6521