WebAssembly / component-model

Repository for design and specification of the Component Model

Linking components to reduce memory usage #354

Closed: juntyr closed this issue 1 month ago

juntyr commented 1 month ago

I am running WASM components on the web via the wasm_component_layer crate, which polyfills component support by instantiating many WASM core modules using the multi-memory proposal.

In my application, I use a custom WASI implementation, similar in functionality to wasi-virt, to fully isolate the components I execute. Composing the "main" component with the WASI components is a beautiful solution, as I can verify that the resulting component has no (WASI) imports.

The issue is that each instance of the composed component then runs as many WASM core modules, each of which comes with its own memory. In my application, around 17 modules are required for one main component plus one component per WASI package (all written in Rust and adapted to wasi-p2 using wasmtime's adapter). Since browsers reserve a lot of virtual memory to isolate every WASM memory, however tiny, this quickly gets out of hand: even with little actual memory usage, I hit the limit on how many WASM memories can be instantiated at a time.
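To make the scaling concrete, here is a back-of-the-envelope sketch in Rust. It assumes, purely for illustration, that the engine reserves the full 4 GiB index space of a 32-bit memory plus a 4 GiB guard region for every memory, and that the process has a 1 TiB virtual-address budget; real figures differ between browsers, engines and platforms, and the function names are hypothetical.

```rust
// Back-of-the-envelope sketch. All sizes are ASSUMPTIONS for illustration,
// not measured values for any particular browser or engine.
const GIB: u64 = 1 << 30;

/// Virtual address space reserved for one 32-bit linear memory when the engine
/// relies on guard pages: the full 4 GiB index space plus a guard region.
fn reservation_per_memory(guard_bytes: u64) -> u64 {
    4 * GIB + guard_bytes
}

/// Number of composed-component instances that fit into a virtual-address
/// budget, given how many linear memories each instance needs.
fn max_instances(memories_per_instance: u64, guard_bytes: u64, budget_bytes: u64) -> u64 {
    budget_bytes / (memories_per_instance * reservation_per_memory(guard_bytes))
}

fn main() {
    let guard = 4 * GIB; // assumed guard-region size
    let budget = 1024 * GIB; // assumed per-process address-space budget (1 TiB)
    let per_instance = 17 * reservation_per_memory(guard);
    println!("reserved per instance: {} GiB", per_instance / GIB);
    println!("instances with 17 memories each: {}", max_instances(17, guard, budget));
    println!("instances with 1 memory each: {}", max_instances(1, guard, budget));
}
```

Under these assumed numbers this prints 136 GiB per instance, room for 7 instances with 17 memories each versus 128 with a single memory: the per-memory reservation, not the actual heap usage, is what runs out.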

In the long run, browsers may of course get native support for the component model and may be able to work around this limitation (though I also very much like wasm_component_layer's approach, which is adapted from jco, of translating the component model to core modules to maintain maximum compatibility).

One interesting approach might be to allow some form of linking of WASM components that connects component-model imports and exports, as wasm-compose would, but that shares a single memory (without exposing this to the application) and lets optimisers strip out some of the canonical ABI wrappers once an import and its matching export are linked together.

This approach would obviously be a trade-off, since you would lose the isolation that components otherwise provide. However, providing the option may be beneficial when someone wants the composability benefits of the component model but also wants to combine several small, trusted components to reduce the per-module overhead.

Is this something that could be implemented?

Thank you so much for your help!

lukewagner commented 1 month ago

First of all, it's really exciting to hear about wasm_component_layer; that seems like a great project!

Unfortunately, I don't think there's a generic/turn-key way to merge N components into 1 component other than preserving the separate memories (which you can certainly do inside 1 component, but I think it's the number-of-memories that is the issue here, not the number-of-components). The most basic reason is that the core wasm running in a component may embed absolute linear-memory addresses and function-table indices as i32.consts that a merging tool would have no way to distinguish (and relocate) from normal integral i32.consts. I think there are also other, more subtle ways core wasm would break if it goes from having its own linear memory to sharing linear memory with other unrelated core modules.
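A toy illustration of why a merging tool cannot tell these apart: in the sketch below (a made-up instruction enum, not a real wasm IR), a pointer and a plain integer are both just `I32Const`, so a naive pass that shifts every constant by the module's new base address fixes the pointer but corrupts the integer. Static linkers such as wasm-ld generally sidestep this by working on relocatable object files that still carry relocation records, which the final core modules inside a component no longer have.

```rust
/// A toy stand-in for a few core wasm instructions (hypothetical, not a real wasm IR).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    I32Const(i32),
    I32Load,
    I32Add,
}

/// Naive merge step: shift *every* i32 constant by the base address at which
/// this module's data now lives inside the shared memory. Nothing in the
/// instruction stream says which constants are addresses.
fn naive_relocate(code: &[Instr], base: i32) -> Vec<Instr> {
    code.iter()
        .map(|ins| match *ins {
            // wasm i32 arithmetic wraps, so mirror that here
            Instr::I32Const(v) => Instr::I32Const(v.wrapping_add(base)),
            other => other,
        })
        .collect()
}

fn main() {
    // Load from absolute address 1024 (a pointer), then add 1 (a plain integer).
    let module_a = [
        Instr::I32Const(1024),
        Instr::I32Load,
        Instr::I32Const(1),
        Instr::I32Add,
    ];
    // Pretend module A's memory now starts at 64 KiB inside the shared memory.
    let merged = naive_relocate(&module_a, 65536);
    // The pointer is now correct (66560), but the increment has become 65537.
    println!("{:?}", merged);
}
```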

Instead, I'd suggest preemptively grouping implementation code together into a single component (noting that a single component is always allowed to export multiple interfaces). E.g., all of wasi:cli/imports might go into one component. There are also ways to get fancy and factor out common core modules that would otherwise be duplicated.
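As a loose analogy only (Rust traits standing in for WIT interfaces, one struct standing in for one component), grouping means a single implementation exports several interfaces while keeping one shared heap, rather than one instance, and one memory, per interface. The trait and method names below merely echo wasi:cli and are hypothetical simplifications, not the real WASI signatures.

```rust
// Analogy only: traits play the role of WIT interfaces, and one struct plays
// the role of one component. Names loosely echo wasi:cli but are hypothetical.
trait Environment {
    fn get_arguments(&self) -> Vec<String>;
}

trait Stdout {
    fn write(&mut self, bytes: &[u8]);
}

/// One "component" exporting both interfaces; all of its state lives in one
/// place, just as a grouped component would use a single linear memory.
struct CliImpl {
    args: Vec<String>,
    out: Vec<u8>,
}

impl Environment for CliImpl {
    fn get_arguments(&self) -> Vec<String> {
        self.args.clone()
    }
}

impl Stdout for CliImpl {
    fn write(&mut self, bytes: &[u8]) {
        self.out.extend_from_slice(bytes);
    }
}

fn main() {
    let mut cli = CliImpl { args: vec!["app".into(), "--verbose".into()], out: Vec::new() };
    cli.write(b"hello\n");
    println!("{:?} wrote {} bytes", cli.get_arguments(), cli.out.len());
}
```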

One other emerging proposal that could really help here is https://github.com/WebAssembly/custom-page-sizes. With this proposal, you could keep your component-per-WASI-package design but set the custom page size for all their memories to 1 (or whatever the smallest page size ends up being), which indirectly turns off the guard-page optimization and causes the engine to simply malloc/realloc the various linear memories.
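For intuition about what replaces the guard pages, here is a sketch of the explicit bounds check an engine would presumably emit for each `i32.load` on such a memory: the effective address is computed in 64 bits so `addr + offset` cannot wrap, and any access past the end traps. This is a hand-written illustration (the function name and error string are hypothetical), not code from any engine; with guard pages, the same check is instead enforced by the hardware faulting on the reserved-but-inaccessible region.

```rust
/// Sketch of an explicit bounds check for `i32.load offset=<offset>` on a memory
/// whose accesses cannot be protected by guard pages. Hypothetical, engine-agnostic.
fn load_i32(memory: &[u8], addr: u32, offset: u32) -> Result<i32, &'static str> {
    // Effective address computed in 64 bits, so addr + offset + 4 never overflows.
    let ea = u64::from(addr) + u64::from(offset);
    if ea + 4 > memory.len() as u64 {
        return Err("out of bounds memory access"); // a trap in real wasm
    }
    let i = ea as usize;
    // wasm memory is little-endian.
    Ok(i32::from_le_bytes([memory[i], memory[i + 1], memory[i + 2], memory[i + 3]]))
}

fn main() {
    // With a 1-byte page size, a memory can have any size, e.g. 10 bytes.
    let mut memory = vec![0u8; 10];
    memory[4..8].copy_from_slice(&42i32.to_le_bytes());
    println!("{:?}", load_i32(&memory, 0, 4)); // Ok(42)
    println!("{:?}", load_i32(&memory, 8, 0)); // Err: would need bytes 8..12
}
```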

juntyr commented 1 month ago

> First of all, it's really exciting to hear about wasm_component_layer; that seems like a great project!

I absolutely agree! I hope that projects like this (and its sibling crate wasm_runtime_layer, which abstracts over runtimes) will at some point become community standards, ensuring that a compatibility interface is maintained so that not just component implementors but also component embedders can benefit from WASM's cross-platform, vendor-lock-in-free nature :)

> Unfortunately, I don't think there's a generic/turn-key way to merge N components into 1 component other than preserving the separate memories (which you can certainly do inside 1 component, but I think it's the number-of-memories that is the issue here, not the number-of-components). The most basic reason is that the core wasm running in a component may embed absolute linear-memory addresses and function-table indices as i32.consts that a merging tool would have no way to distinguish (and relocate) from normal integral i32.consts.

That makes a lot of sense and I had feared as much.

I vaguely remember that WASM module linking exists. Could there be some middle ground where you can still produce WASM modules that fulfil a WIT interface and then link those together? In a sense, I would like the convenience of WIT composition (where dealing with imports and exports in a structured way is fantastic), e.g. with wac, but with the benefits of linking.

> Instead, I'd suggest preemptively grouping implementation code together into a single component (noting that a single component is always allowed to export multiple interfaces). E.g., all of wasi:cli/imports might go into one component. There are also ways to get fancy and factor out common core modules that would otherwise be duplicated.

(I could of course just do all the linking on the Rust side before things become a component, but in this case I also use the flexibility of composing at runtime, where staying entirely in WASM land helps.)

> One other emerging proposal that could really help here is https://github.com/WebAssembly/custom-page-sizes. With this proposal, you could keep your component-per-WASI-package design but set the custom page size for all their memories to 1 (or whatever the smallest page size ends up being), which indirectly turns off the guard-page optimization and causes the engine to simply malloc/realloc the various linear memories.

I first heard about that one a few weeks ago and absolutely agree that it would be a good long-term solution as well. While I'm very much developing alongside whatever new features WASM gains, I suspect that support for custom page sizes might take a while to land in browsers, so I'm hoping to find a solution for the meantime.

lukewagner commented 1 month ago

> I vaguely remember that WASM module linking exists.

FWIW, the current component-model proposal subsumes the original module-linking proposal, allowing you to link any number of core modules together that share memory/tables (inside a single component). See this doc for a sketch. However, module-linking doesn't speak to the interesting questions (below) around ABI -- it assumes you already have one fixed (such as this one).

> Could there be some middle ground where you can still produce WASM modules that fulfil a WIT interface and then link those together? In a sense, I would like the convenience of WIT composition (where dealing with imports and exports in a structured way is fantastic), e.g. with wac, but with the benefits of linking.

Yes, in theory this could be built purely as a feature of the producer toolchain, using WIT to imply a shared-everything core wasm ABI. There are a bunch of choices you have to make to determine how much of the Canonical ABI you want to reuse, though. Often folks bring up this idea because they want to avoid copying and "just pass a pointer", but to do this, you have to answer a bunch of hard questions around the lifetime management of this shared memory which will take some careful design work and a bunch of new bindings generation work (b/c the glue code and, in some languages, source-level bindings will need to be totally different). Or, you could keep the Canonical ABI the same, do the copying and get to reuse all the existing WIT bindings... but now folks wanting to just share a pointer are going to come knocking :) But I wouldn't be surprised if one or multiple projects to do this emerge over time.
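To make the "keep the Canonical ABI and copy" option concrete, here is a toy Rust model of passing a `string` from one instance to another: the caller hands over a (pointer, length) pair into its own memory, and the glue validates the bytes as UTF-8 and copies them into memory obtained from the callee's allocator. `LinearMemory`, its bump allocator (standing in for a guest's `cabi_realloc`, with no alignment or freeing) and `pass_string` are hypothetical simplifications, and only the UTF-8 encoding is modelled. In a shared-everything ABI the copy would disappear, but someone would then have to decide who frees the caller's buffer and when, which is exactly the lifetime question above.

```rust
/// Toy linear memory with a bump allocator standing in for a guest's
/// `cabi_realloc` (simplified: no alignment, no freeing).
struct LinearMemory {
    bytes: Vec<u8>,
    next: usize,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size], next: 0 }
    }

    /// Reserve `len` bytes and return their address, or None if memory is full.
    fn alloc(&mut self, len: usize) -> Option<usize> {
        let ptr = self.next;
        let end = ptr.checked_add(len)?;
        if end > self.bytes.len() {
            return None;
        }
        self.next = end;
        Some(ptr)
    }
}

/// Copy a UTF-8 string argument from the caller's memory into the callee's,
/// returning the callee-side (ptr, len). `None` models a trap.
fn pass_string(
    caller: &LinearMemory,
    ptr: usize,
    len: usize,
    callee: &mut LinearMemory,
) -> Option<(usize, usize)> {
    let src = caller.bytes.get(ptr..ptr.checked_add(len)?)?;
    std::str::from_utf8(src).ok()?; // lifting a string traps on invalid UTF-8
    let dst = callee.alloc(len)?;
    callee.bytes[dst..dst + len].copy_from_slice(src);
    Some((dst, len))
}

fn main() {
    let mut caller = LinearMemory::new(64);
    let p = caller.alloc(5).unwrap();
    caller.bytes[p..p + 5].copy_from_slice(b"hello");

    let mut callee = LinearMemory::new(64);
    callee.alloc(8).unwrap(); // the callee already holds some data of its own

    let (q, n) = pass_string(&caller, p, 5, &mut callee).unwrap();
    // Same bytes, but at a different address in a different memory.
    println!("caller ptr {p}, callee ptr {q}: {:?}", std::str::from_utf8(&callee.bytes[q..q + n]));
}
```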

> (I could of course just do all the linking on the Rust side before things become a component, but in this case I also use the flexibility of composing at runtime, where being just in WASM land helps)

I agree a nice, fully modular design is attractive, but I think doing the linking on the Rust side might be a good idea in the short term. Independent of the vmem used for the guard pages, if we're talking about 17 components, you'll also probably get decent RSS savings by sharing libc and other runtime memory that is otherwise duplicated and dirtied in each component instance. Of course we're trying to optimize this per-component baseline memory overhead over time, but that's often hard, slow work and it will never get to zero. It's also worth noting that, even if we could reliably merge components to share a single memory (by fixing the above-mentioned relocation issues), this overhead would remain, because making the merged components share the same libc/malloc/runtime adds a whole new set of things that can blow up -- this is where you really need the producer toolchain to intentionally target a fixed shared-everything-linking ABI.
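One concrete way naive memory sharing blows up, sketched in Rust under simplifying assumptions: each module was compiled to own the whole memory and brings its own allocator, so if both happen to use the same heap layout (assumed here), their allocators hand out the same addresses once they share one memory. `BumpAllocator` and `HEAP_BASE` are hypothetical stand-ins for each module's malloc and its linker-chosen heap start.

```rust
/// Hypothetical stand-in for each module's own malloc: a bump allocator that
/// assumes it is the only allocator in the linear memory.
struct BumpAllocator {
    next: usize,
}

impl BumpAllocator {
    fn alloc(&mut self, len: usize) -> usize {
        let ptr = self.next;
        self.next += len;
        ptr
    }
}

// Assumed: both modules were linked with the same heap start.
const HEAP_BASE: usize = 256;

fn main() {
    let mut shared_memory = vec![0u8; 1024];
    let mut malloc_a = BumpAllocator { next: HEAP_BASE };
    let mut malloc_b = BumpAllocator { next: HEAP_BASE };

    let a = malloc_a.alloc(4);
    shared_memory[a..a + 4].copy_from_slice(b"AAAA");
    let b = malloc_b.alloc(4);
    shared_memory[b..b + 4].copy_from_slice(b"BBBB");

    // Module A reads its buffer back and finds module B's data: prints "BBBB".
    println!("{}", String::from_utf8_lossy(&shared_memory[a..a + 4]));
}
```

A fixed shared-everything ABI would instead have both modules call one shared allocator, which is the kind of agreement the producer toolchain has to build in from the start.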