WebAssembly / component-model

Repository for design and specification of the Component Model

Question: Linking components with Wasm modules #275

Open skuzmich opened 8 months ago

skuzmich commented 8 months ago

As far as I see, this proposal describes components interacting with each other and the host.

What would linking components with a non-component Wasm module look like? Could we use some sort of "relaxed component format", where some of the imports/exports use component types while others use core WebAssembly?

For example: a browser app that shares memory and references with JS and DOM might want to use a component to implement some of its functionality.

alexcrichton commented 8 months ago

If I understand your question correctly I believe this is more-or-less the intent of the canonical ABI. That provides a translation effectively between component model types and core WebAssembly. It currently is specified in the context of a component however so it's not possible yet to translate to core wasm in a vacuum (e.g. the canonical options all need to be specified in theory).

That being said though if you haven't seen it already jco is a project to run components on the web.

skuzmich commented 8 months ago

True, I see how we could use the same ABI, given that options are specified. It would be great if we could reuse component tooling for this case of "impure" components, the ones with some of their core imports and exports sticking out.

lukewagner commented 8 months ago

Part of what enables all the cross-language/language-agnostic tooling that makes the component model valuable in the first place is that there is no raw shared memory between a component and the outside world: this is what enables writing an interface once (in WIT) and generating bindings in N languages and a bunch of other scenarios as well. With raw public memory, there is no ABI and thus everyone would have to revert to what everyone does in wasm today, which is to roll your own custom thing and produce host-dependent code, which is the state of affairs that the Component Model is seeking to provide an alternative to.

Now, if your goal is to "mostly" reuse the Component Model, but do something browser-/JS-dependent "on the side" (e.g., by assuming certain special JS glue code), you can always run jco transpile and then manually edit the generated JS glue code to expose the otherwise-encapsulated ArrayBuffer that aliases core linear memory and then you can go to town. But that would very much be outside the Component Model and so all sorts of general ecosystem tooling (e.g., a lot of the composition and virtualization tooling that we're building) would simply break and thus these "impure" components would effectively be outside the ecosystem. But maybe that's fine for your use case.

My hope though is that with resources/handles in Preview 2 and futures/streams in Preview 3, the set of use cases that actually need raw publicly-shared memory would go away and the glue code you'd get from jco transpile (or, eventually, a native implementation) would be efficient enough. And if not, then I'd like to understand the concrete use case and design a new Component Model feature (say for a Preview 4 or v.1.1) that achieves the desired performance while maintaining the shared-nothing model that the ecosystem tooling depends on. This approach lifts up the whole ecosystem.

alexcrichton commented 8 months ago

Ah, I see. I think I misunderstood the question; it instead sounds like you want to have a wasm module that imports both component things and non-component things, and for that I would agree with what @lukewagner said.

skuzmich commented 8 months ago

@lukewagner, I like shared-nothing model! To me, it's a different design tradeoff that makes Wasm modules advantageous over components in some cases. Wasm GC was almost exclusively designed to accommodate efficient interop with JS and browser, but the component model chooses to disallow it. Should I propose adding a cross-component GC with cycle collection for Preview 4? :)

In the meantime, while some Wasm modules can't be components, they are a natural fit for using other components. A standard format for Canonical ABI and Post-MVP adapter functions applied to a subset of imports would help. If I remember it correctly, the interface types proposal allowed something like this.

lukewagner commented 8 months ago

Agreed that, when compiling a language to wasm with custom bindings tailored to the JS/browser environment, core wasm can still be the best fit in some cases; in such scenarios, the cross-language and host-agnostic benefits of the Component Model aren't relevant so that's not really a loss.

Based on the smiley, I think you're kidding about the cross-component-GC point but, just to be explicit on that topic: in a Component Model context where we're trying to keep the choice of programming language an internal impl choice of the component, it's crucial for components to never rely on cross-component GC, since this rules out a number of languages (including languages with GC that, for various reasons, are implemented in terms of linear memory). This is why the Component Model supports explicit acyclic ownership, which supports both wasm-gc (treating a handle as a root) and linear memory languages to interop. But given this acyclic ownership, I think components should still be able to do a decent job pointing at JS or DOM objects (via handle) and calling methods (noting that wasm-GC also requires opaque references and calling imports to access JS/DOM objects).
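
As an illustration of the explicit ownership idea described above, here is a minimal TypeScript sketch of a handle table: the guest only ever holds an integer index, and the host object stays rooted until the owner explicitly drops the handle. The names `HandleTable`, `insert`, `get`, and `drop` are hypothetical and chosen for this sketch; they are not taken from the Component Model spec or from jco.

```typescript
// Hypothetical sketch: an owning handle table, loosely in the spirit of
// Component Model resource handles. Not the spec's actual data structure.
class HandleTable<T> {
  private slots: (T | undefined)[] = [];
  private free: number[] = [];

  // Store a value and return an integer handle the guest can hold.
  insert(value: T): number {
    const reused = this.free.pop();
    if (reused !== undefined) {
      this.slots[reused] = value;
      return reused;
    }
    this.slots.push(value);
    return this.slots.length - 1;
  }

  // Look up the value behind a handle; throws on a stale or unknown handle.
  get(handle: number): T {
    const value = this.slots[handle];
    if (value === undefined) {
      throw new Error(`invalid handle ${handle}`);
    }
    return value;
  }

  // Explicitly release ownership; the slot no longer roots the value.
  drop(handle: number): T {
    const value = this.get(handle);
    this.slots[handle] = undefined;
    this.free.push(handle);
    return value;
  }
}

// Usage: a guest "points at" a host object purely by index.
const table = new HandleTable<{ tagName: string }>();
const h = table.insert({ tagName: "div" });
const tag = table.get(h).tagName; // "div"
table.drop(h); // after this, table.get(h) throws
```

Because ownership is released by an explicit `drop` rather than by tracing, the same scheme works whether the guest is a wasm-gc language or a linear-memory one.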

Incidentally, collecting cycles between wasm and JS/DOM is only one of the motivations for wasm-gc: another big one (that applies just as well in a Component Model context) is factoring the GC algorithm out of wasm and into the host (which can do a better job by being specialized to the environment). Here, we just need to add new Canonical ABI options for lifting and lowering to/from wasm-gc types which is very much in scope for a future Preview release.

skuzmich commented 8 months ago

Based on the smiley, I think you're kidding about the cross-component-GC point

Not quite, I was just glad you asked about use cases. :) Shared GC makes for a simpler programming model: you can exchange objects and closures freely between WasmGC language and JavaScript and they follow the same memory management rules as if they were your own. Lacking this could become an obstacle preventing our toolchain from producing components when targeting JS hosts.

it's crucial for components to never rely on cross-component GC, since this rules out a number of languages

Could you elaborate on how GC would rule out linear memory languages? I could imagine them putting references into tables and using indices, similar to how they currently handle externrefs. They would also want to have GC finalisers (Post-MVP) to free associated linear memory.

rossberg commented 8 months ago

@skuzmich:

Wasm GC was almost exclusively designed to accommodate efficient interop with JS and browser

I'm curious what gave you that idea, because that was at most a secondary or tertiary goal.

skuzmich commented 8 months ago

@rossberg I agree that it achieves other goals. But Wasm-host cycle collection was the main argument I heard used in favour of current design, as opposed to solving inefficient parts of GC in linear memory.

lukewagner commented 8 months ago

Could you elaborate on how GC would rule out linear memory languages? I could imagine them putting references into tables and using indices, similar to how they currently handle externrefs.

The problem manifests when you have cycles that are successfully collected when all the components involved are implemented via wasm-gc but leak (due to the cycles being rooted by the tables) when some of the components involved are implemented via linear-memory+tables. What makes this failure mode especially insidious is that it won't show up early in design and testing; only when graphs of components are used at scale in certain compositions. And once you have one of these leaks, solving it is really hard, because now you need to reason about cross-language cycles. Closures are a particularly easy way for these cycles to appear since they close over their environment, which usually entrains containing scopes that often reference (via local or global variable) the foreign object to which the callback is being attached, creating a leak that's not obviously anyone's "fault" (which is the worst kind of leak). IIUC, Firefox 2 got the bad reputation for being leaky due to exactly this sort of problem, and the problem was only solved by adding a whole cycle collector (and cycle debugger) to FF along with a gnarly set of macros that all C++ classes that can possibly participate in a cycle have to use carefully. (Chrome also went through a number of iterations before settling on, iirc, having their C++ objects owned by the V8 GC and I think Safari does a fixpoint iteration sort of thing that also requires explicit participation from C++.) And this browser example is actually a nicer scenario than we have with components because there are only 2 "components" written in 2 fixed languages with tight manual integration in a single codebase.
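
The failure mode described above can be modelled with a toy reachability check. This is only a simulation written for this thread, not how any engine's collector works: the graph, the node names `callback` and `target`, and the `reachable` helper are all assumptions made for the sketch. It shows the same two-object cycle being collectable when no roots remain, and leaking once a table slot has to be treated as a root.

```typescript
// Toy object graph: node name -> names of nodes it references.
type Graph = Map<string, string[]>;

// Mark phase of a tracing collector: everything reachable from the roots.
function reachable(graph: Graph, roots: string[]): Set<string> {
  const seen = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (seen.has(node)) continue;
    seen.add(node);
    for (const next of graph.get(node) ?? []) stack.push(next);
  }
  return seen;
}

// A callback and the foreign object it is attached to reference each other.
const cycle: Graph = new Map<string, string[]>([
  ["callback", ["target"]],
  ["target", ["callback"]],
]);

// Case 1: both objects live in one traced heap and the application has
// dropped its last outside reference, so there are no roots left.
const liveWithSharedGc = reachable(cycle, []);

// Case 2: one side is a linear-memory component holding "target" through a
// table slot. The tracer cannot see into linear memory, so the slot acts as
// a root, and it is only cleared if "callback" is freed first.
const liveWithTableRoot = reachable(cycle, ["target"]);

console.log(liveWithSharedGc.size);  // 0: the cycle is collected
console.log(liveWithTableRoot.size); // 2: the cycle leaks
```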

Shared GC makes for a simpler programming model: you can exchange objects and closures freely between WasmGC language and JavaScript and they follow the same memory management rules as if they were your own.

Agreed on this point, though: if you only care about GC languages using wasm-gc as a substrate, it does open up new options. I could imagine a wholly different "GC-centric Component Model" that had a rather different technical design, and maybe backed off the whole "shared-nothing" model to allow sharing mutable (GC) memory and perhaps fixed some wasm-gc representations of common language concepts (like classes, vtables, etc). But now this sounds a lot more of a JVM- or CLR-like "language family", which isn't a bad thing and could be independently valuable and I expect someone will actually do at some point in time; it's just a very different goal statement than we have with the Component Model.

skuzmich commented 8 months ago

Thanks a bunch for breaking that down for me! Now I totally get why there's no cross-component garbage collection.

skuzmich commented 8 months ago

Sorry for derailing this into a GC discussion. Back to the original question.

If we agree that component model will not be the best solution for all Wasm use cases, could we also agree that there should be a standard bridge mechanism between existing Wasm and emerging components for them to be interoperable in environments where both are supported? Since the limited ABI is the defining part of component model and it can't be changed, we are left with the option of using this ABI in regular Wasm for this kind of bridge.

skuzmich commented 8 months ago

I somehow missed the (exact?) thing I'm talking about proposed for post-MVP: adapter modules. Shouldn't we ensure that if this proposal receives first-class JS interop support in its MVP, it also gains equivalent first-class Wasm interop support at the MVP stage?

lukewagner commented 8 months ago

Oh hah, right; I totally forgot about writing that way way back. In the short term, given that components can't do anything in browsers that you can't do with core wasm + JS API and that you're mostly focused on doing special browser things with JS, I'd go with the route suggested above, which is to modify jco transpile output and use that as a way to prototype and flesh out the use case and expected developer experience for adding adapter modules in a post-MVP timeframe.

skuzmich commented 8 months ago

... given that components can't do anything in browsers that you can't do with core wasm + JS API ...

While it's true that wasm as a whole can be polyfilled in JS as far as I recall, the component model, succeeding WebIDL bindings and interface types, offers static type information, allowing engines to better optimize built-in calls.

This is mentioned in JS API:

When an imported JavaScript function is a built-in function wrapping a Web IDL function, the specified behavior should allow the intermediate JavaScript call to be optimized away when the types are sufficiently compatible ...

It feels unfair that MVP gives more preference to components in this situation.

lukewagner commented 8 months ago

Although it is definitely the goal for browsers to natively implement the C-M and perform these optimizations, that seems to be at least a year or two out (before browser implementations start) in the best case (given higher core wasm priorities such as shared-everything-threads and stack-switching, which are large projects), so at least for the next couple of years, I think there shouldn't be a performance advantage. But if that starts looking imminent and there remain concrete use cases for adapter modules, then it would make sense to add them to the proposal at that point. But in the meantime, it would just add an extra dimension of complexity for the ecosystem to worry about.

skuzmich commented 8 months ago

Thanks for the ballpark web implementation timeframe, I didn't know that! Now it is clear that there is little practical need for adapter-modules until a browser (or some other "core + component" system) decides to implement C-M natively (edit: or until warg has a lot of useful components that people with non-component wasm apps will want to use :) ).

Just out of curiosity, were adapter-modules envisioned as a layer below components, or with a different structure in mind?

lukewagner commented 8 months ago

That's an interesting second use case I hadn't considered for adapter modules (reusing a component from an existing core application); thanks for pointing it out!

Just out of curiosity, were adapter-modules envisioned as a layer below components, or with a different structure in mind?

I haven't thought too carefully about it, but I guess that adapter modules could be part of the Component Model and reuse all the same new definition sorts, but in the preamble, have a different value in the layer field to distinguish components from adapter modules, affecting validation and execution.

skuzmich commented 8 months ago

That's an interesting second use case I hadn't considered for adapter modules (reusing a component from an existing core application); thanks for pointing it out!

This is what I had in mind in the example from the original question.

Initial developer experience could look roughly like this:

  1. Finds a useful component.
  2. Generates bindings using wit-bindgen.
  3. Calls to bindings from the app.
  4. Compiles the app to .wasm file.
  5. Creates adapter-module using wasm-tools adapter-module new.
  6. Composes adapter-module and component using wasm-tools compose.
  7. Transpiles composed adapter-module to ES module by running jco transpile or polyfills WebAssembly.{compile,instantiate,etc}.

skuzmich commented 8 months ago

And yes, I just realised that WebIDL bindings is a separate use case, that could be postponed till the native browser implementation. It seems that it would also require an extension to component types (at least an externref) to fill the gaps in WIT <-> WebIDL mapping. And ABI for things like list<externref> is indeed out of C-M scope.

lukewagner commented 8 months ago

Initial developer experience could look roughly like this: [...]

I think what you sketched could technically work, but I think it might be constraining for existing core wasm toolchains that exist today (which often use a mix of .wasm and JS API to do custom linking). Instead, I think you could get away with taking the component from step 1 and wrapping it with a thin adapter module that imports the "main" linear memory and lowers all the component's exports using this memory, exporting these lowered functions. That way you could link these lowered core function into the rest of the .wasm app using whatever existing custom (JS API) linking scheme is already being used. But given how restricted and simplistic this adaptation scheme is, you could also imagine that we just define a new JS API function (say, WebAssembly.lowerComponent()) that does the same thing. Given that this doesn't change the C-M at all, I could even imagine including this in the initial JS API proposal.

It seems that it would also require an extension to component types (at least an externref)

I think we should be able to use resource types and handles for such cases, where the component imports a single resource type representing any JS value, thereby allowing any handle for this resource type (used in imported function params/results) to hold any JS value.

skuzmich commented 8 months ago

I could be understanding it wrong, as I'm new to the spec and haven't internalized it yet. C-M is quite a lot of awesome work! I find myself guilty of still looking at it through the lens of its wider-scope predecessor, causing the list<externref> talk :)

I think it might be constraining for existing core wasm toolchains that exist today

As a developer of one of the "current" toolchains that hasn't implemented C-M yet, I'm trying to understand how the workflow you suggested makes things easier for us compared to working with adapter-modules. We currently support two targets: "JS" for browser apps, and "WASI Preview 1" which we hope to migrate to C-M "WASI Preview 2" when prototyping tools catch up with core wasm features. It would be ideal for us to provide toolchain users with a uniform way of using external components in both targets.

Please help me understand, does your workflow, involving component lowering, mean wiring the "main" memory + realloc + post-return through the core wasm imports/export as opposed to declaring them in C-M canonopt section, or is there more to it? If I understood it correctly, wouldn't it effectively mean us doing the same thing in both cases but in a different way?

I think I like your lowerComponent() idea the way I understood it, applied to adapter-modules with only core imports/exports left (I think you assumed it to be the case for C-M imports in your example?). In this case lowerComponent(step_6_output()) would be just a core wasm module with the exact step_0 signature, effectively a drop-in replacement with no JS API changes at all!

skuzmich commented 8 months ago

Also, while I can't myself evaluate the practical value of this separate use case, a combination of adapter-modules + lowerComponent could provide a standard way for us to use C-M not just in browser Wasm, but in any other core-wasm-API runtime in a similar fashion to current ad-hoc WASI Preview 1 <-> Preview 2 bridge.

lukewagner commented 7 months ago

Thanks for all the continued time and consideration talking through components and your toolchain, and I appreciate the kind words!

Please help me understand, does your workflow, involving component lowering, mean wiring the "main" memory + realloc + post-return through the core wasm imports/export as opposed to declaring them in C-M canonopt section, or is there more to it?

So let's say I start with a component C that I want to reuse from my JS + core wasm application. Internally, instances of C will contain 0..N core memories and module instances that are encapsulated by C. If I lower C by calling this hypothetical WebAssembly.lowerComponent(C), I'm imagining that the result would be a WebAssembly.Module which:

  1. imports a core memory and realloc function (which are necessarily distinct from those encapsulated by C)
  2. imports pairs of functions that are lifted to produce each import of C (the second function being optional post-return)
  3. exports pairs of functions that are produced by lowering each export of C (the second function being post-return)

Thus, this lowered WA.Module could be instantiated and linked with the rest of the JS+wasm application using the JS API. It's sort of as if the outer JS+wasm application was linked to C using an outer adapter module (as you initially suggested), but instead we did the analogous linking using the JS API (which is super dynamic/flexible).
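
The three points above can be sketched as a small TypeScript function that describes what the lowered module's core imports and exports might look like. `WebAssembly.lowerComponent()` is only a hypothetical API in this discussion, so everything here is an assumption: the `ComponentShape` and `LoweredShape` types, the `describeLowered` helper, and the naming scheme (`memory`, `realloc`, `<name>.post-return`) are invented for illustration.

```typescript
// Hypothetical description of a component's function-typed interface.
interface ComponentShape {
  imports: string[];
  exports: string[];
}

// Hypothetical description of the core module that lowering might produce.
interface LoweredShape {
  coreImports: string[];
  coreExports: string[];
}

// The names used below are assumptions made for this sketch only.
function describeLowered(c: ComponentShape): LoweredShape {
  // 1. Memory and realloc come from the embedding application, not from C.
  const coreImports: string[] = ["memory", "realloc"];
  // 2. Each component import becomes a core import pair
  //    (the function itself plus an optional post-return).
  for (const name of c.imports) {
    coreImports.push(name, `${name}.post-return`);
  }
  // 3. Each component export becomes a core export pair
  //    (the lowered function plus its post-return).
  const coreExports: string[] = [];
  for (const name of c.exports) {
    coreExports.push(name, `${name}.post-return`);
  }
  return { coreImports, coreExports };
}

const shape = describeLowered({ imports: ["log"], exports: ["render"] });
// shape.coreImports: ["memory", "realloc", "log", "log.post-return"]
// shape.coreExports: ["render", "render.post-return"]
```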

skuzmich commented 7 months ago

Thanks, your explanation really helped clear things up.

I'm thinking we might need to give WebAssembly.lowerComponent a string-encoding, possibly one for each function, right?

I'm trying to understand what work toolchains might want to do to fill the gaps in lowered components. We should probably develop tools for creating JavaScript bindings for these lowered components. And maybe consider a lowered equivalent of the jco preview2-shim when WASI is imported?

lukewagner commented 7 months ago

I'm thinking we might need to give WebAssembly.lowerComponent a string-encoding, possibly one for each function, right?

Oh right, building on what I said above, you could imagine that there is a second optional argument WebAssembly.lowerComponent(C, canonOpts) that let you override the default options (e.g., allowing you to pass in { stringEncoding: "utf16" }). These overrides would be set for every lift and lower performed by lowerComponent (instead of doing it on a per-function basis like the C-M does), but I expect that's good enough for most purposes.
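
To show what a string-encoding override would actually change at the core boundary, here is a rough TypeScript sketch. It assumes strings cross the boundary as a pointer plus a length in code units, and it uses a toy bump allocator in place of the application's realloc. `BumpMemory` and `lowerString` are hypothetical names for this sketch and are not part of jco or any proposed API.

```typescript
type StringEncoding = "utf8" | "utf16";

// A trivial bump allocator standing in for the application's realloc.
class BumpMemory {
  readonly bytes = new Uint8Array(1024);
  private next = 8; // keep address 0 unused

  alloc(size: number, align: number): number {
    const ptr = Math.ceil(this.next / align) * align;
    if (ptr + size > this.bytes.length) throw new Error("out of memory");
    this.next = ptr + size;
    return ptr;
  }
}

// Copy a JS string into memory and return [pointer, length in code units].
function lowerString(mem: BumpMemory, s: string, encoding: StringEncoding): [number, number] {
  if (encoding === "utf8") {
    const encoded = new TextEncoder().encode(s);
    const ptr = mem.alloc(encoded.length, 1);
    mem.bytes.set(encoded, ptr);
    return [ptr, encoded.length]; // UTF-8 code units are bytes
  }
  const ptr = mem.alloc(s.length * 2, 2); // 2-byte aligned for UTF-16
  const view = new DataView(mem.bytes.buffer);
  for (let i = 0; i < s.length; i++) {
    view.setUint16(ptr + i * 2, s.charCodeAt(i), true); // little-endian
  }
  return [ptr, s.length]; // UTF-16 code units are 16-bit values
}

const mem = new BumpMemory();
const [p8, n8] = lowerString(mem, "héllo", "utf8");    // n8 is 6
const [p16, n16] = lowerString(mem, "héllo", "utf16"); // n16 is 5
```

The same string yields different lengths and layouts under each encoding, which is why a single override applied to every lift and lower is simpler to reason about than a per-function choice.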

We should probably develop tools for creating JavaScript bindings for these lowered components.

I might be misunderstanding your meaning, but if you want JS bindings for a component, then the ones you get directly from jco's ESM-integration for a component will give you something high-level and quite usable.

skuzmich commented 7 months ago

I might be misunderstanding your meaning, but if you want JS bindings for a component, then the ones you get directly from jco's ESM-integration for a component will give you something high-level and quite usable.

I'll try to elaborate on the use case. The existing core wasm + JS application decides to use a component. It imports some of the WebAssembly.lowerComponent(C, canonOpts) exports (along with the dance around lowered canonical ABI described above). This component C is not pure, it needs to access the system. It imports WASI or some other interface with a (possibly polyfilled) browser implementation. In other words, there will be some (non-lowered) JS API that provides these imports. It would be natural to reuse it in this use case. We would need to somehow bridge this component-level JS API with lowered core wasm JS API required in WebAssembly.lowerComponent(C, canonOpts) imports. If jco could help with this it would be great!

lukewagner commented 7 months ago

Ah hah, thanks for explaining. Yes, that makes sense. So I guess then, to expand the options of this hypothetical API one more time, perhaps the arguments are WebAssembly.lowerComponent(C, which, canonOpts), where the which parameter would let you configure what was lowered and what was kept component-level. So you could imagine possible arguments including "all", "exports", "imports", or { imports: [ "a", "b" ], exports: [ "c", "d" ] }, where the last one lets you hand-pick exactly what you wanted to lower. For any import or export not selected by which, the import/export would retain its component-level type and thus the same automatic high-level bindings.
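
One possible reading of the which parameter, sketched in TypeScript. Since the API is hypothetical, the `Which` type, the `Selection` result, and the `selectLowered` helper are all assumptions; the sketch only shows how the four argument forms could map onto "lowered" versus "kept component-level".

```typescript
// Hypothetical type for the `which` argument sketched above.
type Which =
  | "all"
  | "exports"
  | "imports"
  | { imports?: string[]; exports?: string[] };

interface Selection {
  loweredImports: string[];
  loweredExports: string[];
}

// Decide which of a component's imports/exports get lowered to core wasm.
// Anything not selected would keep its component-level type.
function selectLowered(which: Which, importNames: string[], exportNames: string[]): Selection {
  if (which === "all") return { loweredImports: importNames, loweredExports: exportNames };
  if (which === "imports") return { loweredImports: importNames, loweredExports: [] };
  if (which === "exports") return { loweredImports: [], loweredExports: exportNames };
  // Object form: keep only the names that were hand-picked.
  const pick = (names: string[], wanted: string[] = []) =>
    names.filter((n) => wanted.indexOf(n) !== -1);
  return {
    loweredImports: pick(importNames, which.imports),
    loweredExports: pick(exportNames, which.exports),
  };
}

const sel = selectLowered({ imports: ["a"], exports: ["c", "d"] }, ["a", "b"], ["c", "d"]);
// sel.loweredImports: ["a"]        ("b" stays component-level)
// sel.loweredExports: ["c", "d"]
```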

Thinking more about how polyfilling would work, though, I realized that these extra options would make it harder to polyfill at build-time unless you know a priori what options will be passed at runtime to WebAssembly.lowerComponent(). Instead, it seems like what you'd want is to perform all the lowering AOT by passing all the lowerComponent() options to jco transpile, which could then AOT generate an all-in-one core .wasm along with a .js file containing glue code for adapting the un-lowered imports and exports. These artifacts could then be used by the existing core wasm + JS application in place of the native WebAssembly.lowerComponent() sketched above. With some effort, you could probably even start building this functionality today!

And to reiterate, what's cool about this whole approach (as opposed to the more-general "adapter modules" idea), is that it doesn't change the meaning of "component"; these components being "lowered" are still shared-nothing, just transformed into a more-digestible-from-today's-core-wasm format.

skuzmich commented 7 months ago

I see how this would cover the use case I'm describing. AOT solution with jco looks nice!

I initially thought about a more general non-standard pure AOT tooling convention: An adapter-module that can be transpiled to:

This also wouldn't change the meaning of standard components, but would be a lot more composable, and, potentially, have a path to a standard native implementation. It is hard for me to tell if this would involve a lot more work compared to a more-limited approach that also introduces different new concepts, like exposing lowered ABI, but "backwards".

lukewagner commented 7 months ago

(Hello in the New Year!) Yes, that makes sense, I could imagine a variety of toolchain conventions and output forms like that, depending on the goals of where you want to run and what you want to link to. As long as we can keep the compilation target for browser-agnostic code being a shared-nothing component (so that we can reuse all the browser-agnostic producer toolchains), I think we have a bunch of options for what to transform that component into (all of it conceptually being just a "host implementation detail" from the POV of the component producer).