WebAssembly / component-model

Repository for design and specification of the Component Model

[Documentation] static core linking #239

Closed penzn closed 3 months ago

penzn commented 1 year ago

@pchickey @lukewagner, we had a discussion of static linking in the context of WebAssembly/WASI#549, and I just wanted to make sure I understood it correctly:

Currently static core linking works by shipping static libs or wasm 'object files' in the component binary and then running wasm-ld in the runtime during component instantiation.

Is this an accurate description? Can you please point me to where this is currently documented?

pchickey commented 1 year ago

wasm-ld is able to statically link wasm object files and archive files (static libs) into a core wasm module.

You can then transform a core wasm module into a component using a tool such as wit-component.

All of these things happen ahead of time, before the component is loaded into a runtime for instantiation.

penzn commented 1 year ago

So what is the relationship between wasm-ld and wit-component? I thought you mentioned that the latter uses the former or something like that. Let me know if there is already a place to read about it, so I don't bug you with questions already answered elsewhere :)

alexcrichton commented 1 year ago

I'm not aware of any specific documentation myself, but these tools are quite low-level so it's sort of similar to asking what the relationship is between clang and ld in a sense. It's probably best to understand the tools from an inputs/outputs perspective where:

The wit-component support is exposed through the wasm-tools component new subcommand (in this repo) and you can explore that with wasm-tools component new -h. For wasm-ld you can also explore with wasm-ld -h. I know binaries are available from the wasi-sdk repository, but if you already have LLD installed you may already have wasm-ld installed as well.

penzn commented 1 year ago

@pchickey made a reference to wasm-ld being used by either a wit tool or runtime, and I am trying to follow up on the mechanics of that.

I know what wasm-ld does and what wasm 'object files' really are.

pchickey commented 1 year ago

wasm-ld is not used by any of the wit tools or runtimes. On our call last night I described the process by which you can statically link programs prior to getting involved with any component runtime.

If you desire a component which contains a single statically linked core module, you should create that core module using whatever wasm-object file (meaning, in almost all cases, llvm) based toolchain, and use wasm-ld to link all of the objects in that program together (statically). None of what I described is specific to the component model at all - this is the mechanism through which the overwhelming majority of wasm modules are linked today. The llvm team documented the object file format here: https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md

You can then take that core module and transform it into a component by using wit-component to add the component binary format type information.
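As a rough illustration of the two ahead-of-time steps described above, here is a minimal Python sketch that only assembles the command lines. The file names (main.o, libfoo.a, app.core.wasm, app.component.wasm) and the helper names are hypothetical, and a real build will likely need additional flags and inputs that are not shown here.

```python
import shutil
import subprocess


def link_command(objects, core_out):
    """argv for statically linking wasm object/archive files into one core module."""
    return ["wasm-ld", *objects, "-o", core_out]


def componentize_command(core_in, component_out):
    """argv for wrapping a core module as a component (wit-component via wasm-tools)."""
    return ["wasm-tools", "component", "new", core_in, "-o", component_out]


def build(objects, core_out, component_out, run=False):
    """Return the two build steps in order; execute them only if run=True."""
    steps = [
        link_command(objects, core_out),
        componentize_command(core_out, component_out),
    ]
    if run:
        for argv in steps:
            if shutil.which(argv[0]) is None:
                raise FileNotFoundError(f"{argv[0]} not found on PATH")
            subprocess.run(argv, check=True)
    return steps


# Hypothetical inputs; nothing is executed because run defaults to False.
steps = build(["main.o", "libfoo.a"], "app.core.wasm", "app.component.wasm")
for argv in steps:
    print(" ".join(argv))
```

Both steps happen before any runtime is involved, which is the point being made: the runtime only ever sees the finished component.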

At this time there is no mechanism to take already-existing components (meaning, any set of components containing more than one core module) and change their representation into a single core module using a single linear memory. If you wanted to create such a thing, merging the core modules into a single linear memory would have to implement its own page table / virtual addressing scheme internal to wasm, which would be slower than letting a wasm runtime do that translation on your behalf to implement multiple linear memories, since the runtime can leverage actual virtual memory and an internal-to-wasm implementation cannot.
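To make the cost of an "internal-to-wasm" addressing scheme concrete, here is a toy Python model, not an existing tool. It assumes two former modules, labelled "A" and "B" for illustration, that each expect their own address space starting at 0 and are squeezed into one flat memory through a software page table.

```python
PAGE_SIZE = 65536  # core wasm page size in bytes


class FusedMemory:
    """One flat memory shared by several former modules via a software page table."""

    def __init__(self):
        self.flat = bytearray()   # the single linear memory
        self.page_table = {}      # (module, virtual page) -> physical page

    def _translate(self, module, addr):
        """Map a module-local address to an offset in the flat memory."""
        vpage, offset = divmod(addr, PAGE_SIZE)
        key = (module, vpage)
        if key not in self.page_table:
            # Allocate a fresh physical page the first time a page is touched.
            self.page_table[key] = len(self.flat) // PAGE_SIZE
            self.flat.extend(bytes(PAGE_SIZE))
        return self.page_table[key] * PAGE_SIZE + offset

    def store8(self, module, addr, value):
        self.flat[self._translate(module, addr)] = value & 0xFF

    def load8(self, module, addr):
        return self.flat[self._translate(module, addr)]


mem = FusedMemory()
mem.store8("A", 16, 1)  # both modules use the same module-local address...
mem.store8("B", 16, 2)  # ...but land in different physical pages
print(mem.load8("A", 16), mem.load8("B", 16))
```

Every load and store pays for the extra lookup in _translate, which is the work a runtime with multiple linear memories can instead hand to real virtual memory.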

penzn commented 1 year ago

OK, I guess I completely misunderstood that. So if the linking is going to be done the same way it is done today, does it interface with component model, i.e. does component model encode any of the core module composition? There is this example in mvp/Explainer.md:

(component
  (core module $A
    (func (export "one") (result i32) (i32.const 1))
  )
  (core module $B
    (func (import "a" "one") (result i32))
  )
  (core instance $a (instantiate $A))
  (core instance $b (instantiate $B (with "a" (instance $a))))
)

This would not mean that $a is linked into $b, right? Also, when we have a library that has to be statically linked for performance reasons, would that be done entirely outside of component model?

lukewagner commented 1 year ago

@penzn That example is admittedly a bit synthetic; if you were able to generate that component, you might as well have statically linked $A and $B together into a single core module (e.g., using wasm-ld), producing just:

(component
  (core module $AB
    ;; ... $A and $B linked together via wasm-ld
  )
  (core instance $ab (instantiate $AB))
)

This simple example doesn't have any imports/exports, but in a more realistic example, the core module produced by wasm-ld would contain imports for all symbols annotated with import_name or export_name, and then these imports/exports would be wired up via lower/lift into component-level imports/exports.

That being said, if you did leave $A and $B as separate core modules in the original example, that would be like load-time dynamic linking in a native setting: separate compiled objects produced by wasm-ld, linked together at runtime. As another variation, if you imported $A (instead of nesting it), then that starts to look like the shared-everything dynamic linking example, for sharing $A code with multiple components.
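The load-time linking reading of the $A/$B example can be sketched with a toy Python model in which modules stay separate and imports are resolved at instantiation. The Module class, the instantiate helper, and the one_plus_one export are all hypothetical and exist only to make the link observable; this is not how any engine is implemented.

```python
class Module:
    """A toy core module: declared imports plus a factory that builds its exports."""

    def __init__(self, imports, make_exports):
        self.imports = imports            # list of (namespace, name) pairs
        self.make_exports = make_exports  # resolved imports -> dict of exports


def instantiate(module, with_=None):
    """Resolve each import against the supplied instances, then build the exports."""
    with_ = with_ or {}
    resolved = {}
    for namespace, name in module.imports:
        resolved[(namespace, name)] = with_[namespace][name]
    return module.make_exports(resolved)


# $A exports "one"; $B imports "a" "one".
A = Module([], lambda imports: {"one": lambda: 1})
# "one_plus_one" is a hypothetical export, added so the wiring can be observed.
B = Module(
    [("a", "one")],
    lambda imports: {"one_plus_one": lambda: imports[("a", "one")]() + 1},
)

a = instantiate(A)                  # like (core instance $a (instantiate $A))
b = instantiate(B, with_={"a": a})  # like (instantiate $B (with "a" (instance $a)))
print(b["one_plus_one"]())
```

The two modules are never merged: $B's code is unchanged and only learns which "one" it calls when it is instantiated, unlike the wasm-ld case where the reference is resolved when the single module is produced.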

penzn commented 1 year ago

I see, how this integrates with components is similar to how we currently use 'external' functions, with the linking staying pretty much the same (difference is in how the interfaces are really produced). Can we add this to the docs, maybe changing this particular example? I can try to help.

lukewagner commented 1 year ago

Yes, that's a good idea; it seems like how static linking fits into the component model is a generally pretty hot topic these days :) Maybe the best way would be a new StaticLinking.md example (linked to from within the main explainer in the relevant points like the example you're looking at) so that this StaticLinking.md can better introduce the problem from scratch and explain how it all works. That'd be awesome to have your help writing it up, otherwise I'm happy to take a crack at it some time after WasmCon.

lukewagner commented 1 year ago

I started this diagram to try to capture the different phases of linking that we've been talking about and how they fit together in the overall developer workflow. I'm imagining a new examples/StagesOfLinking.md that embeds this diagram and then talks through how it works (analogous to the existing examples/SharedEverythingDynamicLinking.md). @penzn does that look/sound right to you?

penzn commented 1 year ago

@lukewagner that looks right, thank you for putting it together! I think this diagram is great, though I think there is another angle that can be useful (this might need a diagram, maybe just a note in the text), specifically how would this work from 'using a component' point of view. Say I have a WIT file, implementation of that component, and a consumer of that component - I can compile both the implementation and consumer into a bunch of .o/.a files, but would I be able to link them together, or would that be tricky due to the need to maintain memory boundaries?

lukewagner commented 1 year ago

@penzn Using core wasm multi-memory (which it looks like is in both Firefox and Chrome now), it's always possible to take an arbitrary component and "fuse" it into a single core wasm module that imports/exports the Canonical ABI of the original WIT. Is that what you're asking about, or something else?

penzn commented 1 year ago

I am just trying to understand where the boundaries are, and it looks like this picture is all that is needed to describe the main flow anyway. The multi-memory case can live in a separate picture, though I suppose it could be added to this one too. There is not a tool for this multi-memory fusing yet anyway, right?

lukewagner commented 1 year ago

Gotcha; I suppose I could add a final branch to the existing flow, showing that you could either run directly on a component-enabled runtime or fuse to run on a pure-core-wasm runtime that speaks the CABI.

There is not a tool for this multi-memory fusing yet anyway, right?

Correct. @alexcrichton should check me on this but, iirc, wasmtime's compilation of components internally fuses components into core wasm, compiling component-level concepts like lift and lower into core wasm adapter functions, but that ultimately generates multiple core modules linked together by an engine-internal generated linker script. Given a set of core modules, it shouldn't be hard to perform a further fusion to produce a single multi-memory core module, although, in general, this requires duplicating any core module that is instantiated more than once (since, without any module-linking functionality in core wasm, a given core function can only access a single memory).
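The duplication requirement can be illustrated with a small Python planning sketch. It assumes, purely for illustration, a hypothetical component that instantiates a module named Libc twice and a module named App once, with one memory per instance; no such fusing tool exists according to this thread.

```python
from collections import Counter


def fuse(instances, functions):
    """Plan a single multi-memory module.

    instances: list of (instance_name, module_name), one memory per instance.
    functions: dict mapping module_name to the function names it defines.
    Returns a list of (fused_function_name, memory_index) pairs.
    """
    fused = []
    for memory_index, (instance, module) in enumerate(instances):
        for func in functions[module]:
            # Each emitted copy is hard-wired to exactly one memory index.
            fused.append((f"{instance}.{func}", memory_index))
    return fused


def duplicated_modules(instances):
    """Modules whose code is copied because they are instantiated more than once."""
    counts = Counter(module for _, module in instances)
    return {module: n for module, n in counts.items() if n > 1}


instances = [("libc1", "Libc"), ("libc2", "Libc"), ("app", "App")]
functions = {"Libc": ["malloc", "free"], "App": ["run"]}

plan = fuse(instances, functions)
print(plan)
print(duplicated_modules(instances))
```

Because each function body names its memory statically, the two Libc instances cannot share one copy of malloc in the fused output; the plan ends up with one copy per instance.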

alexcrichton commented 1 year ago

ultimately generates multiple core modules linked together by an engine-internal generated linker script

This is correct, more-or-less. There's engine-specific stuff not represented in core wasm, such as resource tables and some other state like MAY_{ENTER,LEAVE}. This could all theoretically become core wasm, but it was easier to put in the engine for now.

but would I be able to link them together, or would that be tricky due to the need to maintain memory boundaries?

To add my own thinking to this, there's no tool that does this today and I don't think that this would be an easy lift. That being said I also don't think it would necessarily be overly difficult either. The concerns here that make this difficult are:

AFAIK no one is planning tooling like this just yet, either.

lukewagner commented 1 year ago

This intermediate format would leave out configuration of host-visible lifts/lowers. For example if you lift a core function into a component function and then export that the host still has to interpret the type of that to know how to call it.

Just to check my assumption here: if you have a fixed world in the host, then I think that implies fixed core function signatures for all imports/exports, so the only extra bits of metadata the runtime needs are the canonopts bits associated with each lift/lower in the component, which would have to be communicated alongside the core module somehow/where. But if you can do that, then can the host implement the world via plain core function imports/exports (respecting those canonopt bits as part of the dynamic implementation)?

lum1n0us commented 1 year ago

@lukewagner I have a question about the diagram. There are two blocks named "wit-component". The upper one combines two core modules and outputs one component, which looks like core module $A and core module $B from the earlier discussion. Does it mean wit-component is able to (or will be able to) "link" two (or multiple) core modules ahead of time?

alexcrichton commented 1 year ago

But if you can do that, then can the host implement the world via plain core function imports/exports (respecting those canonopt bits as part of the dynamic implementation)?

Indeed!

Does it mean wit-component is able to (or will be able to) "link" two (or multiple) core modules ahead of time?

I'll snipe this question from Luke while I'm here; wit-component does support linking multiple core modules together which are intended to be dynamically linked to one another. This produces a single component output which is akin to bundling a bunch of DLLs into one executable sort of. Outside of the dynamic linking there, however, wit-component only takes a single core module and produces an output component. The output component will still likely have multiple core modules but they're all synthesized and small except for the main one provided.

lukewagner commented 1 year ago

Thanks Alex! Just to append to that fine answer, regarding the question

Does it mean wit-component is able to (or will be able to) "link" two (or multiple) core modules ahead of time?

since there are a variety of possible interpretations of "link ... ahead of time", I'd like to expand to say that the component binary produced by wit-component will contain the inputs as separate core modules (either bundling the core module bytes inline or importing the core modules from a common registry, although this second option isn't implemented yet). The way these core modules are linked together is known statically in a way that allows converting cross-module indirect import calls into direct or even inlined calls in the generated machine code. However, because the core modules are separate, engines/runtimes are free to share the compiled representations of common core modules (either using content-hashing of the core modules or via the common registry being imported from).
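The content-hashing option mentioned above can be sketched as a compile cache keyed by a hash of the core module bytes. This is a toy Python model under stated assumptions: the CompileCache class, the placeholder "compilation", and the byte strings are hypothetical and do not reflect any particular engine's implementation.

```python
import hashlib


class CompileCache:
    """Share compiled code between components that bundle identical core modules."""

    def __init__(self):
        self.compiled = {}      # sha256 hex digest -> compiled artifact
        self.compile_count = 0  # how many real compilations happened

    def _compile(self, module_bytes):
        # Placeholder for real code generation.
        self.compile_count += 1
        return ("compiled", len(module_bytes))

    def get(self, module_bytes):
        """Return the compiled artifact, compiling only on first sight of these bytes."""
        key = hashlib.sha256(module_bytes).hexdigest()
        if key not in self.compiled:
            self.compiled[key] = self._compile(module_bytes)
        return self.compiled[key]


cache = CompileCache()
libc_bytes = b"\0asm-libc-placeholder"   # hypothetical module contents
app1_bytes = b"\0asm-app1-placeholder"

# Two components bundle the same libc bytes inline; a third module differs.
first = cache.get(libc_bytes)
second = cache.get(libc_bytes)
cache.get(app1_bytes)
print(cache.compile_count)
```

Keeping the inputs as separate core modules is what makes this sharing possible: once modules are merged into one blob, the common part no longer has a stable hash of its own.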

lum1n0us commented 1 year ago

wit-component does support linking multiple core modules together which are intended to be dynamically linked to one another

Do you mean PR #1133?

lukewagner commented 1 year ago

@lum1n0us Yes, that is a new part of the dynamic linking story: IIRC, that PR emulates dlopen()-style runtime dynamic linking (while statically bundling all possibly-dlopen()ed modules inline into the component). However, it's also possible to perform load-time dynamic linking in the style of ld.so as described in this example; I forget whether wit-component supports that yet, though.

lukewagner commented 3 months ago

Heh, it took a while, but I finally got back to this documentation request. See #367