WebAssembly / tool-conventions

Conventions supporting interoperability between tools working with WebAssembly.
Artistic License 2.0

Add a toolchain-independent ABI document, and propose `_initialize` #203

Open sunfishcode opened 1 year ago

sunfishcode commented 1 year ago

The Wasm ecosystem is currently not consistent in how "constructors" such as C++ static initializers and similar features in other languages are implemented. As a result, some users report constructors running multiple times, and others report constructors not being run when they should be.

WASI has defined a convention using an exported function named _initialize, however not all users are using WASI conventions. In particular, users of what is sometimes called "wasm32-unknown-unknown" are not expecting to follow WASI conventions. However, they still have a need for constructors working in a reliable way.

To address this, I propose moving this out of WASI and defining this as a toolchain-independent ABI, here in tool-conventions. This would recognize the _initialize function as the toolchain-independent way to ensure that constructors are properly called before other exports are accessed.
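As a rough illustration of what this asks of an embedder (a sketch, not part of the proposal text): the helper below calls `_initialize` at most once per exports object, and only if the module exports it. The helper name `ensureInitialized` and the `ModuleExports` type are hypothetical; only the `_initialize` export name comes from the convention.

```typescript
// Hypothetical embedder-side helper. Only the `_initialize` export name
// comes from the convention; every other name here is illustrative.
type ModuleExports = Record<string, unknown>;

// Tracks exports objects whose initialization has already been handled.
const initialized = new WeakSet<object>();

// Calls `_initialize` once, if the module exports it.
// Returns true only when `_initialize` was actually invoked by this call.
function ensureInitialized(moduleExports: ModuleExports): boolean {
  if (initialized.has(moduleExports)) {
    return false; // already handled; never run constructors twice
  }
  initialized.add(moduleExports);
  const init = moduleExports["_initialize"];
  if (typeof init === "function") {
    (init as () => void)();
    return true;
  }
  return false; // module does not follow the convention; nothing to do
}
```

On the Web, one plausible use is to pass `instance.exports` from the result of `WebAssembly.instantiate(bytes, imports)` to this helper before touching any other export.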

Related activities

In the component model, there is a proposal to add a second initialization phase. If that's done, then component-model toolchains could arrange for this _initialize function to be called automatically by this second initialization mechanism.

Considered alternatives

It is tempting to use the Wasm start function for C++ constructors. This has been extensively discussed, and the short answer is that the Wasm start function is often called at a time when the outside environment can't yet access the module's exports, while C++ constructors can run arbitrary user code, which may make calls that need access to those exports.

It's also tempting to propose defining a second initialization phase in core Wasm. I'm not opposed to this, but it is more complex at the core Wasm level than at the component-model level. For example, in Emscripten, Wasm modules depend on JS code being able to run after the exports are available but before the initialization function is called, which wouldn't be possible if we simply called the initialization function as part of the linking step.

Wasm-ld has a __wasm_call_ctors function, and in theory we could use that name instead of _initialize, but wasm-ld already does insert some initialization in addition to just constructors, so I think it makes sense to use _initialize as the exported function, which may call __wasm_call_ctors in its body.

Process

We don't have a formal process defined for tool-convention proposals, but because this proposal has potentially wide-ranging impacts, I propose the following process:

ricochet commented 1 year ago

I am in favor of this proposal. As a separate process question, is there a list of producers that need to be updated? It may help to communicate early with those toolchains. It might also help if this proposal encourages toolchains to link back to this PR in their issue/PR to make it easier to determine when support is added.

abrown commented 1 year ago

Great issue description! That background content is almost worth more than the actual document.

sunfishcode commented 1 year ago

@ricochet I think many languages have a potential use for this. But in terms of things that need to be updated, I think the bigger coordination concern here is with embedders. In some cases we'll need engines to call _initialize automatically, and in other cases, we'll need users to call _initialize themselves.

@sbc100 BasicModuleABI works for me. Renamed.

sunfishcode commented 1 year ago

I've incorporated various bits of feedback, and the discussion here now seems to have settled. Following the process I described above, I've now filed an agenda item to discuss this with the CG: https://github.com/WebAssembly/meetings/pull/1253

titzer commented 1 year ago

What if the toolchain injected a check for initialization status into exports? AFAICT that would make it impossible to misuse a module by not calling its initialization, as that would be done lazily. The code would be roughly:

    (if (global.get $uninitialized) (then call $_initialize))   

I count about 6-7 bytes to wrap each exported function. The dynamic cost would be 3 instructions: load, compare, branch.
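For comparison, the same lazy check can be modeled on the host side rather than injected into the module. This is a different technique from the in-module guard above, shown only to illustrate the control flow: every function export is wrapped so that the first call to any of them runs `_initialize` first. The function name `guardExports` is hypothetical.

```typescript
// Host-side analogue of the lazy-initialization guard (a sketch, not the
// toolchain-injected check). `guardExports` is a hypothetical name.
function guardExports(moduleExports: Record<string, unknown>): Record<string, unknown> {
  const init = moduleExports["_initialize"];
  // Plays the role of the `$uninitialized` global in the snippet above.
  let uninitialized = typeof init === "function";

  const runInitOnce = (): void => {
    if (uninitialized) {
      // Clear the flag first so that re-entrant calls made by
      // constructors do not trigger initialization again.
      uninitialized = false;
      (init as () => void)();
    }
  };

  const guarded: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(moduleExports)) {
    if (name === "_initialize") {
      guarded[name] = runInitOnce; // explicit calls stay one-shot as well
    } else if (typeof value === "function") {
      const fn = value as (...args: unknown[]) => unknown;
      guarded[name] = (...args: unknown[]) => {
        runInitOnce();
        return fn(...args);
      };
    } else {
      guarded[name] = value; // memories, tables, globals pass through
    }
  }
  return guarded;
}
```

Unlike the in-module check, this wrapper only protects callers that go through the returned object; code holding the raw exports can still skip initialization.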

If we wanted to reduce either of those costs, another option would be to add an explicit dependency to the export mechanism. Roughly: to call this export, this other one-shot function must have been called first. The engine could do that lazily, or it could trap if that function hasn't been called, etc.

kripken commented 1 year ago

I think this generally makes sense, and I don't have any better ideas, but I want to stress what I see as the downside here: If this is adopted by some VMs and not others, then wasm files will become less portable. Specifically, someone might notice that the same wasm file runs in one way in, say, wasmtime and wasm3 (which call _initialize automatically) but not on the Web when using new WebAssembly.Module() (as browsers do not call it automatically). And this can be surprising because the wasm itself does not indicate in any way "I can only be run in certain VMs" - all VMs can try to run it, with different results.

Now, maybe that's fine and expected. But I wonder if we can do a little better than leave it as a surprising thing for people to run into. Some vague thoughts:

  1. We could define layered concepts that make this feel natural. I mean, there could be a "vanilla wasm" that does not call _initialize, and a "fancy wasm" (all of this with better names :smile: ) that does. Then the explanation for why the wasm files run differently would be "Web VMs implement vanilla wasm, while server VMs also implement fancy wasm; these are different things, so you need different wasm files." And also we could say "to run a fancy wasm on the Web, you need custom JS, that is, fancy wasm + custom JS can run on a VM that supports JS + vanilla wasm."
  2. We could go further and actually make the wasm portable by making it possible to run it on the Web, that is, have a new API aside from new WebAssembly.Module that does call _initialize (and does not let exports/imports be entangled in the way that is the problem that causes the need for all of this).

I'm not sure either of those is a good idea! Just some thoughts.

Put another way, I think it would be good if we had a clear answer for someone that builds a new wasm VM tomorrow and asks themselves, "should I call _initialize or not?" If we don't have a rule for that then it might end up with "well, it seems like most of the wasm files people will run in my VM expect it / don't expect it, so I will / won't."

tlively commented 1 year ago

Maybe we could add a non-normative note to the Core and/or JS specs mentioning this toolchain convention and pointing out that just core spec initialization may not be enough to make the module usable as intended.