bytecodealliance / preview2-prototyping

Polyfill adapter for preview1-using wasm modules to call preview2 functions.

Tooling issue: Components and Constructors #99

Open pchickey opened 1 year ago

pchickey commented 1 year ago

Components and Constructors

Pat Hickey, 28 Feb 2023

Presently, there are some major problems using C/C++ constructors (ctors) with the component model.

Together, this state of affairs is incompatible with the component model, because cabi_realloc will call component import functions. See preview2-prototyping #97.

Shortest-term fix: preview 1 adapter short-circuits import function calls in ctors

The export wrapper functions that wasm-ld synthesizes call ctors on every invocation of every export function in the instance. As a result, ctors run many times over the lifetime of an instance, which is not desirable behavior in general.
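The effect of these wrappers can be modeled in a few lines of Rust. This is a minimal sketch, not wasm-ld's actual output: `call_ctors`, `export_wrapper`, `ctor_runs`, and the `CTOR_RUNS` counter are hypothetical names standing in for `__wasm_call_ctors` and the synthesized wrappers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times the modeled ctors have run in this instance.
static CTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `__wasm_call_ctors`.
fn call_ctors() {
    CTOR_RUNS.fetch_add(1, Ordering::Relaxed);
}

/// Models a synthesized export wrapper: ctors run first, then the real body.
fn export_wrapper<T>(body: impl FnOnce() -> T) -> T {
    call_ctors();
    body()
}

/// Number of ctor runs observed so far.
fn ctor_runs() -> usize {
    CTOR_RUNS.load(Ordering::Relaxed)
}
```

Each call through `export_wrapper` bumps the counter once, so two export calls mean two ctor runs.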

Joel recognized that this behavior does mean we can get away with implementing "short-circuit" logic that sets a global when the adapter's cabi_realloc has called into the adaptee's cabi_realloc. Then, in any preview 1 import functions that may be called from the adaptee's wasi-libc ctors, a trivial value is returned (i.e. the empty environment, the empty set of preopens) rather than call out to a preview 2 component import function.
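The short-circuit idea can be sketched roughly as follows. This is a simplified model, not the adapter's code: `IN_ADAPTEE_REALLOC`, `preview2_get_environment`, `environ_get`, and `adapter_cabi_realloc` are hypothetical names, and the adaptee's `cabi_realloc` is represented by a closure.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set while the adapter's cabi_realloc is inside the adaptee's cabi_realloc.
static IN_ADAPTEE_REALLOC: AtomicBool = AtomicBool::new(false);

/// Hypothetical stand-in for a preview 2 component import function.
fn preview2_get_environment() -> Vec<(String, String)> {
    vec![("HOME".to_string(), "/".to_string())]
}

/// Models a preview 1 import function as seen by the adaptee.
fn environ_get() -> Vec<(String, String)> {
    if IN_ADAPTEE_REALLOC.load(Ordering::Relaxed) {
        // Reached from a ctor during cabi_realloc: return the empty
        // environment instead of calling the component import.
        return Vec::new();
    }
    preview2_get_environment()
}

/// Models the adapter's cabi_realloc delegating to the adaptee's.
fn adapter_cabi_realloc<T>(adaptee_cabi_realloc: impl FnOnce() -> T) -> T {
    IN_ADAPTEE_REALLOC.store(true, Ordering::Relaxed);
    let result = adaptee_cabi_realloc();
    IN_ADAPTEE_REALLOC.store(false, Ordering::Relaxed);
    result
}
```

Calling `environ_get` from inside the closure passed to `adapter_cabi_realloc` yields the empty environment; calling it anywhere else reaches the modeled import.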

The adapter is built for the wasm32-unknown-unknown target, so it does not include wasi-libc, and therefore the adapter's exports do not call any ctors of its own.

We get away with giving wasi-libc an incorrect value in the ctor because the wasi-libc ctors will be run again, correctly, at the start of every other export function besides cabi_realloc.

Short-term fix: guest bindgen behavior eliminates ctor calls in cabi_realloc

wasm-ld has logic to not synthesize ctor-calling wrapper functions iff the user's program makes calls to __wasm_call_ctors.

The Rust guest bindgen macro generates the definitions for all export functions, and the Rust guest crate provides the definition of cabi_realloc.

This does not work for the command case, because the _start export function is defined by wasi-libc, and needs to run ctors before calling the user's fn main(). This case will depend on the short-circuiting behavior in the adapter described above.
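One way a guest crate could take advantage of this wasm-ld behavior is to call `__wasm_call_ctors` itself, at most once, from every export except `cabi_realloc`. The sketch below models that idea under stated assumptions: `run_ctors_once`, `generated_export`, and `guest_cabi_realloc` are illustrative names, and the `call_ctors` closure stands in for the real `__wasm_call_ctors` symbol so that the example builds on any target.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// True once ctors have been run for this instance.
static CTORS_DONE: AtomicBool = AtomicBool::new(false);

/// Runs `call_ctors` the first time it is invoked; later calls are no-ops.
/// Returns true if ctors ran on this call.
fn run_ctors_once(call_ctors: impl FnOnce()) -> bool {
    if CTORS_DONE.swap(true, Ordering::Relaxed) {
        return false;
    }
    call_ctors();
    true
}

/// Models a generated export: make sure ctors have run, then do the work.
fn generated_export(call_ctors: impl FnOnce(), x: u32) -> u32 {
    run_ctors_once(call_ctors);
    x + 1
}

/// Models cabi_realloc: it allocates and never touches ctors.
fn guest_cabi_realloc(size: usize) -> Vec<u8> {
    vec![0; size]
}
```

In this model, allocation through `guest_cabi_realloc` can happen before any ctor has run, and ctors run exactly once no matter how many exports are called afterwards.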

Medium-term fix: wasi-libc, when used by std, no longer uses ctors

There is only one case where wasi-libc really needs to use ctors: implementing the extern char **environ symbol. This archaic feature of libc assumes that the program's memory has the environment written to it before execution begins (e.g. execve(2)). Since wasm does not have this facility, wasi-libc uses ctors to call import functions and initialize environ before any other code is executed. It does, however, contain logic to fetch the environment lazily if environ is not present in the linked executable.
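The lazy alternative can be illustrated with a small Rust model. This is not wasi-libc's implementation (which is C); `fetch_environment_from_host`, `environment`, `getenv`, `host_calls`, and the sample `PATH` entry are all hypothetical. The point is only that the import is called on first use rather than before any other code runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts calls to the modeled import function.
static HOST_CALLS: AtomicUsize = AtomicUsize::new(0);

/// The environment, empty until something first asks for it.
static ENVIRONMENT: OnceLock<Vec<(String, String)>> = OnceLock::new();

/// Hypothetical stand-in for the import that returns the environment.
fn fetch_environment_from_host() -> Vec<(String, String)> {
    HOST_CALLS.fetch_add(1, Ordering::Relaxed);
    vec![("PATH".to_string(), "/bin".to_string())]
}

/// Returns the environment, fetching it from the host on first use only.
fn environment() -> &'static [(String, String)] {
    ENVIRONMENT.get_or_init(fetch_environment_from_host)
}

/// Looks up one variable through the lazily initialized environment.
fn getenv(name: &str) -> Option<&'static str> {
    environment()
        .iter()
        .find(|(key, _)| key.as_str() == name)
        .map(|(_, value)| value.as_str())
}

/// Number of times the modeled import has been called.
fn host_calls() -> usize {
    HOST_CALLS.load(Ordering::Relaxed)
}
```

No import call happens until the first `getenv`, and repeated lookups reuse the cached value.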

Rust's std used environ until recently, when Dan switched it to use a wasi-libc specific facility that allows iterating over the entire environment while still allowing it to be lazily initialized. So, as long as Rust guests are compiled with rustc 1.69 or later (nightly as of this writing, and should be stable on April 20), they won't call any environment-related import functions in ctors.

wasi-libc uses ctors in two more places: detecting preopens, and initializing a start time for the monotonic clock. The monotonic clock ctor should be trivial to remove: absolute values of the monotonic clock are undefined, so there is no need for the logic to count up from ctor-time.
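A short arithmetic check shows why the start-time offset is unnecessary when only differences between readings matter. The function names and nanosecond units here are assumptions for illustration, and the sketch assumes `start <= t0 <= t1`.

```rust
/// Elapsed time computed directly from two raw monotonic readings.
fn elapsed_raw(t0: u64, t1: u64) -> u64 {
    t1 - t0
}

/// Elapsed time computed from readings rebased to a ctor-time start value.
/// The `start` term cancels: (t1 - start) - (t0 - start) == t1 - t0.
fn elapsed_rebased(start: u64, t0: u64, t1: u64) -> u64 {
    (t1 - start) - (t0 - start)
}
```

Both functions return the same duration for any valid inputs, so dropping the ctor that records `start` does not change what callers can observe from differences.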

Preopen ctors are trickier to remove, but we believe that we should be able to lazily initialize the preopen state at the beginning of calls to open(2) and close(2).
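A rough model of that approach, written in Rust for illustration only: the `Preopens` type, its methods, and the sample directories are hypothetical, and the prefix match is deliberately naive. What matters is that both entry points populate the table on first use instead of relying on a ctor.

```rust
/// Preopen state that is populated on first use rather than in a ctor.
struct Preopens {
    table: Option<Vec<String>>, // None until the first open or close
    host_calls: usize,          // how many times the host was queried
}

impl Preopens {
    fn new() -> Self {
        Preopens { table: None, host_calls: 0 }
    }

    /// Queries the (modeled) host for preopens if that has not happened yet.
    fn ensure_populated(&mut self) -> &mut Vec<String> {
        if self.table.is_none() {
            self.host_calls += 1;
            self.table = Some(vec!["/".to_string(), "/tmp".to_string()]);
        }
        self.table.as_mut().unwrap()
    }

    /// Models open(2): returns the index of the first preopen that
    /// prefixes `path`.
    fn open(&mut self, path: &str) -> Option<usize> {
        self.ensure_populated()
            .iter()
            .position(|dir| path.starts_with(dir.as_str()))
    }

    /// Models close(2) on a preopen: removes it from the table.
    fn close(&mut self, index: usize) -> bool {
        let table = self.ensure_populated();
        if index < table.len() {
            table.remove(index);
            true
        } else {
            false
        }
    }
}
```

Whichever of `open` or `close` is called first triggers the single host query; every later call sees the already populated table.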

Once those two changes land in wasi-libc and a release is cut that can be upstreamed into std, it should be possible to compile Rust programs which have no dependency on ctors at all, as long as they don't contain their own references to environ, or link with C/C++ code that uses ctors or environ.

Long-term fix: C/C++ programs depending on ctors are supported by tool-conventions and component model canonical ABI support

Having worked around the unpleasantness of ctors as much as possible, we need some long term solution to give C/C++ programs using ctors a way to run in the component model.

Even outside of the component model, the behavior of ctors in all Wasm targets is totally undefined: they behave however LLVM happens to have implemented them at the moment. This situation isn't ideal for any Wasm user.

Dan and Luke have a rough plan to come up with a Wasm tool-conventions spec, which will describe a convention for constructors across all Wasm targets. The convention will, roughly, define:

Users have a reasonable expectation to be able to generate components from both wasm32-wasi and wasm32-unknown-unknown targets, since WASI isn't required to use the component model, so using a convention across all Wasm targets takes care of those users, rather than tailoring a solution to just WASI users.

Dan is going to take this proposal to the Wasm LLVM team (sbc100 et al) and hash out all of the details with them, making whatever modifications to this design their feedback calls for.

Once the tool-convention spec is accepted by the LLVM team and the Wasm CG, it can then be relied upon by the component model spec, and the component model's canonical ABI can guarantee that reactors get _init executed following instantiation. The usual design problems around instantiation/initialization order of components which import each other apply here, and import functions may or may not end up being legal to call during _init.

This is a very long-running spec and implementation period, but with the fixes above, Bytecode Alliance stakeholders have managed to put a spec solution to ctors off the critical path, so it will be OK if this takes all year to get done.

Fixes for compatibility with existing Preview 1 modules

Either of the following solutions will keep binary compatibility working with existing Preview 1 modules:

dicej commented 1 year ago

Just wanted to chime in and say this all sounds good to me. Thanks for writing it up!

I'm happy to implement the short-circuit workaround for existing modules if nobody else has started yet.

yowl commented 1 year ago

I want to add that invoking C++ constructors through a new exported function would be a problem for .NET CoreCLR, as any exported function would go through its reverse P/Invoke mechanism, which expects the runtime to already be initialised. If the reactor's _initialize could be called, then that would be enough for this use case. Thanks