Open pchickey opened 1 year ago
Just wanted to chime in and say this all sounds good to me. Thanks for writing it up!
I'm happy to implement the short-circuit workaround for existing modules if nobody else has started yet.
I want to add that invoking C++ constructors through a new exported function would be a problem for .NET CoreCLR, as any exported function would go through its reverse P/Invoke mechanism, which expects the runtime to already be initialised. If the reactor's _initialize could be called, then that would be enough for this use case. Thanks
Components and Constructors
Pat Hickey, 28 Feb 2023
Presently, there are some major problems using C/C++ constructors (ctors) with the component model.
- The component model's canonical ABI relies on the module exporting a cabi_realloc function. cabi_realloc is defined with the same syntax as every other user-defined export function, but it has a special restriction under the component model: no component import functions may be called from inside it.
- wasm-ld currently takes each user-defined export function in the module and synthesizes an export function which runs the ctors, then calls the user-defined function.
- wasi-libc currently uses ctors to eagerly initialize the environment, preopens, and some bookkeeping for monotonic clocks. These ctors in turn call various WASI import functions.
Together, this state of affairs is incompatible with the component model, because cabi_realloc will call component import functions. See preview2-prototyping #97.
Shortest-term fix: preview 1 adapter short-circuits import function calls in ctors
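As a rough model of the short-circuit logic this section describes, the sketch below uses a thread-local flag in place of the adapter's global. All names here (IN_ADAPTEE_REALLOC, preview2_get_environment, environ_get, adapter_cabi_realloc) are hypothetical stand-ins for illustration, not the adapter's actual code, and the environment contents are made up.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the adapter's global: true only while the
    // adapter's cabi_realloc is calling into the adaptee's cabi_realloc.
    static IN_ADAPTEE_REALLOC: Cell<bool> = Cell::new(false);
}

/// Hypothetical stand-in for a preview 2 component import function.
fn preview2_get_environment() -> Vec<(String, String)> {
    vec![("HOME".to_string(), "/".to_string())]
}

/// Simplified model of a preview 1 import function provided by the adapter.
pub fn environ_get() -> Vec<(String, String)> {
    if IN_ADAPTEE_REALLOC.with(|flag| flag.get()) {
        // Short-circuit: return a trivial value (the empty environment)
        // instead of calling a component import function.
        return Vec::new();
    }
    preview2_get_environment()
}

/// Simplified model of the adapter's cabi_realloc: it sets the flag around
/// the call into the adaptee, whose synthesized wrapper may run ctors.
pub fn adapter_cabi_realloc<R>(adaptee_cabi_realloc: impl FnOnce() -> R) -> R {
    IN_ADAPTEE_REALLOC.with(|flag| flag.set(true));
    let result = adaptee_cabi_realloc();
    IN_ADAPTEE_REALLOC.with(|flag| flag.set(false));
    result
}
```

In this model, a ctor that calls environ_get from inside adapter_cabi_realloc sees an empty environment, while the same call made from any other export reaches the (stand-in) preview 2 import.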
The wasm-ld-synthesized export functions always call ctors, as part of every export function call in the instance. This means ctors get run many times over the lifetime of an instance, which is not desirable behavior in general.
Joel recognized that this behavior does mean we can get away with implementing "short-circuit" logic that sets a global when the adapter's cabi_realloc has called into the adaptee's cabi_realloc. Then, in any preview 1 import functions that may be called from the adaptee's wasi-libc ctors, a trivial value is returned (e.g. the empty environment, the empty set of preopens) rather than calling out to a preview 2 component import function.
The adapter is built for the wasm32-unknown-unknown target, so it does not include wasi-libc, and therefore the adapter's exports do not call any ctors of their own.
We get away with giving wasi-libc an incorrect value in the ctor because the wasi-libc ctors will be run again, correctly, at the start of every other export function besides cabi_realloc.
Short-term fix: guest bindgen behavior eliminates ctor calls in cabi_realloc
wasm-ld has logic to not synthesize ctor-calling wrapper functions iff the user's program makes calls to __wasm_call_ctors.
The Rust guest bindgen macro generates the definitions for all export functions, and the Rust guest crate provides the definition of cabi_realloc. Since both are under the bindgen's control, the generated export functions can call __wasm_call_ctors themselves, so wasm-ld leaves cabi_realloc without a ctor-calling wrapper.
This does not work for the command case, because the _start export function is defined by wasi-libc, and needs to run ctors before calling the user's fn main(). This case will depend on the short-circuiting behavior in the adapter described above.
Medium-term fix: wasi-libc, when used by std, no longer uses ctors
There is only one case where wasi-libc really needs to use ctors: implementing the extern char **environ symbol. This archaic feature of libc assumes that the program's memory has the environment written to it before execution begins (e.g. by execve(2)). Since wasm does not have this facility, wasi-libc uses ctors to call import functions and initialize environ before any other code is executed. It does, however, contain logic to fetch the environment lazily if environ is not present in the linked executable.
Rust's std used environ until recently, when Dan switched it to use a wasi-libc-specific facility that allows iterating over the entire environment while still allowing it to be lazily initialized. So, as long as Rust guests are compiled with rustc 1.69 or later (nightly as of this writing, and should be stable on April 20), they won't call any environment-related import functions in ctors.
wasi-libc uses ctors in two more places: detecting preopens, and initializing a start time for the monotonic clock. The monotonic clock ctor should be trivial to remove: absolute values of the monotonic clock are undefined, so there is no need for the logic to count up from ctor-time.
Preopen ctors are trickier to remove, but we believe that we should be able to lazily initialize the preopen state at the beginning of calls to open(2) and close(2).
Once those two changes land in wasi-libc and a release is cut that can be upstreamed into std, it should be possible to compile Rust programs which have no dependency on ctors at all, as long as they don't contain their own references to environ, or link with C/C++ which uses ctors or environ.
Long-term fix: C/C++ programs depending on ctors are supported by tool-conventions and component model canonical ABI support
Having worked around the unpleasantness of ctors as much as possible, we need some long-term solution to give C/C++ programs using ctors a way to run in the component model.
Even outside of the component model, the behavior of ctors in all Wasm targets is totally undefined: they behave however LLVM happens to have implemented them at the moment. This situation isn't ideal for any Wasm user.
Dan and Luke have a rough plan to come up with a Wasm tool-conventions spec, which will describe a convention for constructors across all Wasm targets. The convention will, roughly, define:
- a command as a module having an export named _start. In commands, ctors should be run only at the beginning of _start. Instances should expect to only have _start executed once, and trap on any subsequent invocations.
- a reactor as a module having an export named _init. In reactors, ctors should be run only in _init. Instances should expect to have _init executed exactly once, before any other export functions are invoked.
Users have a reasonable expectation to be able to generate components from both wasm32-wasi and wasm32-unknown-unknown targets, since WASI isn't required to use the component model, so using a convention across all Wasm targets takes care of those users, rather than tailoring a solution to just WASI users.
Dan is going to take this proposal to the Wasm LLVM team (sbc100 et al) and hash out all of the details with them, making whatever modifications to this design are needed based on their feedback.
Once the tool-convention spec is accepted by the LLVM team and the Wasm CG, it can then be relied upon by the component model spec, and the component model's canonical ABI can guarantee that reactors get _init executed following instantiation. The usual design problems around instantiation/initialization order of components which import each other apply here, and import functions may or may not end up being legal to call during _init.
This is a very long-running spec and implementation period, but with the fixes above, Bytecode Alliance stakeholders have managed to put a spec solution to ctors off the critical path, so it will be OK if this takes all year to get done.
Fixes for compatibility with existing Preview 1 modules
Either of the following solutions will keep binary compatibility working with existing Preview 1 modules:
- The cabi_realloc export function has been synthesized by wasm-ld to call the ctors, then call the user-defined (internal) cabi_realloc. Replace the body of this function with a call to the internal cabi_realloc.
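To make the effect of that rewrite concrete, here is a toy model of a module before and after patching. Everything in it is an assumption for illustration: Module, its fields, and the method names are hypothetical, the allocator is a trivial bump allocator, and the cabi_realloc signature is reduced to (align, new_size).

```rust
/// Toy model of an existing Preview 1 module instance (hypothetical).
#[derive(Default)]
pub struct Module {
    /// Counts calls to WASI import functions made by ctors.
    pub import_calls: u32,
    /// Next free address of a trivial bump allocator.
    next_free: usize,
}

impl Module {
    /// Models the wasi-libc ctors, which eagerly call import functions.
    fn call_ctors(&mut self) {
        self.import_calls += 1;
    }

    /// Models the user-defined (internal) cabi_realloc.
    /// Assumes `align` is non-zero; never frees or reuses memory.
    fn internal_cabi_realloc(&mut self, align: usize, new_size: usize) -> usize {
        let ptr = (self.next_free + align - 1) / align * align;
        self.next_free = ptr + new_size;
        ptr
    }

    /// The export as synthesized by wasm-ld: run ctors, then call the
    /// internal function. The ctors call imports, which is not allowed
    /// inside cabi_realloc under the component model.
    pub fn synthesized_cabi_realloc(&mut self, align: usize, new_size: usize) -> usize {
        self.call_ctors();
        self.internal_cabi_realloc(align, new_size)
    }

    /// The export after the rewrite: only the call to the internal function.
    pub fn patched_cabi_realloc(&mut self, align: usize, new_size: usize) -> usize {
        self.internal_cabi_realloc(align, new_size)
    }
}
```

In this model the patched export allocates exactly as before but never reaches an import function, which is the property the component model requires of cabi_realloc.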