alexcrichton opened 1 month ago
Thank you for this writeup! ❤️
This all looks great and makes sense to me, with one exception: I think it might make sense to start with something a bit more conservative instead of the `compile` functionality. I was thinking about something that's, if anything, actually closer to `dlopen`: the ability to say "give me a module based on this (to me) opaque identifier."
That way, hosts that don't want to expose compilation abilities can do so, and we could additionally provide a separate interface for doing actual compilation for environments where that makes sense.
That is of course very similar to your `preopens` interface, but for now I think it's the better primitive to provide.
I'm thinking about something along these lines:
```wit
package wasi:module-loader;

interface loader {
    enum error { /* ... */ }

    load: func(id: string) -> result<module, error>;

    // Optionally, we could add a way to get a list of known modules:
    available-modules: func() -> list<string>;
}
```
> the ability to say "give me a module based on this (to me) opaque identifier."
That's what the preopen API that Alex sketched is for.
FWIW I would bikeshed the name and suggest "precompiles" or something along those lines.
> hosts that don't want to expose compilation abilities can do so
They can return an error instead of compiling anything, but we could always layer the `compile` interface on top to extend this into a new world too.
Also, listing all pre-compiled modules might not be something we want to expose, since the pre-compiled modules could come from the network in a FaaS platform, and then we would have TOCTOU bugs. Better to just try and get it if we have it, otherwise fall back to the backup plan (either find the module on disk and compile it, or propagate an error).
> They can return an error instead of compiling anything, but we could always layer the `compile` interface on top to extend this into a new world too.
The key thing is that JIT-compilation is a different, much more powerful capability, which I think should be explicitly targeted via its own interface that can be statically checked for, instead of dynamically returning an error. Content that can't be run in an environment that doesn't support this should be rejected pre-deployment, ideally.
> Also, listing all pre-compiled modules might not be something we want to expose, since the pre-compiled modules could come from the network in a FaaS platform, and then we would have TOCTOU bugs. Better to just try and get it if we have it, otherwise fall back to the backup plan (either find the module on disk and compile it, or propagate an error).
That's a very good point, yes.
I'd be happy with just having the "have id, want module" interface and nothing else :)
Ideally wasi-libc would have an optional import on "compile these bytes", but in lieu of that my thinking is that hosts would, by default, deny "compile these bytes" and you'd be able to opt in on some hosts (e.g. the `wasmtime` CLI). It's possible to make it so wasi-libc doesn't, by default, statically pull in the "compile these bytes" function, but that would mean there would have to be a wasi-libc-specific API for "enable that", which would be another portability hazard.
What I'm trying to convey is that I think these should be different interfaces, so they can be included in different worlds. Worlds that don't include the "compile some bytes for me" interface would be applicable much more broadly. So far, at least, we've treated packages, and certainly interfaces, as all-or-nothing instead of saying that it's okay to implement only parts of an interface and omit certain functions.
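To make the static-checking argument concrete, here is a minimal Python sketch that models worlds as sets of interface names and rejects a component before deployment if it imports an interface the target world doesn't provide. The interface names `wasi:module-loader/loader` and `wasi:module-loader/compile` are hypothetical placeholders, not standardized WASI interfaces, and real world/component checking is done by component tooling, not code like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical interface names, for illustration only (not standardized WASI).
LOADER = "wasi:module-loader/loader"
COMPILE = "wasi:module-loader/compile"


@dataclass(frozen=True)
class World:
    name: str
    provides: FrozenSet[str]


def missing_imports(component_imports: Iterable[str], world: World) -> List[str]:
    """Interfaces the component imports but the world does not provide."""
    return sorted(set(component_imports) - world.provides)


def check_deployable(component_imports: Iterable[str], world: World) -> None:
    """Reject a component before deployment rather than failing at runtime."""
    missing = missing_imports(component_imports, world)
    if missing:
        raise ValueError(f"cannot deploy to {world.name}: missing {', '.join(missing)}")


restricted = World("restricted-host", frozenset({LOADER}))
jit_capable = World("jit-host", frozenset({LOADER, COMPILE}))

dlopen_only_app = [LOADER]
jit_app = [LOADER, COMPILE]
```

Here `jit_app` deploys to `jit-host` but `check_deployable(jit_app, restricted)` raises, while `dlopen_only_app` is accepted by both worlds, which is the broader applicability argued for above.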
Regarding wasi-libc integration: would we even have the "compile some bytes for me" interface integrated into libc? What would that look like? It seems like loading a library based on an ID would map much more readily to dlopen?
That makes sense yeah, and I tried to sketch above separate interfaces as well. My point is that wasi-libc would want the "compile the bytes" interface by default because that's what native platforms expect (e.g. Python). That interface would be allowed to fail, though, and until we have optional imports I think that's the best we can do for wasi-libc.
For wasi-libc specifically, a native-like experience would be a `dlopen` function that interprets the input string as a file path. It would probe for the file, read the contents, and then pass the result to a "compile the bytes" function from the host. If that failed then the `dlopen` call would fail, but that's how I'm imagining it'd be integrated into wasi-libc.
> For wasi-libc specifically a native-like experience would be a `dlopen` function that interprets the input string as a file path. It would probe for the file, read the contents, and then pass the result to a "compile the bytes" function from the host. If that failed then the `dlopen` call would fail, but that's how I'm imagining it'd be integrated into wasi-libc.
To add on to Alex's response here: `wasi-libc` can't probe for (for example) Wasmtime's `.cwasm` files (which contain the native code for a compiled `.wasm`, for those that aren't familiar) because those are a Wasmtime-internal detail (and the `precompiles`/`preopens` interface is intended to satisfy that no-compilation use case, which could use things like `.cwasm`s under the hood). At the portable Wasm/WASI/CM standards level, all we can work with for this fully dynamic case are Wasm modules.
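Since the portable path can only exchange standard Wasm binaries, a loader that accepts bytes can at least cheaply reject anything that isn't a core module before handing it to the host. A minimal sketch: it checks the 8-byte preamble from the core WebAssembly binary format (the magic `\0asm` followed by version 1). Engine-specific artifacts such as Wasmtime's `.cwasm` files are native objects rather than Wasm binaries and won't carry this preamble. This is only a sniff test, not validation.

```python
WASM_MAGIC = b"\x00asm"
CORE_VERSION_1 = b"\x01\x00\x00\x00"


def looks_like_core_module(data: bytes) -> bool:
    """Cheap preamble check; does not validate anything after the first 8 bytes."""
    return data[:4] == WASM_MAGIC and data[4:8] == CORE_VERSION_1
```

The preamble on its own (`b"\x00asm\x01\x00\x00\x00"`) is in fact a valid, empty core module.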
> For wasi-libc specifically a native-like experience would be a `dlopen` function that interprets the input string as a file path. It would probe for the file, read the contents, and then pass the result to a "compile the bytes" function from the host. If that failed then the `dlopen` call would fail, but that's how I'm imagining it'd be integrated into wasi-libc.
I feel like I'm missing something, because this still doesn't make sense to me. Isn't the much more equivalent-to-native thing a `dlopen` that, as you say, interprets the input string as a file path, but then takes that to be the thing to load directly, instead of the source of something to load?
I.e., wouldn't we want to leave it up to the host to decide how to go from the file path (aka opaque ID) to a loadable module? In the Wasmtime case, we'd look for a `.cwasm` file, whereas in JCO, for example, we'd look for a `.wasm` file but then compile it behind the scenes using `WebAssembly.compileStreaming()`. And in any case, all libc would ever see is a `module` reference to instantiate.
All of this is effectively the preopens thing, and I guess all I'm saying is we should start out having only that, but not call it preopens or anything :)
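A sketch of that idea in Python: the guest-facing `load(id)` is identical for every host, and each host picks its own resolution strategy. The two host classes loosely mirror the Wasmtime (`.cwasm`) and JCO (`.wasm` plus compile) examples above; the classes, the dict standing in for a filesystem, and `ModuleRef` are hypothetical stand-ins, not real Wasmtime or JCO APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ModuleRef:
    """Opaque handle handed to the guest; a stand-in for the proposed `module` type."""
    origin: str


# Mirrors `result<module, error>`: exactly one element of the pair is None.
LoadResult = Tuple[Optional[ModuleRef], Optional[str]]


class PrecompiledHost:
    """Serves only ahead-of-time artifacts, Wasmtime `.cwasm`-style."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files  # path -> bytes; stands in for the host filesystem

    def load(self, module_id: str) -> LoadResult:
        path = module_id + ".cwasm"
        if path not in self.files:
            return None, f"no precompiled module for {module_id!r}"
        return ModuleRef(origin=path), None


class CompilingHost:
    """Finds a `.wasm` file and compiles it behind the scenes, JCO-style."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files

    def load(self, module_id: str) -> LoadResult:
        path = module_id + ".wasm"
        if path not in self.files:
            return None, f"no module for {module_id!r}"
        # A JS host would compile here, e.g. with WebAssembly.compileStreaming().
        return ModuleRef(origin=f"compiled:{path}"), None


def guest_dlopen(host, module_id: str) -> ModuleRef:
    """Guest code is the same for both hosts: it only ever sees a ModuleRef."""
    module, err = host.load(module_id)
    if err is not None:
        raise OSError(err)
    return module
```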
Till and I talked a bit more about this over video and the general conclusions we reached were:

- `dlopen` in wasi-libc would pass the string through to the "get the preexisting module" interface first.
- The compile function would be `compile: func(borrow<descriptor>) -> result<module>`, which is to say it takes a reference to an open file (or something like that) rather than the bytes themselves.

Personally I think that'd be reasonable since there's not a huge use case right now for "generate wasm in content and then compile it", and that can always be satisfied with a filesystem too.
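Putting those conclusions together, a `dlopen` in wasi-libc might resolve a name roughly as sketched below. This is a Python model rather than C, and `host_get_preexisting`, `open_file`, and `host_compile` are hypothetical stand-ins for the host interfaces under discussion, not existing wasi-libc or WASI functions. Passing `None` for `host_compile` models a host that doesn't offer compilation at all.

```python
from typing import Callable, Optional


class DlError(Exception):
    """Models dlopen returning NULL with a dlerror() message."""


def dlopen_resolve(
    name: str,
    host_get_preexisting: Callable[[str], Optional[str]],
    open_file: Callable[[str], Optional[object]],
    host_compile: Optional[Callable[[object], Optional[str]]],
) -> str:
    # 1. Pass the string through untouched: the host may already have a
    #    module (e.g. compiled ahead of time) registered under this name.
    module = host_get_preexisting(name)
    if module is not None:
        return module
    # 2. Otherwise treat the name as a path and hand an open descriptor
    #    (not the raw bytes) to the host's compile function, if any.
    fd = open_file(name)
    if fd is None:
        raise DlError(f"{name}: not found")
    if host_compile is None:
        raise DlError(f"{name}: compilation not supported on this host")
    module = host_compile(fd)
    if module is None:
        raise DlError(f"{name}: compilation failed")
    return module
```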
Perhaps it could be `compile: func(input-stream) -> result<module>`, as you can get an `input-stream` from a `descriptor` using `read-via-stream`, and that would free it from being tied to wasi-filesystem.
The thing that seems very important to me is that the primitive used by libc should not require being able to acquire the wasm bytes to get a module ref. `compile: func(borrow<descriptor>) -> result<module>` seems borderline, but still just okay to me in that regard, because we can validly make read operations on the `descriptor` fail. `compile: func(input-stream) -> result<module>` seems like a step too far, OTOH: once you have an `input-stream`, you really should be able to actually read from it.
To expand on my reasoning here: for environments that want to (or can only) handle precompiled binaries, we really don't want to require the `.wasm` file to be available, and we certainly don't want them to need to do something like "detect if this byte stream came from a `.wasm` file and then swap in a precompiled version of that `.wasm` file instead." If we operate at the `descriptor` level and disallow reading from the `descriptor`, then it seems reasonable to me to hand out the descriptor, but attach internal state indicating that it really is a façade that can only be used as input for a module loading operation and nothing else.
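A minimal sketch of that "façade descriptor", assuming a host-side flag marks a descriptor as usable only for module loading: reads on it fail, yet `load_module` still accepts it and can substitute a precompiled artifact without ever exposing `.wasm` bytes. `Descriptor`, `load_only`, and `load_module` are hypothetical names for illustration, not wasi-filesystem APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class AccessDenied(Exception):
    pass


@dataclass
class Descriptor:
    path: str
    contents: Optional[bytes] = None  # None: no .wasm bytes exist on this host
    load_only: bool = False           # façade usable only for module loading

    def read(self) -> bytes:
        if self.load_only or self.contents is None:
            raise AccessDenied(f"{self.path}: reads are not permitted")
        return self.contents


def load_module(fd: Descriptor, precompiled: Dict[str, str]) -> str:
    """Host side: prefer a precompiled artifact; only read fd as a fallback."""
    if fd.path in precompiled:
        return precompiled[fd.path]
    return f"compiled:{fd.path}:{len(fd.read())} bytes"
```

A load-only descriptor with no precompiled match makes `load_module` raise `AccessDenied`, so a precompiled-only host never has to fall back to compiling.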
I guess I would still somewhat prefer not to call this `compile`, and instead something like `load-module`, but that seems less important to me.
> Personally I think that'd be reasonable since there's not a huge use case right now for "generate wasm in content and then compile it", and that can always be satisfied with a filesystem too.
I liked the ability to JIT dotnet IL to a wasm stream or bytes. Are you saying that I need to store those bytes to the FS first?
I'm not 100% sure, but I think Chrome already stores precompiled wasm. Maybe they calculate a hash?
I agree that JIT compilation is important, but it won't be supported in all environments, so I think it should not be part of the default way to support `dlopen`.
What I'm imagining is that we'd have a separate WASI interface, potentially in a separate package, that'd allow you to get a `descriptor` from a list of bytes (and/or a stream), which could then be used with the interface proposed here.
That way, environments that can't support actual JIT compilation can support `dlopen`, but not expose this JIT-supporting interface.
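Sketching that split in Python: JIT output (e.g. the .NET IL-to-wasm case raised above) goes through a separate, optional interface that wraps in-memory bytes in a descriptor, while module loading only ever consumes descriptors. A host without JIT simply doesn't provide the first interface, so nothing has to fail at runtime. All names here (`ModuleLoader`, `JitInterface`, `descriptor_from_bytes`) are illustrative assumptions, not proposed WASI names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Descriptor:
    key: str


class ModuleLoader:
    """Always-present interface: descriptor in, module handle out."""

    def __init__(self, store: Dict[str, bytes]):
        self._store = store

    def load_module(self, fd: Descriptor) -> str:
        if fd.key not in self._store:
            raise FileNotFoundError(fd.key)
        return f"module<{fd.key}>"


class JitInterface:
    """Optional interface: wraps in-memory bytes (e.g. JIT output) in a descriptor."""

    def __init__(self, store: Dict[str, bytes]):
        self._store = store
        self._counter = 0

    def descriptor_from_bytes(self, wasm: bytes) -> Descriptor:
        fd = Descriptor(key=f"mem:{self._counter}")
        self._counter += 1
        self._store[fd.key] = wasm
        return fd


class Host:
    def __init__(self, allow_jit: bool):
        store: Dict[str, bytes] = {}
        self.loader = ModuleLoader(store)
        # A world without JIT simply lacks this interface.
        self.jit: Optional[JitInterface] = JitInterface(store) if allow_jit else None
```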
Would this allow me to get native guest bindings to the dynamically loaded module? If so, how does this interface allow me to specify the expected shape of the interface in the loaded module?
At the BA summit this past weekend I discussed with a few folks what it might look like to implement `dlopen` from C in the component model. What follows is a rough sketch of how this might be possible, intended to capture the conversations that happened. At this time I don't believe anyone's lined up to work on this, but nevertheless I wanted to capture the context we discussed and what might be necessary. This is a rough shape of a solution and will need more work to get standardized and implemented.

The general idea is that we'd like to explore adding component model intrinsics which support the ability to load an arbitrary wasm module at runtime, open it, and start executing it. This is what `dlopen` does on native platforms, and it is useful for a variety of use cases. Perhaps most importantly, existing language ecosystems expect this to work, so supporting them requires an implementation of `dlopen`.

The other general idea is that we'd like to standardize as-general-as-possible intrinsics and building blocks as necessary. Emscripten, for example, has a model of dynamic linking today, but we don't want to bake that exactly as-is into the component model. Instead it should be possible to build various other forms of dynamic linking, if necessary, on top of component model intrinsics. The north star for now is Emscripten-style dynamic linking since that's what tooling supports, but it's hoped that implementation support can still be generalized.
Component Model Changes
Supporting a full-fledged `dlopen` will require changes to the component model as it exists today.

Component Model: New Types
A new built-in resource type will be added to the component model, a "moduleref". For example in the component model you'll be able to do:
A `module` here is a resource definition of a new type that the host understands. This is similar to declaring and importing a resource, except that it's provided by the host and is the same across all components. This resource type can have `own` and `borrow` handles like other resources in the component model. This new type would additionally be added to WIT, too.
Component Model: New WASI APIs
With this new type available in the component model, the thinking is that new WASI APIs would be added for acquiring modules. This enables hosts to implement a variety of methods of identifying and loading modules. Furthermore, because these are WASI APIs, the implementations can be virtualized as necessary too. Currently the rough idea is:
Here a host can provide the ability to compile arbitrary wasm bytes. These bytes might be loaded through the filesystem, for example, or through other means. Hosts should be able to return "not supported" for `compile`; this would also be a great use case for optional imports.

Hosts can also provide a set of preopened modules (perhaps with a better name). These represent ahead-of-time compiled modules, for example, and might be more suitable in contexts where fully dynamic runtime compilation is not allowed.
When implementing `dlopen`, it's expected that `wasi-libc` would locate the module to instantiate by doing something like:

1. Try the `preopens/get` method. Use that if present.
2. Otherwise, fall back to `compile`. If that fails, then return an error.

At this point `dlopen` has a handle to a module to instantiate, so the next bit is instantiating it.

Component Model: New Intrinsics
Instantiation is sketched here as entirely outside the realm of WIT. Everything that follows is purely a component model intrinsic (similar to `resource.drop`) and can be synthesized in any component.

First up are intrinsics to perform runtime inspection of a `module`. Everything here is listed as if it had mostly-WIT types, but each intrinsic here is actually producing a core module.

- `module.imports_len : func(m: borrow<module>) -> u32` - returns the number of imports a module has
- `module.import_{module,name}_len : func(m: borrow<module>, import: u32) -> u32` - returns the byte length of the import name (UTF-8 encoded)
- `module.import_{module,name} $memory : func(m: borrow<module>, import: u32, ptr: i32)` - fills in `ptr` in linear memory with the contents of the nth import name

Note that at this time type reflection of modules isn't supported. It's expected that it can be added later if needed, but hopefully it's not needed yet. (TODO: maybe these should just be component-model WIT types?)
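The length-then-copy pattern above is the usual way to move variable-length strings into linear memory: ask for the byte length, reserve that much guest memory, then let the intrinsic copy the bytes. In the Python model below a `bytearray` stands in for linear memory and a toy bump allocator stands in for `malloc`; `FakeModule` and the function names are invented for illustration and only the `name` variants are modeled.

```python
from typing import List, Tuple


class FakeModule:
    """Stands in for a `module` handle; imports are (module, name) pairs."""

    def __init__(self, imports: List[Tuple[str, str]]):
        self.imports = imports


# Host-side models of the intrinsics (only the `name` variants are shown).
def imports_len(m: FakeModule) -> int:
    return len(m.imports)


def import_name_len(m: FakeModule, i: int) -> int:
    return len(m.imports[i][1].encode("utf-8"))


def import_name(m: FakeModule, i: int, memory: bytearray, ptr: int) -> None:
    data = m.imports[i][1].encode("utf-8")
    if ptr < 0 or ptr + len(data) > len(memory):
        raise MemoryError("out-of-bounds write")  # a real intrinsic would trap
    memory[ptr:ptr + len(data)] = data


def read_import_names(m: FakeModule, memory: bytearray) -> List[str]:
    """Guest side: enumerate import names using only the three intrinsics."""
    names = []
    heap = 0  # toy bump-allocator cursor standing in for malloc
    for i in range(imports_len(m)):
        n = import_name_len(m, i)
        ptr, heap = heap, heap + n
        import_name(m, i, memory, ptr)
        names.append(bytes(memory[ptr:ptr + n]).decode("utf-8"))
    return names
```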
Next there will additionally be an API to read custom sections of modules, for example `dylink.0` in the Emscripten-based ABI:

- `module.custom_section_size : func(m: borrow<module>, name: string) -> option<u32>` - returns the byte length of the custom section `name`, or `none` if it's not present
- `module.custom_section_read $memory : func(m: borrow<module>, dst: i32, len: i32, src: i32)` - reads a custom section into linear memory with a memcpy-style API

(TODO: like above, maybe this is better modeled with component model types? Also needs to handle the possibility of repeated custom sections too.)
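For reference, this is roughly what locating a custom section such as `dylink.0` involves at the binary level, work the proposed intrinsics would do on the guest's behalf. The sketch walks the sections of a core Wasm binary (section id 0 is a custom section whose payload begins with a LEB128-length-prefixed UTF-8 name) and returns every matching payload, since custom sections may repeat. It's a plain parser for illustration, not the proposed intrinsic API, and it does no validation beyond the preamble.

```python
from typing import List, Tuple


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def custom_sections(wasm: bytes, name: str) -> List[bytes]:
    """Return the payloads of all custom sections called `name`, in order."""
    if wasm[:8] != b"\x00asm\x01\x00\x00\x00":
        raise ValueError("not a core Wasm module")
    pos, found = 8, []
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        end = pos + size
        if section_id == 0:  # custom section: name, then arbitrary payload
            name_len, p = read_uleb128(wasm, pos)
            if wasm[p:p + name_len].decode("utf-8") == name:
                found.append(wasm[p + name_len:end])
        pos = end
    return found
```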
Next there needs to be the ability to build up the set of imports that will be used to instantiate a module. This is done with an "imports builder" type which acts like a resource but doesn't actually have any definition in WIT or the component model itself (at least not at this time).

- `imports_builder.new : func() -> IB` - creates a new blank imports builder
- `imports_builder.drop : func(IB)` - destroys a builder (TODO: maybe `resource.drop`?)
- `imports_builder.bind_{memory,global,table,func} $index : func(borrow<IB>, string, string)` - binds the statically provided item to the names provided. This is used, for example, to provide a module's own memory to the import list.
- `imports_builder.new_global_i32 : func(borrow<IB>, string, string, i32)` - creates a brand new wasm global (mutable? new parameter?) with the provided initial value. (It's assumed this is needed for the Emscripten ABI.)
- `imports_builder.bind_funcref : func(borrow<IB>, string, string, funcref)` - binds the provided function to the specified import name. This is used to provide a module's own functions to imports.

It's hoped that with all of the above it's possible to implement basically everything in `dlopen` from the Emscripten dynamic linking ABI. All of this culminates in a single intrinsic:

- `imports_builder.instantiate : func(borrow<module>, borrow<IB>) -> result<instance, string>`

where this final `instantiate` intrinsic is used to perform instantiation itself (TODO: the return type here needs some work). There will also need to be an API or two to look up globals/functions on the returned `instance`.
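To make the builder flow concrete, here is a Python model of the sequence `dlopen` would drive: create a builder, bind items under (module, name) pairs, then instantiate, failing if any import is left unsatisfied. `ImportsBuilder`, its methods, and the error string are a sketch of the intrinsics listed above, not an existing API. `instantiate` takes the module's import list rather than a module handle, and type checking of bound items is deliberately omitted, in line with the note that type reflection isn't supported yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (import module, import name)


@dataclass
class Global:
    value: int
    mutable: bool = False


@dataclass
class ImportsBuilder:
    items: Dict[Key, object] = field(default_factory=dict)

    def bind(self, module: str, name: str, item: object) -> None:
        """Stands in for bind_{memory,global,table,func} and bind_funcref."""
        self.items[(module, name)] = item

    def new_global_i32(self, module: str, name: str, value: int, mutable: bool = False) -> None:
        self.bind(module, name, Global(value, mutable))

    def instantiate(self, module_imports: List[Key]) -> Tuple[str, object]:
        """Return ("ok", resolved) or ("err", message), echoing result<instance, string>."""
        for mod, name in module_imports:
            if (mod, name) not in self.items:
                return "err", f"unresolved import {mod}::{name}"
        return "ok", {key: self.items[key] for key in module_imports}
```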
Integration with wasi-libc
It's hoped that all of the above will be implementation details of `dlopen` in `wasi-libc`. It's not expected that applications will necessarily be manipulating the intrinsics themselves. All the details of, for example, the Emscripten dynamic linking ABI would be encoded in `wasi-libc` in terms of matching names, providing imports, manipulating memories and globals, etc.

This is very much a work-in-progress design. Even just writing this up, I feel like we may want to shift more things into WIT or similar, or have WIT-defined builtins rather than so many intrinsics. Furthermore, there are a lot of details here to prove out, and we also need to ensure that there's enough functionality to fully implement Emscripten's dynamic linking ABI.
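As one example of the bookkeeping wasi-libc would own: in the Emscripten-style ABI described in the WebAssembly tool-conventions dynamic linking document, a side module's `dylink.0` section declares how much memory and how many table slots it needs (with log2 alignments), and the loader reserves that space and supplies the base addresses as the `__memory_base` and `__table_base` globals. The sketch below models only that reservation step with simple bump allocators. It's an illustration under those assumptions, not wasi-libc code, and it ignores relocations, GOT entries, and symbol resolution.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MemInfo:
    """Fields of the dylink.0 mem-info subsection (alignments are log2)."""
    memory_size: int
    memory_align: int
    table_size: int
    table_align: int


def align_up(value: int, log2_align: int) -> int:
    mask = (1 << log2_align) - 1
    return (value + mask) & ~mask


class SideModuleLayout:
    """Bump-allocates data and table regions for successive dlopen calls."""

    def __init__(self, heap_start: int, table_start: int):
        self.next_memory = heap_start
        self.next_table = table_start

    def reserve(self, info: MemInfo) -> Dict[str, int]:
        memory_base = align_up(self.next_memory, info.memory_align)
        table_base = align_up(self.next_table, info.table_align)
        self.next_memory = memory_base + info.memory_size
        self.next_table = table_base + info.table_size
        # These values would be provided to the side module as its
        # env.__memory_base / env.__table_base i32 globals (e.g. via new_global_i32).
        return {"__memory_base": memory_base, "__table_base": table_base}
```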
cc @dicej, @fitzgen, @sunfishcode