dylibso / chicory

Native JVM WebAssembly runtime
Apache License 2.0

Introduce Host Modules #482

Open evacchi opened 3 weeks ago

evacchi commented 3 weeks ago

Every Wasm module can require some items, such as functions, at instantiation time. These are called imports, and every import is labeled by "a two-level name space, consisting of a module name and a name for the entity".

The spec also briefly defines the concept of a host function as:

a function expressed outside WebAssembly but passed to a module as an import.

Because imports are always qualified and because, generally, related functions belong to the same module, they are usually qualified by the same module name, and they might even share some form of state. To capture this, wazero has introduced the concept of a Host Module.

For instance, suppose that a module need_add.wasm imports a function env.add:

(module (import "env" "add" (func)))

Instead of declaring the function <env, add>, users define a host module env and then declare all of the functions that belong to that module, including add.

wazero's Host Module is nothing more than a bundle of related functions that, for convenience, are defined together. On a surface level, this is a convenience for end users, who will declare modules in a more concise way; but in reality, bundling related functions into a host module simplifies management within the engine, and ultimately improves lifecycle control.

This is a proposal to introduce a similar construct in Chicory.

Benefits for End Users

End users will be able to define a collection of related host functions more concisely.

For instance, the signature for the HostFunction constructor is currently:

    public HostFunction(
            WasmFunctionHandle handle,
            String moduleName,
            String fieldName,
            List<ValueType> paramTypes,
            List<ValueType> returnTypes) {
        this.handle = handle;
        this.moduleName = moduleName;
        this.fieldName = fieldName;
        this.paramTypes = paramTypes;
        this.returnTypes = returnTypes;
    }

We could provide a HostModule.Builder in the same fashion as wazero's:

    _, err := r.NewHostModuleBuilder("env").
        NewFunctionBuilder().
        WithFunc(func(v uint32) {
            fmt.Println("log_i32 >>", v)
        }).
        Export("log_i32").
        NewFunctionBuilder().
        WithFunc(func() uint32 {
            if envYear, err := strconv.ParseUint(os.Getenv("CURRENT_YEAR"), 10, 64); err == nil {
                return uint32(envYear) // Allow env-override to prevent annual test maintenance!
            }
            return uint32(time.Now().Year())
        }).
        Export("current_year").
        Instantiate(ctx)
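
For comparison, here is a rough sketch of what a Java counterpart could look like. Every type in it (`HostModuleBuilderSketch`, the simplified `Handle`, `withFunction`, `build`) is a hypothetical stand-in made up for illustration, not Chicory's existing API; the only point is that the module name is stated once and each function only declares its field name and signature.

```java
import java.time.Year;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; not Chicory's actual API.
final class HostModuleBuilderSketch {
    enum ValueType { I32, I64, F32, F64 }

    /** Simplified function handle: raw long arguments in, raw long results out. */
    interface Handle {
        long[] apply(long... args);
    }

    /** One exported function; the module name lives on the enclosing HostModule. */
    static final class HostFunction {
        final String fieldName;
        final List<ValueType> paramTypes;
        final List<ValueType> returnTypes;
        final Handle handle;

        HostFunction(String fieldName, List<ValueType> paramTypes,
                     List<ValueType> returnTypes, Handle handle) {
            this.fieldName = fieldName;
            this.paramTypes = paramTypes;
            this.returnTypes = returnTypes;
            this.handle = handle;
        }
    }

    static final class HostModule {
        private final String name;
        private final List<HostFunction> functions;

        private HostModule(String name, List<HostFunction> functions) {
            this.name = name;
            this.functions = List.copyOf(functions);
        }

        static Builder builder(String name) { return new Builder(name); }
        String name() { return name; }
        List<HostFunction> functions() { return functions; }
    }

    static final class Builder {
        private final String name;
        private final List<HostFunction> functions = new ArrayList<>();

        private Builder(String name) { this.name = name; }

        Builder withFunction(String fieldName, List<ValueType> params,
                             List<ValueType> returns, Handle handle) {
            functions.add(new HostFunction(fieldName, params, returns, handle));
            return this;
        }

        HostModule build() { return new HostModule(name, functions); }
    }

    /** Java counterpart of the wazero example: two functions under module "env". */
    static HostModule envModule() {
        return HostModule.builder("env")
                .withFunction("log_i32", List.of(ValueType.I32), List.of(), args -> {
                    System.out.println("log_i32 >> " + args[0]);
                    return new long[0];
                })
                .withFunction("current_year", List.of(), List.of(ValueType.I32),
                        args -> new long[] {Year.now().getValue()})
                .build();
    }
}
```

A flat `withFunction` call per export keeps the builder one level deep, which avoids the nested function builder of the Go version.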

Eventually, we could even provide a way for end-users to define a module as a simple class, and derive a host module (and hence, host functions) automatically. For instance (strawman syntax):


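@HostModule("env")
public class EnvModule {
    // Mirrors the wazero example above: takes one i32, returns nothing.
    @WasmExport("log_i32")
    public void logI32(int value) {
        ...
    }

    // Takes no arguments, returns one i32.
    @WasmExport("current_year")
    public int currentYear() {
        ...
    }
}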

which would internally translate to something like:

    HostModule.builder(EnvModule.class)
        .withExport("log_i32", EnvModule::logI32, (...params...), (...returns...))
        .withExport("current_year", EnvModule::currentYear, (...params...), (...returns...))
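
As a rough illustration of how such a derivation could work, the sketch below discovers exports through runtime reflection. The annotations `@HostModule` and `@WasmExport` follow the strawman syntax above; the helpers `moduleName`, `exports` and `call` are hypothetical names invented here, and a real implementation might prefer an annotation processor over reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these annotations and helpers are not Chicory's actual API.
final class AnnotatedHostModuleSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface HostModule { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface WasmExport { String value(); }

    @HostModule("env")
    static final class EnvModule {
        int lastLogged; // state shared by the module's functions

        @WasmExport("log_i32")
        public void logI32(int value) { lastLogged = value; }

        @WasmExport("current_year")
        public int currentYear() { return java.time.Year.now().getValue(); }
    }

    /** Reads the module name from the class-level annotation. */
    static String moduleName(Class<?> type) {
        HostModule annotation = type.getAnnotation(HostModule.class);
        if (annotation == null) {
            throw new IllegalArgumentException(type + " is not annotated with @HostModule");
        }
        return annotation.value();
    }

    /** Maps each export name to the annotated method that implements it. */
    static Map<String, Method> exports(Class<?> type) {
        Map<String, Method> found = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            WasmExport export = method.getAnnotation(WasmExport.class);
            if (export != null) {
                found.put(export.value(), method);
            }
        }
        return found;
    }

    /** Invokes an export on a module instance, so calls can use instance state. */
    static Object call(Object module, Method method, Object... args) {
        try {
            return method.invoke(module, args);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("host function call failed", e);
        }
    }
}
```

Parameter and return types could be derived in the same pass from `Method.getParameterTypes()` and `Method.getReturnType()`, which would fill in the `(...params...)` and `(...returns...)` placeholders.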

Benefits for the Engine

The higher-level concept of a host module, instead of the lower-level concept of freestanding host functions, lets us treat the lifecycle of such host functions similarly to the lifecycle of a module.

For instance, host functions are able to close over their environment, but there is no unified way to initialize their state, or an explicit way to release resources they might indirectly refer to.

For instance, imagine a host function closing over a file handle. As long as the host function is kept around, directly or indirectly (for instance, because of some transitive import directive), that resource will be held onto.

By introducing host modules, we can uniformly control the lifecycle of both Wasm modules and host modules.
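
A minimal sketch of that idea, assuming a host module instance implements `AutoCloseable`. `FakeFileHandle` and `LogModule` are hypothetical names made up for this example, with an in-memory list standing in for a real file:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types are not Chicory's actual API.
final class HostModuleLifecycleSketch {
    /** Stand-in for an external resource such as a file handle. */
    static final class FakeFileHandle implements AutoCloseable {
        private final List<String> lines = new ArrayList<>();
        private boolean open = true;

        void write(String line) {
            if (!open) {
                throw new IllegalStateException("handle is closed");
            }
            lines.add(line);
        }

        List<String> lines() { return lines; }
        boolean isOpen() { return open; }

        @Override
        public void close() { open = false; }
    }

    /** A host module instance that owns the state its functions share. */
    static final class LogModule implements AutoCloseable {
        private final FakeFileHandle handle;

        LogModule(FakeFileHandle handle) { this.handle = handle; }

        /** Host function "log_i32": uses the module's state instead of a closure. */
        void logI32(int value) { handle.write("log_i32 >> " + value); }

        /** Releasing the module releases everything its functions hold onto. */
        @Override
        public void close() { handle.close(); }
    }

    /** Uses the module inside try-with-resources, then returns the handle for inspection. */
    static FakeFileHandle runAndClose() {
        FakeFileHandle handle = new FakeFileHandle();
        try (LogModule module = new LogModule(handle)) {
            module.logI32(7);
        }
        return handle;
    }
}
```

With the resource owned by the module rather than captured by a closure, there is one explicit place where it is released.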

This will become increasingly important, as it is common to automatically cross-link modules using the import/export system. Unifying Wasm modules with host modules will simplify introducing a form of automated linking (to be discussed in a separate issue), because it will also make the initialization and destruction of both Wasm modules and host functions explicit.

For instance, consider this example from the wazero documentation:

The module need_add.wasm we introduced at the beginning imports a function env.add and exports a function use_add, which calls env.add:

(module ;; need_add
    (import "env" "add" (func))
    (export "use_add" (func))
)

Users define a host module and load it together with need_add.wasm. When both modules are instantiated, wazero ensures that a module named env is available and it provides an add function. It is irrelevant whether the function is a host function or it is defined by another Wasm module, as long as the name and signature can be resolved successfully:

                                                        func add(foo, bar int32) int32 {
                                                            return foo + bar
                                                        }         |
                                                                  |
                                                                  | implements
                                host module                       v
+---------+                +------------------+          +-----------------+
| Runtime | -------------> | (module: myhost) | -------> | (function: add) |
+---------+  ^             +------------------+  export  +-----------------+
    \       /                                                       /
     \instantiate                                                  /
      \   /                                                       /
       \ v                                                       /
        \                                                       /
         \                                                     / imported
          \ (import "myhost" "add" (func))                    /
           \                                                 /
            \                                   +-----------/------+
             \                                  |          v       |
              \                                 |   (myhost.add)   |
               v                                |        ^         |
                +--------------------+          |        | call    |
                | (module: need_add) |--------->| (export:use_add) <----- Exported
                +--------------------+          |                  |
                                                +------------------+
                                            functions in need_add's sandbox
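
The resolution step in the diagram can be sketched in Java as a lookup by module name and field name. Everything here is a hypothetical simplification (`ExportProvider`, `Store`, and a fixed `(i32, i32) -> i32` signature modeled with `IntBinaryOperator`), not wazero's or Chicory's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical sketch; not Chicory's or wazero's actual API.
final class ImportResolutionSketch {
    /** Anything that exports functions by name: a host module or a Wasm instance. */
    interface ExportProvider {
        String name();

        /** Returns the exported function, or null when the field is not exported. */
        IntBinaryOperator export(String field);
    }

    /** A host module backed by a plain map of Java functions. */
    static final class HostModule implements ExportProvider {
        private final String name;
        private final Map<String, IntBinaryOperator> functions = new HashMap<>();

        HostModule(String name) { this.name = name; }

        HostModule with(String field, IntBinaryOperator function) {
            functions.put(field, function);
            return this;
        }

        @Override
        public String name() { return name; }

        @Override
        public IntBinaryOperator export(String field) { return functions.get(field); }
    }

    /** Registry of available modules; imports are resolved against it by name only. */
    static final class Store {
        private final Map<String, ExportProvider> modules = new HashMap<>();

        void register(ExportProvider provider) { modules.put(provider.name(), provider); }

        IntBinaryOperator resolve(String module, String field) {
            ExportProvider provider = modules.get(module);
            IntBinaryOperator function = provider == null ? null : provider.export(field);
            if (function == null) {
                throw new IllegalStateException("unresolved import " + module + "." + field);
            }
            return function;
        }
    }
}
```

Because `resolve` only sees the `ExportProvider` interface, it does not matter whether the provider is a host module or another Wasm module.
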

bhelx commented 3 weeks ago

This looks good to me and I support this. A few comments:

We could provide a HostModule.Builder in the same fashion as wazero's:

I think the way wazero does it is perhaps too granular for my taste. Something about a deeply nested builder feels to me like it would be hard to write without copy-pasting an example. I'd prefer the "internal" option you suggested. Though if you think Java users are used to building deeply nested objects like that, I think that's okay. I think the class option is ideal too, especially as it gives you the ability to attach some state to the instance.

evacchi commented 3 weeks ago

oh yes, the Go version is just for reference.

As I was porting over WasiPreview1 as an example, I realized that maybe the class-based option will be needed in some fashion, especially because we probably want to be able to instantiate with some state, and then close() to clean up that state.

The builder will be mostly used for one-offs 🤔 EDIT: or not? we probably want to keep the pair Module/Instance... 🤔