feature: add "named models"

abrown commented 1 year ago

Currently the wasi-nn API only allows loading ML models from their byte-serialized format (i.e., using load). This can be problematic for several reasons:

models may be large — "gigabytes" large
not all backends provide a way to load models from bytes (e.g., TF expects a certain filesystem layout)
retrieving and loading a model can be the most expensive part of an inference request — hosts may want to load a model once and reuse it across multiple Wasm module instantiations

I would like to propose "named models" — a way of solving these issues. Other WASI proposals, such as wasi-filesystem and wasi-sockets, provide a way of creating pre-instantiation resources that are then available to the Wasm module once instantiated (see, e.g., the --dir and --listenfd flags on the Wasmtime CLI). If a similar idea were available to wasi-nn, users could specify models before instantiation and these could be shared across instances. This sharing could only happen, however, if the models are "named."

Spec changes

To support this in the specification, one would need the ability to load a model using only a name and (possibly) the ability to a load a model from bytes and name it. This way there could be some symmetry between the host and guest functionality. I think this could be supported by adding the following functions:

// Like the `load` function, but the host would retain a mapping from `name` to the `graph`.
load_named: func(builder: graph-builder-array, encoding: graph-encoding, target: execution-target, name: string) -> expected<graph, error>

// Retrieve the loaded `graph` for the given `name`; this could be pre-loaded prior to instantiation or loaded by `load_named`.
get_named: func(name: string) -> expected<graph, error>

Obviously the ability to load a "named model" for all instances running in a host is up for debate: perhaps the available scope of that name should only be the Wasm instance itself or some host-specified neighborhood. I included the most controversial version, global scope, to see what people think. I also think the host may want to implement some way to limit the resources consumed by wasi-nn; this is a host implementation concern, discussed below.

Host engine changes

Though this repository is the spec repository and is primarily concerned with the Wasm-visible API, I think it would be valuable to discuss what changes this might imply for an engine implementing wasi-nn. Here are some suggestions:

The engine might want to limit the resources available to a wasi-nn-using module: this could take the form of limiting the number of models loaded via load or load_named, limiting the size of the models loaded (somehow), etc. One could imagine a flag like --nn-max-models to do something like this. (I would also think it would be great to have a generic way to limit any WASI API if anyone has thoughts on that).
The engine would likely want a way to preload some models to avoid load-ing them repeatedly in new Wasm instances. One could imagine a flag like --nn-preload <name>:<encoding>:<path> to tell the engine both the name of the model and how to load it. All modules instantiated by that engine would have the models available for retrieval with get_named.

abrown commented 1 year ago

cc: @geekbeast, @roee88

sunfishcode commented 1 year ago

A global scope would not be virtualizable.

The way --dir and --listenfd work is that they pass in (what are conceptually) handles into the running program, and associating names with them only as a compatibility layer with existing code and command-line argument passing schemes.

One option would be to have the registry be dynamic, and referenced by handle. Another option would be to make the registry be a "link-time authority", meaning you can have a function like get_named, but the scope of the registry isn't global, but the instance that get_named is conceptually imported from. That way, two instances of wasi-nn could be instantiated separately without them implicitly sharing a scope.

abrown commented 1 year ago

I've come to a similar conclusion as @geekbeast and I have discussed this. The idea that load_named must create a named model that could be retrieved from any other instance with get_named is not a good one. For one, it would be problematic on multi-tenant systems, allowing tenant A to access tenant B's proprietary model, e.g. Instead, the wasi-nn specification should not define the scope of the names used, leaving it up to the host to define a policy for what names are available to what instances. This would correspond to how --dir paths work, e.g.: a host could map a path used by all instances to the same physical location so that the instances access the same file but the WASI file system spec certainly does not require this or even encourage this. The same should be true here — the host decides how to map names to "handles."

And, given that names would not be globally scoped, do we even need load_named? I brought this up in https://github.com/WebAssembly/wasi-nn/pull/38 as well: what is the point of load_named if there is no guarantee that other instances can use that model? If a module does the heavy lifting to load a model, why name it just for itself — it might as well just hang on to the model resource instead? I think https://github.com/WebAssembly/wasi-nn/pull/38 is written under the assumption of tenant-scoped names (a model loaded by a tenant is available to all the tenant's instances) but I think this still breaks the virtualizability goal @sunfishcode is talking about.

@sunfishcode, I'm not sure I understand exactly what you mean by the two options you bring up at the end. My feeling now after some thought is that the best way to provide the name-model mapping is "out-of-band:" i.e., forget about load_named and have host APIs that allow tenants to provide the pre-loaded, named models. I'm open to discussing it more, though!

geekbeast commented 1 year ago

@sunfishcode My understanding is that the registry being proposed here is definitely not be global in scope. While there are some common inference models that hosts may choose to expose, they can do so by modifying their initialization of the dynamic registry at the host level.

Do you have any objections to a fully dynamic registry who contents are completely controlled by host policy? That's actually what I implemented in bytecodealliance/wastime#6134 (caveat there is a bug with the lifetime that I'm fixing up at the moment).

geekbeast commented 1 year ago

@sunfishcode Also quick clarifying question that I suspect I know the answer to here. If two identical WASM guest programs are run, any visible shared state across those two programs would be considered to break virtualization, correct?

geekbeast commented 1 year ago

The driving reason behind named models is that model compilation is expensive. I have seen 24x times inference cost for model compilation in some testing. A simpler solution here maybe to expose an import_compiled_model(...) that takes a model that a model that is ready to be executed. This maybe not be compatible with all frameworks. In particular, this is a current open issue in TensorFlow (https://github.com/tensorflow/tensorflow/issues/55520)

mingqiusun commented 1 year ago

I would think even a local scoped load_named method is useful, as it solves the problem of maintaining state information across calls in the FaaS use case.

juntao commented 1 year ago

This sounds great! We look forward to implementing this in WasmEdge!

hydai commented 1 year ago

Hello, this is hydai from WasmEdge.

I'd like to clarify something about the section you mentioned:

The engine might require a mechanism to preload certain models to circumvent repeatedly loading them in new Wasm instances. One could suggest a flag such as --nn-preload :: to instruct the engine about the model name and the loading method. All modules initiated by this engine would have access to these preloaded models via get_named.

It seems like this would load directly from a host path, rather than utilizing a guest path mapped via pre-open. For instance, let's consider a model named NM located at the host path /host_path/NM.model. We could preload this model using wasm_runtime --nn-preload NM:<E>:/host_path/NM.model.

I'm curious if it's possible to use a guest path instead. Here's an example:

# Host
/host_path/NM.model
# Map it with pre-open
wasm_runtime --dir /guest_path/NM.model:/host_path/NM.model wasi-nn.wasm
# Within the guest environment
call register_named: func(name: string, path: string) -> expected<error>
prior to invoking load_named();

The reason behind this question is a need for container integration:

When docker+wasm is used, all files within the container images are mapped into the pre-open system.
Similarly, curn+wasm_handler maps --dir .:. to allow the wasm runtime to handle files within the container images.

abrown commented 1 year ago

Hey @hydai, thanks for the feedback. The idea I was going for with this change is that the model bytes should not even need to be visible to the Wasm guest at all. The issue with the current load function is not just that the model bytes need to be retrieved but that, for some backends, the host will have to compile the model bytes to some other form. I am trying to make it possible to avoid that overhead by naming the models without dictating how the host assigns those names to models. So I guess I see two ways to answer the question:

I guess for your host you could make up a way to configure names from guest paths and have the host keep track of those; CLI flags would be one way to do that and the wasi-nn spec doesn't say what those need to look like.
The other way is to register names within the guest: in @geekbeast's PR for this proposal I think he calls this "register_model" but I used "load_named" up above. The user then has to pay the overhead of compiling the model, though.

hydai commented 1 year ago

Hi @abrown

Gocha!

I would like to delve deeper into 'compiling model bytes into other forms.'

Two aspects require clarification:

Regarding the responsibility of this task, it would be beneficial to understand if the users are expected to undertake the precompilation and subsequently supply the location of the compiled form to WASI-NN. Alternatively, is it envisioned that WASI-NN, upon identifying the requirement by backends, would initiate the compilation of model bytes?
The described behavior indeed resembles 'pre-processing' work, such as 'image2tensor'. Given this similarity, could it be inferred that this function would be incorporated into the WASI-NN toolkit?

Another thing is that do we need to add deregister_model or unload_model functions? So users can add/remove named models according to the different use scenario.

abrown commented 1 year ago

Regarding the responsibility of this task, it would be beneficial to understand if the users are expected to undertake the precompilation and subsequently supply the location of the compiled form to WASI-NN. Alternatively, is it envisioned that WASI-NN, upon identifying the requirement by backends, would initiate the compilation of model bytes?

I don't think we want this "precompilation" step to be guest-visible. If it were, it would expose differences between backends that compile and those that don't. And if we hide the compilation like we do, each host implementation is free to optimize when and how they do this step.

The described behavior indeed resembles 'pre-processing' work, such as 'image2tensor'. Given this similarity, could it be inferred that this function would be incorporated into the WASI-NN toolkit?

I don't think it would fit in the wasi-nn bindings because this compilation I'm talking about is a host-side thing.

Another thing is that do we need to add deregister_model or unload_model functions? So users can add/remove named models according to the different use scenario.

@geekbeast and I discussed this over (I think via a meeting) as we were thinking through https://github.com/WebAssembly/wasi-nn/pull/38. (You might want to look at that PR as well). I am pretty reticent to add more surface area to the wasi-nn API unless we absolutely need it. More API surface means more to implement, more to maintain, more chances to make a mistake. So we agreed to hold off on deregistering models until a user demands it; I think we're expecting most users to clean up the wasi-nn state by just existing the Wasm instance. If this does become an issue for someone, though, it seems like a reasonable addition.

abrown commented 1 year ago

Ok, this feature has now been added by #38.

WebAssembly / wasi-nn

feature: add "named models" #36

Spec changes

Host engine changes