bytecodealliance / wasm-micro-runtime

WebAssembly Micro Runtime (WAMR)
Apache License 2.0

[wasi-nn] Revised WASI-nn Architecture Proposal #3677

Closed lum1n0us closed 1 month ago

lum1n0us commented 2 months ago

Introduction

We're considering a significant change to the WASI-nn architecture, prompted by a new pull request titled Replace load with load-by-name. This change is important because sticking with the current architecture could lead to a less-than-ideal user experience and frequent performance issues.

Problems

The main change proposed by the pull request is to switch from using the load() function to a load-by-name() function when loading model files within the WASI-nn framework. This change means that the WebAssembly (Wasm) side won't be able to determine which backend should run the model file. Instead, the runtime (host) side will have the full ability to select the most suitable backend for inference.

From the Wasm perspective, there's no awareness of the backends, making it challenging for users to specify a backend when using iwasm, such as with the command iwasm --native-lib=libwasi-nn-backends-xxx.so. Moreover, depending on the hardware conditions and the implementation status of the backends, different backends might be chosen to run the same model file.

On the runtime (host) side, without sufficient knowledge about the required backends, it's difficult to expect users of WAMR to pick a backend that is compatible with most models. Choosing a potentially inappropriate backend through --native-lib could lead to poor performance or even failure.

[Figure: current architecture]

To keep the WAMR core library size small, all wasi-nn backends are currently separate shared libraries (.so files). Each contains the implementation of wasi-nn APIs and the backend libraries. Users specify the particular backend with the iwasm --native-lib=... command. The --native-lib option uses wasm_native_register_natives() to satisfy the Wasm imports with a set of host functions from the backends. If multiple --native-lib options are used and they all target the same imports, the earlier sets of host functions can be overwritten, meaning only one backend can be registered at a time.
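To illustrate the overwrite problem, here is a toy model of an import registry. It is not WAMR's registry code: toy_register(), toy_resolve() and the two backend stubs are hypothetical names, and the "last registration wins" rule is an assumption made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: this is NOT WAMR's native-symbol registry. */
typedef const char *(*host_fn)(void);

typedef struct {
    const char *import_name; /* e.g. the "load" import of wasi-nn */
    host_fn fn;
} toy_native;

#define TOY_MAX 16
static toy_native toy_table[TOY_MAX];
static size_t toy_count;

/* Register a host function. A later registration for the same import
 * name replaces the earlier one (assumed behaviour for this sketch). */
static int toy_register(const char *import_name, host_fn fn)
{
    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].import_name, import_name) == 0) {
            toy_table[i].fn = fn; /* earlier backend is silently lost */
            return 0;
        }
    }
    if (toy_count == TOY_MAX)
        return -1;
    toy_table[toy_count].import_name = import_name;
    toy_table[toy_count].fn = fn;
    toy_count++;
    return 0;
}

/* Resolve an import to the single host function currently registered. */
static host_fn toy_resolve(const char *import_name)
{
    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].import_name, import_name) == 0)
            return toy_table[i].fn;
    }
    return NULL;
}

/* Two hypothetical backends that both want to satisfy "load". */
static const char *load_from_backend_a(void) { return "backend-a"; }
static const char *load_from_backend_b(void) { return "backend-b"; }
```

Registering load_from_backend_a and then load_from_backend_b under "load" leaves only the second one reachable through toy_resolve("load"), which is the single-backend limitation described above.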

Proposed Solutions

Option A. General Backend Registration

[Figure: proposed architecture]

We propose a general wasi-nn backend system that registers itself as the provider of the wasi-nn Wasm imports in wasm_native_init() and de-registers in wasm_native_destroy().

This general backend system would eagerly load, via dlopen(), backends from specified paths. Unlike the --native-lib option, this approach has the advantage of being able to load multiple backends.
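A minimal sketch of the eager loading step, assuming a POSIX system with dlopen(). The names load_backends_eagerly(), unload_all_backends(), loaded_backend and MAX_BACKENDS are hypothetical and only serve the illustration. How the paths are configured is left open here.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_BACKENDS 8 /* arbitrary cap for this sketch */

typedef struct {
    const char *path; /* where the shared library was found */
    void *handle;     /* handle returned by dlopen() */
} loaded_backend;

static loaded_backend g_loaded[MAX_BACKENDS];
static size_t g_loaded_count;

/* Try every candidate path up front. Libraries that fail to load are
 * skipped, so several backends can coexist. Returns the number of
 * backends loaded so far. */
static size_t load_backends_eagerly(const char *const *paths, size_t n_paths)
{
    for (size_t i = 0; i < n_paths && g_loaded_count < MAX_BACKENDS; i++) {
        void *handle = dlopen(paths[i], RTLD_NOW | RTLD_LOCAL);
        if (!handle) {
            const char *err = dlerror();
            fprintf(stderr, "skip %s: %s\n", paths[i],
                    err ? err : "unknown error");
            continue;
        }
        g_loaded[g_loaded_count].path = paths[i];
        g_loaded[g_loaded_count].handle = handle;
        g_loaded_count++;
    }
    return g_loaded_count;
}

/* Release every loaded backend in reverse order of loading. */
static void unload_all_backends(void)
{
    while (g_loaded_count > 0) {
        g_loaded_count--;
        dlclose(g_loaded[g_loaded_count].handle);
        g_loaded[g_loaded_count].handle = NULL;
    }
}
```

On older glibc versions this needs to be linked with -ldl. A missing or broken backend only produces a diagnostic, so the remaining backends stay available.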

During the execution of a .wasm file, the general backend system would act as a trampoline, selecting the appropriate backend for each Wasm instance during load() (load_by_name() once the PR lands). This selection would be based on the model file format and the hardware conditions. Subsequent calls would be dispatched to the chosen backend. A wasi-nn backend would be released via dlclose() when its associated Wasm instance is de-initialized.
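The trampoline idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the nn_backend struct, the backend names, the mapping from file extension to backend, and the is_usable() hardware probe are hypothetical, and the stub backends do no real inference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down backend interface for this sketch. */
typedef struct {
    const char *name;
    const char *model_ext;          /* model file extension it accepts */
    int (*is_usable)(void);         /* hardware probe: nonzero if usable */
    int (*load)(const char *model); /* 0 on success */
    int (*compute)(void);           /* stubs return an id for the demo */
} nn_backend;

/* Per-Wasm-instance state: remembers which backend the load step picked. */
typedef struct {
    const nn_backend *chosen;
} nn_instance_ctx;

static int always_usable(void) { return 1; }
static int never_usable(void) { return 0; }
static int stub_load(const char *model) { (void)model; return 0; }
static int compute_a(void) { return 1; }
static int compute_b(void) { return 2; }
static int compute_c(void) { return 3; }

/* Candidate backends in priority order (all hypothetical). */
static const nn_backend g_backends[] = {
    { "accelerated-a", ".tflite", never_usable, stub_load, compute_c },
    { "cpu-a", ".tflite", always_usable, stub_load, compute_a },
    { "cpu-b", ".gguf", always_usable, stub_load, compute_b },
};

static int has_suffix(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Pick the first backend that accepts the model format, passes the
 * hardware probe, and loads the model; remember it for this instance. */
static int trampoline_load_by_name(nn_instance_ctx *ctx, const char *model)
{
    size_t n = sizeof(g_backends) / sizeof(g_backends[0]);
    for (size_t i = 0; i < n; i++) {
        const nn_backend *b = &g_backends[i];
        if (has_suffix(model, b->model_ext) && b->is_usable()
            && b->load(model) == 0) {
            ctx->chosen = b;
            return 0;
        }
    }
    ctx->chosen = NULL;
    return -1;
}

/* Subsequent calls are forwarded to whichever backend was chosen. */
static int trampoline_compute(nn_instance_ctx *ctx)
{
    return ctx->chosen ? ctx->chosen->compute() : -1;
}
```

With these stubs, loading "model.tflite" skips the unusable accelerated backend and falls back to "cpu-a", while a model with an unknown extension fails at the load step instead of reaching a mismatched backend.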

Each wasi-nn backend shared library would contain a set of predefined methods, such as load(), init_execution_context(), set_input(), compute(), and get_output(), in accordance with the wasi-nn specification. After a library is loaded by dlopen(), these functions would be registered with the general backend system. A key-value table would store the backends and their predefined APIs.
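One possible shape for that key-value table is sketched below. The signatures are simplified placeholders, not the real wasi-nn ones, and nn_backend_api, nn_table_put() and nn_table_get() are hypothetical names. In the proposal the function pointers would come from the library opened with dlopen(); in-process stubs stand in for them here.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder signatures: the real wasi-nn functions take more arguments. */
typedef struct {
    int (*load)(const void *model, size_t size);
    int (*init_execution_context)(void);
    int (*set_input)(unsigned index, const void *tensor, size_t size);
    int (*compute)(void);
    int (*get_output)(unsigned index, void *buf, size_t *size);
} nn_backend_api;

typedef struct {
    const char *key;    /* backend name used for lookup */
    nn_backend_api api; /* the backend's predefined methods */
} nn_backend_entry;

#define NN_TABLE_CAP 8 /* arbitrary cap for this sketch */
static nn_backend_entry g_table[NN_TABLE_CAP];
static size_t g_table_len;

/* Look up a backend's API by key; NULL if it was never registered. */
static const nn_backend_api *nn_table_get(const char *key)
{
    for (size_t i = 0; i < g_table_len; i++) {
        if (strcmp(g_table[i].key, key) == 0)
            return &g_table[i].api;
    }
    return NULL;
}

/* Register a backend. Incomplete APIs and duplicate keys are rejected
 * so that every stored entry can serve all five calls. */
static int nn_table_put(const char *key, const nn_backend_api *api)
{
    if (!key || !api || !api->load || !api->init_execution_context
        || !api->set_input || !api->compute || !api->get_output)
        return -1;
    if (nn_table_get(key) || g_table_len == NN_TABLE_CAP)
        return -1;
    g_table[g_table_len].key = key;
    g_table[g_table_len].api = *api;
    g_table_len++;
    return 0;
}

/* Stub methods standing in for symbols resolved from a shared library. */
static int stub_load(const void *m, size_t n) { (void)m; (void)n; return 0; }
static int stub_init(void) { return 0; }
static int stub_set_input(unsigned i, const void *t, size_t n)
{ (void)i; (void)t; (void)n; return 0; }
static int stub_compute(void) { return 0; }
static int stub_get_output(unsigned i, void *b, size_t *n)
{ (void)i; (void)b; (void)n; return 0; }
```

Unlike a single slot per import, the table keeps every registered backend addressable by its key, so the selection step can choose among them per Wasm instance.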


We invite the community to provide feedback and comments on this proposed change to the WASI-nn architecture. Your insights will help us refine the design and ensure it meets the needs of our users.