
ONNX implementation for WASI NN

This project is an experimental ONNX implementation of the WASI NN specification. It enables neural network inference for ONNX models in WASI runtimes at near-native performance, by leveraging CPU multi-threading or GPU usage on the host and exporting this functionality to guest modules running in WebAssembly.

It follows the [WASI NN implementation from Wasmtime][wasmtime-impl] and adds two new runtimes for performing inference on ONNX models: the native ONNX runtime, used through its C API (behind the c_onnxruntime Cargo feature), and Tract, a pure-Rust runtime (behind the default tract feature).

How does this work?

WASI NN is a "graph loader" API: the guest WebAssembly module passes the ONNX model to the runtime as opaque bytes, together with the input tensors; the runtime performs the inference; and the guest module then retrieves the output tensors. At its core, the API consists of functions to load a graph, create an execution context, set input tensors, run the inference, and read the output tensors.
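
To make that flow concrete, the following is a minimal sketch of the call sequence from a Rust guest module, written against the Rust wasi-nn client bindings. The ONNX graph encoding constant and the exact field names are assumptions here, since they depend on the bindings version this project uses; treat the sketch as illustrative rather than exact.

```rust
// Minimal guest-side inference sketch using the Rust wasi-nn bindings.
// GRAPH_ENCODING_ONNX is an assumption: the constant naming the ONNX
// encoding depends on the wasi-nn bindings version in use.
fn infer(model: &[u8], input: &[u8], dims: &[u32]) -> Vec<u8> {
    unsafe {
        // 1. Hand the ONNX model to the runtime as opaque bytes.
        let graph = wasi_nn::load(
            &[model],
            wasi_nn::GRAPH_ENCODING_ONNX,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .expect("failed to load graph");

        // 2. Create an execution context and bind the input tensor.
        let context = wasi_nn::init_execution_context(graph)
            .expect("failed to create execution context");
        wasi_nn::set_input(
            context,
            0,
            wasi_nn::Tensor {
                dimensions: dims,
                r#type: wasi_nn::TENSOR_TYPE_F32,
                data: input,
            },
        )
        .expect("failed to set input tensor");

        // 3. Run the inference on the host (ONNX Runtime or Tract)...
        wasi_nn::compute(context).expect("inference failed");

        // 4. ...then copy the output tensor back into guest memory
        //    (sized here for 1,000 f32 class scores, as in ImageNet models).
        let mut output = vec![0u8; 1000 * std::mem::size_of::<f32>()];
        wasi_nn::get_output(context, 0, output.as_mut_ptr(), output.len() as u32)
            .expect("failed to read output tensor");
        output
    }
}
```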

The two back-ends in this repository implement the API above, one on top of each of the two runtimes mentioned. So why two implementations? The main reason is the trade-off between performance and ease of configuration: the native ONNX runtime back-end offers better performance, but requires the ONNX Runtime shared library to be present and configured on the host, while the Tract back-end is pure Rust and needs no external dependencies, at some cost in performance.

The following is a very simple benchmark of running two computer vision models, [SqueezeNetV1][sq] and MobileNetV2: compiled natively, run with WASI NN using both back-ends, and run purely in WebAssembly using Tract. All inferences are performed on the CPU only for now:

SqueezeNet performance

MobileNetV2 performance

A few notes on the performance:

Current limitations

Building, running, and writing WebAssembly modules that use WASI NN

The following are the build instructions for Linux. First, download the ONNX runtime 1.6 shared library and unarchive it. Then, build the helper binary:

➜ cargo build --release --bin wasmtime-onnx --features tract,c_onnxruntime

At this point, follow the Rust example and test to build a WebAssembly module that uses this API through its Rust client bindings.
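
For orientation, an exported guest function that the host invokes by name (like the batch_squeezenet function used in the commands below) could look roughly like the following. The file names are placeholders rather than the repository's actual test data layout, and infer is the sketch shown earlier.

```rust
// Hypothetical exported entry point; the real tests/rust crate defines its
// own functions. The host calls it with `--invoke batch_squeezenet`.
#[no_mangle]
pub extern "C" fn batch_squeezenet() {
    // Read the model and a pre-processed input tensor from the directory
    // mapped into the module with `--dir tests/testdata` (file names here
    // are placeholders, not the actual test data).
    let model = std::fs::read("tests/testdata/model.onnx").expect("cannot read model");
    let input = std::fs::read("tests/testdata/input.bin").expect("cannot read input");

    // Run the wasi-nn call sequence sketched earlier; SqueezeNet expects a
    // 1x3x224x224 f32 image tensor.
    let output = infer(&model, &input, &[1, 3, 224, 224]);
    println!("output tensor: {} bytes", output.len());
}
```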

Then, to run the example and test from this repository, using the native ONNX runtime:

➜ LD_LIBRARY_PATH=<PATH-TO-ONNX>/onnx/onnxruntime-linux-x64-1.6.0/lib RUST_LOG=wasi_nn_onnx_wasmtime=info,wasmtime_onnx=info \
        ./target/release/wasmtime-onnx \
        tests/rust/target/wasm32-wasi/release/wasi-nn-rust.wasm \
        --cache cache.toml \
        --dir tests/testdata \
        --invoke batch_squeezenet \
        --c-runtime

Or to run the same function using the Tract runtime:

➜ LD_LIBRARY_PATH=<PATH-TO-ONNX>/onnx/onnxruntime-linux-x64-1.6.0/lib RUST_LOG=wasi_nn_onnx_wasmtime=info,wasmtime_onnx=info \
      ./target/release/wasmtime-onnx \
      tests/rust/target/wasm32-wasi/release/wasi-nn-rust.wasm \
      --cache cache.toml \
      --dir tests/testdata \
      --invoke batch_squeezenet

The project exposes two Cargo features: tract, the default, and c_onnxruntime, which, when enabled, compiles in support for using the ONNX runtime through its C API. After building with this feature enabled, running the binary requires the ONNX shared libraries to be discoverable, either on the PATH or through LD_LIBRARY_PATH.
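
As a rough sketch (not the project's actual code) of how the c_onnxruntime feature and the --c-runtime flag interact, the native back-end is only present in binaries that were compiled with that feature:

```rust
// Illustrative only: `--c-runtime` selects the back-end at run time, but the
// native back-end only exists when built with `--features c_onnxruntime`.
fn backend_name(use_c_runtime: bool) -> &'static str {
    if use_c_runtime {
        #[cfg(feature = "c_onnxruntime")]
        {
            // ONNX Runtime through its C API; the shared library must be
            // discoverable at run time, e.g. via LD_LIBRARY_PATH on Linux.
            return "onnxruntime (C API)";
        }
        #[cfg(not(feature = "c_onnxruntime"))]
        panic!("this binary was built without the c_onnxruntime feature");
    }
    // Default: Tract, pure Rust, no external shared libraries required.
    "tract"
}
```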

Contributing

We welcome any contribution that adheres to our code of conduct. This project is experimental, and we are delighted you are interested in using or contributing to it! Please have a look at the issue queue and either comment on existing issues, or open new ones for bugs or questions. We are particularly looking for help in fixing the current known limitations, so please have a look at issues labeled with help wanted.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct.

For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

[wasmtime-impl]: https://github.com/bytecodealliance/wasmtime/tree/main/crates/wasi-nn

[sq]: https://github.com/onnx/models/tree/master/vision/classification/squeezenet