VivekPanyam opened this issue 1 year ago
FWIW, I recently pushed `ort` v1.15.5 which adds support for WASM.
Oh wow, that's great! I see you landed https://github.com/pykeio/ort/commit/092907a5d2ddd7ec4c9340d2cddc04cb293e003d a bit after I created this issue :)
I'm working on getting the GPT2 example working from WASM and I'll comment with how it goes!
Is there a WebGPU or WebGL execution provider btw?
The ONNX Runtime website says:

> you have the option to use `webgl` or `webgpu` for GPU processing, and WebAssembly (`wasm`, alias to `cpu`) for CPU processing. All ONNX operators are supported by WASM but only a subset are currently supported by WebGL and WebGPU.
I couldn't find any documentation on how to actually use either backend. I think it may be automatically available just by compiling with `--use_jsep`, but I'm not sure. I'll keep looking into it.
Thanks! It looks like a WASM build with 1.15.5 fails:

- `close_lib_handle` is not defined for WASM: https://github.com/pykeio/ort/blob/bca00dc96d8e6fd047fa44ebd5c5287517ed0af1/src/session.rs#L759-L767
- `std::os::unix::ffi::OsStrExt` doesn't exist on WASM: https://github.com/pykeio/ort/blob/bca00dc96d8e6fd047fa44ebd5c5287517ed0af1/src/session.rs#L5-L6
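Both look like they could be gated per target; a minimal sketch of that idea (my assumption about a possible fix, not what `ort` actually does):

```rust
// Assumption: gate the unix-only import and give WASM its own no-op
// handle cleanup so the crate still compiles on wasm32 targets.
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;

#[cfg(not(target_arch = "wasm32"))]
fn close_lib_handle(handle: *mut std::ffi::c_void) {
    // On native targets, the real implementation would release the
    // dynamically-loaded library handle here.
    let _ = handle;
}

#[cfg(target_arch = "wasm32")]
fn close_lib_handle(_handle: *mut std::ffi::c_void) {
    // Nothing to release: there are no dynamic library handles under WASM.
}
```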
I am using `wasm32-unknown-unknown` though. I don't believe using `wasm32-wasi` would fix it, but I noticed you're building with Emscripten. Does that work?
I got a simple MNIST test working on `wasm32-unknown-emscripten`. Since ONNX Runtime itself is compiled with Emscripten, I don't believe it would work on `wasm32-unknown-unknown` either way.
@decahedron1 Could you post your test code somewhere please?
The Emscripten thing makes sense. Even if we compiled the rest of the code without Emscripten, we'd still need all the Emscripten runtime components to actually make ONNX Runtime itself work.
@katopz I know we spoke in https://github.com/VivekPanyam/carton/issues/159#issuecomment-1740248225 about you exploring `wonnx` and WASM, but would you be open to trying to get this working with `ort`?
Ideally, we'd first test that `ort` works from WASM (with and without WebGPU) and then we can build a basic ONNX runner that supports Linux, macOS, and WASM.
@VivekPanyam Certainly: https://github.com/decahedron1/carton-ort-wasm-example
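For context, the core of such a test boils down to roughly the following. This is a from-memory sketch of the `ort` v1.x API, so treat the exact signatures as assumptions and see the linked repo for working code:

```rust
use std::sync::Arc;

use ndarray::{Array, CowArray};
use ort::{Environment, GraphOptimizationLevel, SessionBuilder, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build the environment and load the model (the path is a placeholder).
    let environment = Arc::new(Environment::builder().with_name("mnist").build()?);
    let session = SessionBuilder::new(&environment)?
        .with_optimization_level(GraphOptimizationLevel::Level1)?
        .with_model_from_file("mnist.onnx")?;

    // MNIST takes a single 1x1x28x28 float32 image tensor.
    let input = CowArray::from(Array::<f32, _>::zeros((1, 1, 28, 28)).into_dyn());
    let outputs = session.run(vec![Value::from_array(session.allocator(), &input)?])?;
    println!("got {} output tensor(s)", outputs.len());
    Ok(())
}
```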
It seems like WebGPU support with Microsoft ONNX Runtime would be much more difficult than I was anticipating - you'd have to somehow include their JavaScript code (slightly more info in the PR - https://github.com/microsoft/onnxruntime/pull/14579) and connect it to the proper places, which I'm not sure is even possible with `--build_wasm_static_lib`, so `wonnx` might be worth exploring for GPU acceleration on the web.
Thank you! I'll check it out.
Okay, so then I think we have a few potential solutions:

1. `wonnx` on platforms with WebGPU available and `ort` everywhere else. Straightforward, but could cause issues if a model works with `ort` but fails with `wonnx` (or vice versa).
2. Use `wonnx` everywhere (if it can also run without GPUs). This provides a consistent user experience. I think we'd need to explore inference performance and supported operators vs `ort` if we decide to look into this approach.
3. Integrate all three runtimes into a single runner and allow users to choose between them explicitly (a sketch of what that could look like follows).
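Purely as an illustration (all names and types below are invented for this sketch; none of this is Carton's actual API), the user-facing choice could look like:

```rust
// Hypothetical user-facing API for option 3; every identifier here is
// invented for illustration.
pub enum OnnxRuntimeImpl {
    /// pykeio/ort wrapping the official ONNX Runtime (the proposed default).
    Ort,
    /// tract: pure-Rust CPU inference, useful where ort can't build.
    Tract,
    /// wonnx: WebGPU-backed inference, explicit opt-in.
    Wonnx,
}

impl Default for OnnxRuntimeImpl {
    fn default() -> Self {
        // ort (and the "official" ONNX Runtime) unless explicitly overridden.
        OnnxRuntimeImpl::Ort
    }
}

pub struct LoadOptions {
    /// Which runtime executes the model for this load/inference call.
    pub runtime: OnnxRuntimeImpl,
}
```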
I think there might be a way to do option 3 with a clean user experience, but we'd have to be careful about the default logic. It would be confusing to users (and could break things) if we changed the default implementation-selection logic after the runner was released.
Maybe a hybrid of 1 and 3 would work, where users can decide to use WebGPU or not at inference time.
I think we should start by implementing a runner that uses `ort` everywhere it's supported. We can then add in WebGPU support with `wonnx` and make it an explicit opt-in at inference time.

So it'll always use `ort` (and the "official" ONNX Runtime) unless you explicitly tell it to use `wonnx` with WebGPU.

And if we want to, there's nothing stopping us from extending that to `tract`. `ort` is always the default and everything else is an explicit opt-in.
Thoughts?
Also @decahedron1, would you be open to building/helping build a runner for Carton that uses `ort`? If so, @katopz could continue exploring `wonnx`.
Will do, `wonnx` and `ort` are on my waiting list. Anyway, yesterday I tried to explore/build/compile the native/WASM examples from https://github.com/huggingface/candle (yes, I'm still evaluating things here). I'd like to know your thoughts on the candle approach?
@VivekPanyam I generally agree with your assessment. `wonnx` is an option if you are looking for a relatively lightweight (and Rust-native) way to run ONNX models on GPU. In essence, `wonnx` translates ONNX models to `wgsl` shaders and executes them using `wgpu` on the GPU.
I have no experience with CPU-based implementations of WebGPU, apart from the fact that we use one in CI to run some tests. `wonnx` can't run on the web in WASM if the browser does not offer (or has disabled) WebGPU support. On the web, `wgpu` is merely a passthrough layer to the underlying browser-implemented WebGPU API (which in Firefox is again based on `wgpu`, by the way!).
An important thing to consider is the support for ops, which differs between the engines. `wonnx` certainly does not support all ops. Additionally, `wonnx` works using ahead-of-time compilation of shaders (which means all shapes need to be known in advance; there is shape-inference functionality for this), and because of this certain ops with dynamic shapes are not supported and will be very hard to support in the future.
@pixelspark That makes sense. So explicit opt-in is probably a safe bet (as long as we can design that in a way that isn't confusing to users).
@pixelspark @decahedron1 Thank you both for taking the time to provide your thoughts!
> Will do, `wonnx` and `ort` are on my waiting list. Anyway, yesterday I tried to explore/build/compile the native/WASM examples from https://github.com/huggingface/candle (yes, I'm still evaluating things here). I'd like to know your thoughts on the candle approach?
@katopz see #164
In general, please try to keep issues focused on their original topic. For more open ended conversations, consider creating a discussion. Thanks!
@katopz do you want to build an ONNX runner using `ort` (and then we can add `wonnx` and WebGPU support once the runner is working)?
Sorry to say, but not real soon. In the meantime you can assign that task to anyone.
@VivekPanyam I also agree with your assessment on using pykeio/ort by default and having `wonnx` as an explicit opt-in. There's a case to be made for having `wonnx` as the default with WebGPU, pending the performance seen in experiments comparing the two Rust ONNX wrappers.
I also agree with @pixelspark's comment on considering support for ops. It's more than reasonable to assume ONNX Runtime supports all standard operator kernels; with contrib and custom ops there seems to be support too, but I'd be careful starting out. At the moment pykeio/ort seems to support ONNX Runtime v1.15.1 whereas ONNX Runtime's latest version is v1.16.0, so for a given custom op it's worth verifying its support in pykeio/ort first.
@katopz I'm sorry to hear that. I hope things work out in a way you'd like them to.
@mstfbl Makes sense, thanks!
If anyone is interested in implementing a runner with `ort`, feel free to comment below :)
There are many different ways of running an ONNX model from Rust:

- `tract`: "Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference".
- `wonnx`: "A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web". Notes: `wgpu` supports Vulkan and there are software implementations of it (e.g. SwiftShader), but not sure how plug-and-play that is.
- `ort`: "A Rust wrapper for ONNX Runtime".

If we're going to have one "official" ONNX runner, it should probably use `ort`. Unfortunately, since `ort` doesn't have WASM support, we need another solution for running from WASM environments. This could be:

- `ort` on desktop, `tract` on WASM without GPU, and `wonnx` on WASM with GPUs (see the sketch at the end of this comment). This seems like a complex solution, especially because they don't all support the same set of ONNX operators.
- `tract` everywhere, but don't have GPU support.
- `wonnx` everywhere, but require GPU/WebGPU.

@kali @pixelspark @decahedron1 If you get a chance, I'd really appreciate any thoughts you have on the above. Thank you!
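To make the first of those options concrete, here is a hypothetical sketch of compile-time dispatch between the three runtimes (the module names, the `webgpu` feature, and the signatures are all invented for illustration, not real Carton code):

```rust
// Stub modules standing in for the three backends.
mod ort_runner {
    pub fn run(_model: &[u8], _inputs: &[f32]) -> Vec<f32> {
        vec![] // an ort-backed implementation would go here
    }
}
mod tract_runner {
    pub fn run(_model: &[u8], _inputs: &[f32]) -> Vec<f32> {
        vec![] // a tract-backed implementation would go here
    }
}
mod wonnx_runner {
    pub fn run(_model: &[u8], _inputs: &[f32]) -> Vec<f32> {
        vec![] // a wonnx-backed implementation would go here
    }
}

// ort on desktop...
#[cfg(not(target_arch = "wasm32"))]
use self::ort_runner as runner;
// ...wonnx on WASM when a (hypothetical) `webgpu` feature is enabled...
#[cfg(all(target_arch = "wasm32", feature = "webgpu"))]
use self::wonnx_runner as runner;
// ...and tract on WASM without GPU.
#[cfg(all(target_arch = "wasm32", not(feature = "webgpu")))]
use self::tract_runner as runner;

pub fn run_model(model: &[u8], inputs: &[f32]) -> Vec<f32> {
    // Callers see one entry point regardless of the backend chosen above.
    runner::run(model, inputs)
}
```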