bytecodealliance / wasmtime-py

Python WebAssembly runtime powered by Wasmtime
https://bytecodealliance.github.io/wasmtime-py/
Apache License 2.0

When used with ComponentizeJS output, `wasmtime.Module.from_file` is extremely slow and not configurable #217

Open whitequark opened 3 months ago

whitequark commented 3 months ago

I have a really simple component with this WIT world:

package local:wavedrom;

world wavedrom {
  export render-json: func(json: string) -> string;
}

Running it through ComponentizeJS results in a 14 MB .component.wasm file. This is essentially reasonable; I've used much bigger ones.

However, when I try using it with this equally simple wrapper:

import json
import wasmtime
from . import _bindings

def render(source):
    store = wasmtime.Store()
    return _bindings.Root(store).render_json(store, json.dumps(source))

I discover that the wasmtime.Module.from_file call takes almost 4 seconds to execute on my machine (!). This is actually slow enough that I will not be able to use the component in the intended application, and will have to embed a JS engine in some other way.

In YoWASP/runtime-py I have a bunch of code that caches the compiled executable code, and it works extremely well for other YoWASP tools. However, the generated bindings offer me no way to inject this code.
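
For reference, that caching boils down to roughly the following sketch (illustrative only, not the actual YoWASP/runtime-py code; the cache directory and key are assumptions), built on wasmtime's serialize/deserialize APIs:

# Rough sketch of on-disk caching of compiled code, assuming a writable cache
# directory; not the actual YoWASP/runtime-py implementation.
import hashlib
import pathlib
import wasmtime

_CACHE_DIR = pathlib.Path.home() / ".cache" / "example-wasm-cache"

def load_module(engine: wasmtime.Engine, wasm_path: str) -> wasmtime.Module:
    wasm_bytes = pathlib.Path(wasm_path).read_bytes()
    cached = _CACHE_DIR / (hashlib.sha256(wasm_bytes).hexdigest() + ".cwasm")
    if cached.exists():
        # Reloading precompiled native code skips the expensive compile step.
        return wasmtime.Module.deserialize(engine, cached.read_bytes())
    module = wasmtime.Module(engine, wasm_bytes)
    _CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(module.serialize())
    return module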

What do I do?

whitequark commented 3 months ago

It looks like using a single global store lowers the render call overhead to 18 ms (essentially too low to be properly measurable), but:

  1. It's not clear to me that a store can be used in such a way. What if two Python threads try to run render_json concurrently?
  2. An unconditional four-second delay on startup or first use (whether per-process or per-thread) is still unacceptably high for my application.

For context, this library will be embedded in a Sphinx builder process, and this delay will become the majority of both hot-rebuild and cold-rebuild time. I'm trying to use Wasm to replace another flawed attempt at using WaveDrom with Sphinx, which has similarly unacceptably high latency, and it looks like the Wasm component version is actually slower, which surprised me.

whitequark commented 3 months ago

So, it turns out I completely forgot about the wasmtime.Store(engine=) argument, which allows me to write code like this:

import json
import wasmtime
from . import _bindings

def _instantiate():
    if not hasattr(_instantiate, "initialized"):
        config = wasmtime.Config()
        config.cache = True
        _instantiate.store = wasmtime.Store(wasmtime.Engine(config))
        _instantiate.component = _bindings.Root(_instantiate.store)
        _instantiate.initialized = True
    return _instantiate.store, _instantiate.component

def render(source):
    store, component = _instantiate()
    return component.render_json(store, json.dumps(source))

You can see that I have to kind of work around both the issues raised above and https://github.com/bytecodealliance/wasmtime-py/issues/218. Is this a recommended pattern? Should I ship this in a production library?

alexcrichton commented 3 months ago

Thanks for the report! I'd recommend your last comment as the workaround for now, but I think it might also be reasonable to bake some of this into the bindings generator itself. For example, the bindings generator could have a mode that automatically instantiates on first use and configures a single global instance like you're doing. There could perhaps also be a flag to enable caching by default (and/or a shared global Engine could be exposed as well). Basically, I agree that this isn't so great right now, and I think it'd be quite reasonable to add some more bindgen options!
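
For illustration, a hypothetical sketch of what such lazily-instantiating generated bindings could look like (the names and structure here are made up, not an existing bindgen mode):

import threading
import wasmtime

_engine_lock = threading.Lock()
_engine = None

def _shared_engine():
    # One Engine per process, with the compilation cache enabled so that only
    # the very first run pays the full compile cost.
    global _engine
    with _engine_lock:
        if _engine is None:
            config = wasmtime.Config()
            config.cache = True
            _engine = wasmtime.Engine(config)
        return _engine

class LazyRoot:
    # Hypothetical wrapper around the currently generated Root class:
    # nothing is compiled or instantiated until the first call.
    def __init__(self, real_root_cls):
        self._real_root_cls = real_root_cls
        self._store = None
        self._root = None

    def _ensure(self):
        if self._root is None:
            self._store = wasmtime.Store(_shared_engine())
            self._root = self._real_root_cls(self._store)
        return self._store, self._root

    def render_json(self, json_arg):
        store, root = self._ensure()
        return root.render_json(store, json_arg)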

What if two Python threads try to run render_json concurrently?

You're correct that concurrent usage is not allowed; you'd need a store per thread. Right now wasmtime-py provides no protections against this, so getting it "wrong" will probably result in a segfault. (I'm a newbie at threading in Python, so if you've got suggestions about how to improve this, I'd be glad to hear them.)

Should I ship this in a production library?

In theory the binding generation for components is "pretty good", but it was done by me as an initial sort of proof-of-concept to make sure it was possible to do this all in Python. It has not been super heavily tested, however, and there's a fair amount of Python-specific code here (e.g. code that's not just otherwise tested by Wasmtime's main test suite). In that sense if you're looking for something stable and unchanging with very few bugs I might not recommend using it, but if you're ok with the occasional bug and the interface perhaps changing over time I think it's probably reasonable.

Although if you're asking about that specific function, I don't see anything wrong with it myself, modulo the threading concerns (which are preexisting in Wasmtime today).

whitequark commented 3 months ago

so getting it "wrong" will probably result in a segfault. (I'm a newbie at threading in Python, so if you've got suggestions about how to improve this, I'd be glad to hear them.)

Although I understand how we got here, I would describe this as "unsound". In general any Python library that terminates the interpreter in any way, even if a precondition is violated, is not well-behaved.

However, there is an exception of sorts for foreign libraries that rely on assertions, since it's often completely impractical to convert those traps into exceptions. (It still results in an awful developer experience; for example, anything that binds LLVM is painful enough to use that it's motivating Python, OCaml, etc. developers to jump through a lot of hoops to avoid binding LLVM-C. But enough libraries do this that at least the behavior wouldn't be totally unexpected.) I would still attempt to prevent as many assertions as feasible; e.g. a mismatched store is something I would consider a bug if it crashes the interpreter.

However#2, I think segfaulting on races is not one of those exceptions and it would be rather unexpected by a typical Python developer, especially one with tangential familiarity with Rust and the thread safety that it provides.

I think stores should probably remember which threads they're attached to and refuse to run on others, or else use mutexes. The former seems vastly cheaper and a constraint of "you have to use the store on the same thread you created it on" would be a little unusual for a Python library but not at all difficult to understand or follow. Though the latter would certainly also work, and I think uncontended mutexes can be really cheap these days, so perhaps that's just OK too?
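
For concreteness, a minimal sketch of the thread-affinity idea (hypothetical; wasmtime-py does not do this today): remember the creating thread's ID and raise a Python exception before a cross-thread use can reach native code.

import threading

class ThreadBoundStore:
    # Wraps a wasmtime.Store and remembers which thread created it.
    def __init__(self, store):
        self._store = store
        self._owner = threading.get_ident()

    def _check_thread(self):
        if threading.get_ident() != self._owner:
            raise RuntimeError(
                "this Store was created on another thread and may only be "
                "used from the thread that created it"
            )

    def __getattr__(self, name):
        # Checked on every attribute access before forwarding to the real Store.
        self._check_thread()
        return getattr(self._store, name)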

Although if you're asking for that specific function I don't see anything wrong with that myself modulo the threading concerns (which are already preexisting with Wasmtime today)

Yeah pretty much this. I'll update this thread when I fix the threading (no pun intended) so that others can reuse my code in the interim.

whitequark commented 3 months ago

It looks like the thread-safe variation isn't that much worse:

import json
import wasmtime
import threading
from . import _bindings

_wasm_state = threading.local()

def _instantiate():
    if not hasattr(_wasm_state, "initialized"):
        config = wasmtime.Config()
        config.cache = True
        _wasm_state.store = wasmtime.Store(wasmtime.Engine(config))
        _wasm_state.component = _bindings.Root(_wasm_state.store)
        _wasm_state.initialized = True
    return _wasm_state.store, _wasm_state.component

def render(source):
    store, component = _instantiate()
    return component.render_json(store, json.dumps(source))
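
And a quick usage sketch to exercise the thread-local path (the WaveDrom input here is just illustrative): each worker thread lazily builds its own Store and instance on first use, and only the first use per thread compiles (or hits the on-disk cache).

from concurrent.futures import ThreadPoolExecutor

def _smoke_test():
    # Each thread that calls render() gets its own Store/instance via the
    # threading.local() above.
    source = {"signal": [{"name": "clk", "wave": "p...."}]}
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(render, [source] * 4))
    assert all(isinstance(r, str) for r in results)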

alexcrichton commented 3 months ago

Everything you say makes sense to me, and if I could perhaps summarize to make sure I captured everything:

  1. This library should be "thread safe" by default, insofar as it doesn't crash anything. I like your idea of capturing the thread ID when a Store is created and asserting from then on that it's only used on that thread.
  2. Rust panics should become Python Exceptions in theory, and probably the only practical case where this comes up today is when Stores are mismatched. This should be a Python Exception rather than a panic.

That all seems quite reasonable to me 👍

whitequark commented 3 months ago

2. Rust panics should become Python Exceptions in theory, and probably the only practical case where this comes up today is when Stores are mismatched. This should be a Python Exception rather than a panic.

Ah yeah, I completely forgot that we have panic-safe and therefore assert-safe code in Rust. It's not an option when e.g. binding LLVM ^^; Yeah, this is clearly preferable.