bytecodealliance / wasmtime-py

Python WebAssembly runtime powered by Wasmtime
https://bytecodealliance.github.io/wasmtime-py/
Apache License 2.0

Handling a composed component that exchanges strings #143

Open Finfalter opened 1 year ago

Finfalter commented 1 year ago

As discussed in Zulip > general > composing components, wasmtime-py doesn't yet support handling a composed component that exchanges strings. Since I would really like to use this feature in one of my projects, I am raising this issue as a feature request. A minimal example of what is expected, together with an illustration of the error that is raised, can be found here.

For illustration: trying to compose component1 and component2 reflecting the following two interfaces

Interface of component1

interface exports {
  greet: func(s: string) -> string
}

default world greetworld {
  export greeting: self.exports
}

Interface of component2

interface imports {
  greet: func(s: string) -> string
}

interface exports {
  greet: func(s: string) -> string
}

default world bettergreetworld {
  import greeting: self.imports
  export exports: self.exports
}

yields the following error

#[..]
Caused by:
    wasm trap: wasm `unreachable` instruction executed

alexcrichton commented 1 year ago

Thanks for the report! I won't personally have the chance to get to this for a bit, so I'm going to write down some notes here. This shouldn't be the trickiest thing in the world if someone is feeling particularly intrepid to take this on, but it's also not necessarily a great first task either. I can try to help out along the way with questions if someone's interested though!

What's happening here is that this assertion is being tripped. This construct is indicating that a core wasm function needs to be synthesized to transcode strings from one component to another. This involves reading the string from one linear memory, validating its encoding, and then reencoding it into a destination linear memory. The specifics of this operation are well-defined but subtle as well because the encodings on both halves may be different, for example utf-8 and utf-16.
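As a rough illustration of the read, validate, re-encode steps described above, here is a minimal sketch for the utf-8 to utf-16 case. It uses plain bytearrays as stand-ins for the two linear memories; the function name, its parameters, and its return value are assumptions made for this sketch and do not reflect the real transcoder signatures in wasmtime.

```python
def transcode_utf8_to_utf16(src_mem: bytearray, src_ptr: int, src_len: int,
                            dst_mem: bytearray, dst_ptr: int) -> int:
    """Hypothetical helper: copy a string between two memories, utf-8 -> utf-16."""
    src_end = src_ptr + src_len
    if src_ptr < 0 or src_end > len(src_mem):
        raise IndexError("source range is out of bounds")

    # 1. Read the raw bytes from the source memory.
    raw = bytes(src_mem[src_ptr:src_end])
    # 2. Validate the encoding; decode() raises UnicodeDecodeError on bad input.
    text = raw.decode("utf-8")
    # 3. Re-encode for the destination (little-endian utf-16, no BOM).
    encoded = text.encode("utf-16-le")

    dst_end = dst_ptr + len(encoded)
    if dst_ptr < 0 or dst_end > len(dst_mem):
        raise IndexError("destination range is out of bounds")
    dst_mem[dst_ptr:dst_end] = encoded

    # Number of 16-bit code units written.
    return len(encoded) // 2


# Example: two separate "memories", with the string placed at offset 2.
src = bytearray(b"xx" + "héllo".encode("utf-8"))
dst = bytearray(32)
units = transcode_utf8_to_utf16(src, 2, len(src) - 2, dst, 4)
```

Note how the byte length changes between the two sides, which is part of what makes the general case subtle.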

The GlobalInitializer comes from here in wasmtime and the Transcoder struct looks like this:

pub struct Transcoder {
    /// The index of the transcoder being defined and initialized.
    ///
    /// This indicates which `VMCallerCheckedFuncRef` slot is written to in a
    /// `VMComponentContext`.
    pub index: RuntimeTranscoderIndex,
    /// The transcoding operation being performed.
    pub op: Transcode,
    /// The linear memory that the string is being read from.
    pub from: RuntimeMemoryIndex,
    /// Whether or not the source linear memory is 64-bit or not.
    pub from64: bool,
    /// The linear memory that the string is being written to.
    pub to: RuntimeMemoryIndex,
    /// Whether or not the destination linear memory is 64-bit or not.
    pub to64: bool,
    /// The wasm signature of the cranelift-generated trampoline.
    pub signature: SignatureIndex,
}

This Transcoder structure represents a Python function that needs to be generated. The Python function would be named something like _transcoder_i where i is the index field. The op field looks like this:

pub enum Transcode {
    Copy(FixedEncoding),
    Latin1ToUtf16,
    Latin1ToUtf8,
    Utf16ToCompactProbablyUtf16,
    Utf16ToCompactUtf16,
    Utf16ToLatin1,
    Utf16ToUtf8,
    Utf8ToCompactUtf16,
    Utf8ToLatin1,
    Utf8ToUtf16,
}

pub enum FixedEncoding {
    Utf8,
    Utf16,
    Latin1,
}

which describes the transcoding operation being performed. Each variant here requires a different Python function to implement it. At a high level all these algorithms are defined in this document and represent a sort of fused load_string and store_string function. Before I go too much more into this though, the other fields of Transcoder are index, from and from64, to and to64, and signature, as documented in the struct above.

So the main meat of this is the op field, i.e. the transcoding operation. The two linear memories (from and to) then describe where to read data and where to write data. Each op can have its own signature; not all transcoders share the same signature. A description of the signature of each transcoding operation can be found here and the Rust implementation of all transcoders can be found in this file.

That's all somewhat abstract, though, and the full power here isn't necessarily required. For example the above component probably only needs utf8-to-utf8 which is relatively simple compared to other encodings. I'll go through that in a bit more detail, and the other op variants can be left as unimplemented!() for now too.
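One way to picture "implement one op, leave the rest unimplemented" is a small dispatch table. This is only a sketch: representing the Transcode variants as plain strings, and the names copy_utf8, TRANSCODERS and transcode, are assumptions made for illustration, not how the bindings generator models them.

```python
from typing import Callable, Dict


def copy_utf8(data: bytes) -> bytes:
    """Copy(Utf8): validate the bytes as utf-8, then pass them through unchanged."""
    data.decode("utf-8")  # raises UnicodeDecodeError if the input is not valid utf-8
    return data


# Keys mirror the Transcode variants as plain strings (an assumption of this sketch).
TRANSCODERS: Dict[str, Callable[[bytes], bytes]] = {
    "Copy(Utf8)": copy_utf8,
}


def transcode(op: str, data: bytes) -> bytes:
    """Look up the implementation for `op`; anything not in the table is unsupported."""
    impl = TRANSCODERS.get(op)
    if impl is None:
        # Python counterpart of leaving a variant as unimplemented!() in Rust.
        raise NotImplementedError(f"transcode op {op} is not supported yet")
    return impl(data)


result = transcode("Copy(Utf8)", "héllo".encode("utf-8"))
```

Further variants could then be added to the table one at a time as they are needed.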

For utf8-to-utf8 the original source string is validated as correctly encoded and then it's memcpy'd to the destination string. The Rust implementation is here, which is called from a bit of macro-soup to handle fiddly bits, but the general signature is here and here (as the op will be Copy(Utf8)).

This means that the Python host function will look something like:

def transcoderN_impl(caller: wasmtime.Caller, from_ptr: int, from_len: int, to_ptr: int) -> None:
    # `from` is a reserved word in Python, so the memories are named src/dst here.
    src: wasmtime.Memory = self._core_memoryA
    dst: wasmtime.Memory = self._core_memoryB

    from_bytes = src.read(caller, from_ptr, from_ptr + from_len)
    # assert that `from_bytes` is valid utf-8; decoding raises UnicodeDecodeError otherwise
    bytes(from_bytes).decode('utf-8')
    dst.write(caller, from_bytes, to_ptr)

transcoderN_ty = FuncType(...)
transcoderN = Func(store, transcoderN_ty, transcoderN_impl, access_caller = True)

And that... might be the majority of it? Worth testing for sure!
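For anyone wanting to test the copy logic before wiring it into generated bindings, here is a minimal sketch that runs without a wasm module. FakeMemory is a hypothetical stand-in that only mimics the read(store, start, stop) and write(store, value, start) argument order of wasmtime.Memory; copy_utf8 and the offsets used are likewise assumptions for illustration.

```python
class FakeMemory:
    """Hypothetical stand-in for a linear memory, backed by a bytearray."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def read(self, store, start: int, stop: int) -> bytearray:
        if not 0 <= start <= stop <= len(self.data):
            raise IndexError("out of bounds read")
        return self.data[start:stop]

    def write(self, store, value, start: int) -> int:
        end = start + len(value)
        if start < 0 or end > len(self.data):
            raise IndexError("out of bounds write")
        self.data[start:end] = value
        return len(value)


def copy_utf8(store, src, dst, from_ptr: int, from_len: int, to_ptr: int) -> None:
    """Validate the source bytes as utf-8, then copy them unchanged."""
    from_bytes = src.read(store, from_ptr, from_ptr + from_len)
    # Decoding is used purely as a validity check; the result is discarded.
    bytes(from_bytes).decode("utf-8")
    dst.write(store, from_bytes, to_ptr)


# The store argument is unused by FakeMemory, so None is passed here.
payload = "grüß dich".encode("utf-8")
src, dst = FakeMemory(64), FakeMemory(64)
src.write(None, payload, 8)
copy_utf8(None, src, dst, 8, len(payload), 16)
```

Because validation happens before the write, invalid input leaves the destination memory untouched in this sketch.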

The final bit to fill out will be this one, which is implemented similarly to the Lowering branch, as transcoder{i} is the Func you're accessing (or at least I'm pretty sure).

That's hopefully enough for someone who's interested to get started, but I can answer more questions as well!

Finfalter commented 1 year ago

"some notes" sounds like a quantum of understatement... thank you for this elaborate sketch! Sooo, I can see the metaphorical double-shampooed, unrolled red carpet right in front of me. Since this feature is definitely on my wishlist, I will have a look (starting in a couple of days). I guess it will take quite a while to complete. If you do not mind, I will use this channel in case of further questions.

zifeo commented 1 year ago

@Finfalter I am also interested in this, let me know if I can help.