bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
14.81k stars 1.24k forks source link

Feedback on some changes between wasmtime 17 and 21 #8845

Open vigoo opened 1 week ago

vigoo commented 1 week ago

Hi! In the past few days I've updated Golem (https://github.com/golemcloud/golem/pull/602) from wasmtime 17.0 to 21.0.1 (including some other dependencies and all the related tooling versions).

It went a bit harder than I expected so I thought I open this ticket as an attempt to provide constructive feedback from the point of view of someone embedding wamtime. I found solutions and workarounds to all the problems I ran into so there is no specific need for any change or help, this is just an observation of the effects of some recent changes.

Context

Before describing the issues I ran into, a quick overview of how we are using wasmtime's WASI implementation. Golem uses the wasmtime-wasi crate but wraps all the host functions to implement durable execution. (We are using a fork of wasmtime but it only has minor patches, mostly enabling async bindings for more WASI host functions because we need that for our wrappers)

So we have something like a

struct DurableWorkerCtx<Ctx: WorkerCtx> {
  wasi: WasiCtx,
  wasi_http: WasiHttpCtx,
  table: ResourceTable,
  // ...
}

Here Ctx is the actual type used in Linker, Store etc, why it's separate from this DurableWorkerCtx type is irrelevant.

Our WASI wrappers are host trait implementations on DurableWorkerCtx which are under the hood calling wasmtime's implementations:

pub struct DurableWorkerCtxWasiView<'a, Ctx: WorkerCtx>(&'a mut DurableWorkerCtx<Ctx>);
impl<'a, Ctx: WorkerCtx> WasiView for DurableWorkerCtxWasiView<'a, Ctx> { ... }
impl<Ctx: WorkerCtx> Host for DurableWorkerCtx<Ctx> { 
  async fn get_environment(&mut self) -> anyhow::Result<Vec<(String, String)>> {
    // ...
    Host::get_environment(&mut ctx.as_wasi_view()).await
  }
  // ...
}

All these wrappers were registered into the linker with a generic function that looked something like this:

pub fn create_linker<T, U>(
    engine: &Engine,
    get: impl Fn(&mut T) -> &mut U + Send + Sync + Copy + 'static,
) -> wasmtime::Result<Linker<T>>
where
    T: Send,
    U: Send + wasi::cli::environment::Host + // ...  
{
    let mut linker = Linker::new(engine); 
  wasmtime_wasi::preview2::bindings::cli::environment::add_to_linker(&mut linker, get)?;
  // ...
}

This is called with a closure that just returns a &mut DurableWorkerCtx<Ctx> from Ctx, which implements all the host interfaces.

This all worked well with wasmtime 17, so let's see what were the changes that made me open this ticket!

Dropping Sync constraints

The first change causing trouble was https://github.com/bytecodealliance/wasmtime/pull/7802

Here by dropping Sync from WasiCtx and ResourceTable meant that we can no longer keep them as simple fields in ourDurableWorkerCtx and had to apply the Arc<Mutex<...>> "trick" which is also used in the above PR with the comment that "at least it is not blocking". While this of course works it feels quite hacky and I wanted to point out that I believe anybody who is wrapping (some of) the WASI host implementations and using wasmtime in async mode will run into the same problem and need to apply create these 'fake' mutexes.

The GetHost refactoring

The second thing that took many hours for me to figure out is https://github.com/bytecodealliance/wasmtime/pull/8448 which is already part of wasmtime 21 but as I understand just a part of all the planned changes (https://github.com/bytecodealliance/wasmtime/issues/8382). Maybe all the difficulties are only caused by this intermediate state, but the combination of

took hours to just understand what exactly the problem is, and then required me to write hundreds/thousands of lines of boilerplate to workaround it. Let me explain.

As add_to_linker is gone, I had found there is a add_to_linker_get_host and naively rewrote the above create_linker function to use that. That eventually started to fail with an error for missing a WasiView constraint on the output type parameter which was very confusing, as none of the types involved, and nothing in the code generated by bindgen! contains anything related WasiView. The reason for it was how GetHost is defined, it now fixes the O to be the same as the result type of the closure it derives from, so in our case it was no longer looking for Host implementations on DurableWorkerCtx<Ctx> but &mut DurableWorkerCtx<Ctx>. As this was not possible the next thing the compiler found was the

impl<T: ?Sized + WasiView> WasiView for &mut T { ... }

implementation in wasmtime_wasi which lead to the weird errors mentioning WasiView (finding all the Host implementations requiring T: WasiView through this).

After understanding all this I realised that in this intermediate state where wasmtime_wasi sets skip_mut_forwarding_impls and uses the above trick to implement the type classes through &mut T the only thing I can do is to manually implement these "forwarding trait implementations" for all the WASI host functions:

So I ended up manually writing wrappers for all our wrappers like this:

impl<'a, Ctx: WorkerCtx> Host for &'a mut DurableWorkerCtx<Ctx> {
    async fn get_environment(&mut self) -> anyhow::Result<Vec<(String, String)>> {
        (*self).get_environment().await
    }
    // ...
}

With these wrappers finally the GetHost implementations started to work but I still had to drop the generality of our create_linker function where it previously worked with anything implementing the required set of host functions, now it only works with DurableWorkerCtx<Ctx>. This is not an issue for us right now, and may be just my limited Rust experience, but that's the best I could came up with, something like

pub fn create_linker<Ctx: WorkerCtx + Send + Sync, F>(
    engine: &Engine,
    get: F,
) -> wasmtime::Result<Linker<Ctx>>
where
    F: for<'a> Fn(&'a mut Ctx) -> &'a mut DurableWorkerCtx<Ctx> + Send,
    F: Copy + Send + Sync + 'static,
{ 
    let mut linker = Linker::new(engine);
  wasmtime_wasi::bindings::cli::environment::add_to_linker_get_host(&mut linker, get)?;
  // ...
}   

Conclusion

By showing these two examples I wanted to demonstrate what problems I ran into while upgrading to the latest version of wasmtime, with the only purpose of providing some feedback and examples of how we are using it.

Thank you for the awesome wasm engine!

alexcrichton commented 1 week ago

Wanted to drop a comment here saying thank you for taking the time to write this all down! Integration with WASI is definitely one of our weak points API-design-wise and is something I've long wanted to improve. I'll try to provide some more detailed feedback later today or in the coming days (and ideally fully address some of these pain points too)