bytecodealliance / lucet

Lucet, the Sandboxing WebAssembly Compiler.

Refactor run_async to use custom Future implementation #626

Closed: benaubin closed this 3 years ago

benaubin commented 3 years ago

This PR replaces the async fn run_async_internal() with a custom implementation of Future.

Advantages:

Semantically, guests behave exactly as normal async functions would, where block_on is equivalent to calling await.

It's now possible to declare an async hostcall like an async function:

#[lucet_hostcall]
#[no_mangle]
pub async fn hostcall_async(vmctx: &Vmctx) -> u32 {
    for _ in 0..6 {
        YieldingFuture { times: 2 }.await
    }

    return 0;
}

I also added a new try_block_on method, which allows an embedder to define fallback behavior for when an instance is not run from within an async execution context.
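For illustration, a hypothetical usage sketch (I'm assuming here that try_block_on returns an Option that is None when no async executor is driving the instance; hostcall_maybe_async, some_future, and fallback_value are made-up names):

#[lucet_hostcall]
#[no_mangle]
pub fn hostcall_maybe_async(vmctx: &Vmctx) -> u32 {
    // Assumed behavior: try_block_on returns None when the instance was
    // not started via run_async, letting the embedder pick a fallback.
    match vmctx.try_block_on(some_future()) {
        Some(val) => val,
        None => fallback_value(),
    }
}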

Additionally, I removed the inst_count_bound argument from run_async and run_async_start (added in #612; that API change is not yet released) and replaced it with a method and field on the RunAsync future. This is more flexible for users that need to configure a CPU bound (it lets a user change the bound in between any two polls) while being less verbose (and backward compatible with the old API) for users without that need.
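As a sketch of what that enables (the inst_count_bound field on RunAsync is an assumption here, not necessarily the final API), an embedder could grow the bound as an instance proves long-running:

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Sketch: wrap the RunAsync future and raise the instruction bound on
// each successive poll.
struct AdaptiveBound<'a> {
    run_async: RunAsync<'a>,
    polls: u64,
}

impl<'a> Future for AdaptiveBound<'a> {
    type Output = Result<UntypedRetVal, Error>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.polls += 1;
        // Short calls stay responsive; long-running calls amortize the
        // cost of context-switching back to the executor.
        self.run_async.inst_count_bound = Some(10_000 * self.polls);
        Pin::new(&mut self.run_async).poll(cx)
    }
}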

benaubin commented 3 years ago

In case it's useful, here's one way to implement a soft CPU time limit with this refactor. I'm not sure if it's worth adding an extra dependency to lucet, so I'm not including it in the PR (I'd be happy to, though!).

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

use cpu_time::ThreadTime;

pub struct RunAsyncWithCPULimit<'a> {
    run_async: RunAsync<'a>,
    pub cpu_time_limit: Duration,
    pub cpu_time_used: Duration,
}

struct CPULimitExceeded;

// Note: this relies on RunAsync being Unpin, which a hand-written future
// struct (unlike an async fn's future) can be.
impl<'a> Future for RunAsyncWithCPULimit<'a> {
    type Output = (
        Result<Result<UntypedRetVal, Error>, CPULimitExceeded>,
        Duration,
    );

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Measure the CPU time this thread spends in one poll of the guest.
        let start = ThreadTime::now();

        let poll = Pin::new(&mut self.run_async).poll(cx);

        self.cpu_time_used += start.elapsed();

        match poll {
            // The guest finished: report the result and total CPU time.
            Poll::Ready(res) => Poll::Ready((Ok(res), self.cpu_time_used)),
            // Still pending but over budget: stop with an error.
            Poll::Pending if self.cpu_time_used > self.cpu_time_limit => {
                Poll::Ready((Err(CPULimitExceeded), self.cpu_time_used))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

Feel free to use this example however you'd like (I'm hereby releasing this example under the CC0 license).

benaubin commented 3 years ago

> It might help me to understand the need for these if you could summarize why one can't implement the wall-clock bound example above by wrapping the future returned by the current API. For example, I think one could write an add_timeout(inner: F) -> TimeoutFuture and define TimeoutFuture to store an F, without requiring a named future-state struct on the Lucet side. This is largely like your example above, but with the struct type param to make it work without Lucet changes. Is there a reason this wouldn't work?

To be honest, I completely forgot that you could use generics for this. However, futures created by async/await are not Unpin - so it would be more difficult to write (although theoretically possible).
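To make that concrete, here's a minimal sketch of the generic wrapper; since the inner future may be !Unpin, the easy route is to box it (avoiding the allocation would require pin-projection instead):

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

use cpu_time::ThreadTime;

struct CPULimitExceeded;

// Generic over any inner future F, including the !Unpin futures produced
// by async fn; Pin<Box<F>> keeps the wrapper itself Unpin.
struct TimeoutFuture<F: Future> {
    inner: Pin<Box<F>>,
    cpu_time_limit: Duration,
    cpu_time_used: Duration,
}

impl<F: Future> Future for TimeoutFuture<F> {
    type Output = Result<F::Output, CPULimitExceeded>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let start = ThreadTime::now();
        let poll = self.inner.as_mut().poll(cx);
        self.cpu_time_used += start.elapsed();
        match poll {
            Poll::Ready(out) => Poll::Ready(Ok(out)),
            Poll::Pending if self.cpu_time_used > self.cpu_time_limit => {
                Poll::Ready(Err(CPULimitExceeded))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}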

The primary thing that is possible to do with this API but not with an async fn is adjusting the instruction bound between polls.

Additionally, the custom Future implementation removes some major sharp edges by making it much clearer how lucet and the future executor interact.

In the future, this API makes other features possible.

benaubin commented 3 years ago

@cfallin Hey Chris! I just pushed a new commit that takes advantage of the custom Future implementation to do the first poll of a future without yielding. For futures that are immediately ready, this implementation requires no context switch and no heap allocations - similar to using .await.

cfallin commented 3 years ago

@benaubin Sorry, I haven't gotten back to this as I've been somewhat short on time. To be honest, I still don't quite understand some of the details and reasoning here. For example, could you explain what you mean by: "takes advantage of the custom Future implementation to do the first poll of a future without yielding." -- first poll of which future? AFAICT in the current code, if the called Wasm function returns soon enough (before first bound expiration), we don't yield and require another poll, we just return a completed value. Or do you mean in some other situation?

From above, the only desired behavior I see that is not yet possible is to set the bound to a different amount on every poll. Everything else is possible with the current API. Is that correct?

Overall, I am hesitant to accept a full rewrite of the core async code, especially one that replaces an async fn with a custom future implementation and especially one that uses unsafe code, without a really really compelling reason. I'd be happy to look at a much smaller-scoped patch that allows for adjusting the bound on each poll, though, if you'd like. Sorry and thanks for the patience here!

benaubin commented 3 years ago

No problem!

The change I made yesterday means there is no yield if the future passed to block_on is immediately ready. For example, a call like vmctx.block_on(async { 5 }) would return 5 without needing a yield to the host.

I'm currently working on reducing the amount of unsafe code necessary to make this work.

benaubin commented 3 years ago

@cfallin I just pushed a commit that removes nearly all of the unsafe code from run_async, including the transmute that the trampoline-style execution required. There's one unsafe block left related to run_async, which is solely used for pinning the future (which is required in order to poll).
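For reference, that remaining block is the standard stack-pinning idiom (a sketch of the pattern, not necessarily the literal code; make_future is a stand-in):

use std::pin::Pin;

// Safe so long as `fut` is never moved again after being pinned; this is
// the usual idiom for polling a future held on the stack.
let mut fut = make_future();
let fut = unsafe { Pin::new_unchecked(&mut fut) };
// `fut` is now a Pin<&mut _> that can be polled.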

benaubin commented 3 years ago

@cfallin Just pushed a new commit that adds support for defining an async hostcall just like an async function:

        #[lucet_hostcall]
        #[no_mangle]
        pub async fn hostcall_async(vmctx: &Vmctx, times: u32, times_double: u32) -> u32 {
            for _ in 0..times {
                YieldingFuture { times: 2 }.await
            }

            return times * 2;
        }

Like Rust's .await, there is no overhead for a future that is immediately ready, and all execution of the future takes place from within the guest. To make this work, if the future passed to block_on returns Poll::Pending, block_on yields to RunAsync, which itself returns Poll::Pending.

Later, the async executor polls the RunAsync future, which resumes the guest, which polls the future from within the guest execution context.
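A sketch of that handshake from RunAsync's side (the state names and resume_guest helper here are illustrative, not the PR's exact internals):

fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    // Resume the guest; it runs until it returns, faults, or yields
    // because a future it passed to block_on was Pending.
    match self.resume_guest(cx.waker().clone()) {
        GuestState::Returned(val) => Poll::Ready(Ok(val)),
        GuestState::Faulted(err) => Poll::Ready(Err(err)),
        // The guest is parked on a pending future; the waker it holds
        // will prompt the executor to poll (and so resume) us again.
        GuestState::PendingFuture => Poll::Pending,
    }
}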

benaubin commented 3 years ago

Rebased onto main, consolidated commits (happy to squash into a single commit if requested, but I'm separating out a few of them in case some of the changes are rejected).

@cfallin the changes should be ready for review. I know that my original changes weren't massive improvements, but these more recent ones bring much larger advantages, while nearly eliminating the unsafe code in run_async:

Semantically, guests behave exactly as normal async functions would, where block_on is equivalent to calling await.

benaubin commented 3 years ago

Awesome! Thanks, @cfallin. I'll make the changes tomorrow.

pchickey commented 3 years ago

Thanks for working on this! Many of the futures bits of this are over my head, but the change to the macro to accept async fn is welcome. I had planned to do something just like that for compatibility with wasmtime :)

benaubin commented 3 years ago

The changes have been made and the test for yielding from async has been added!

pchickey commented 3 years ago

I am not aware of a workaround to the CircleCI problem, unfortunately. I can push a branch, and once it goes green I'll link to the status here and we can merge this PR.

benaubin commented 3 years ago

Alternatively, I could edit this PR to merge into a non-main branch, then someone could create a new PR to test on Circle?

pchickey commented 3 years ago

https://app.circleci.com/pipelines/github/bytecodealliance/lucet/430/workflows/36261c81-0205-41df-8626-2479a9261e76

pchickey commented 3 years ago

You'll need to merge the latest main to fix the cargo audit error.

benaubin commented 3 years ago

@pchickey Done.

benaubin commented 3 years ago

@alexcrichton It's indeed extremely similar to your wasmtime PR. Lucet runs every instance within a context switch, so it's very similar to wasmtime's fibers for async instances.

The primary difference in logic is that, in this PR, I avoid having to transmute the std::task::Context by cloning the Waker and creating a new Context from inside the instance. This is partly because an instance could yield and outlive the Context it was polled with (although it should be possible to use a stricter lifetime bound to soundly avoid the Waker clone, it's probably not worth it).
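Concretely, the std pieces involved are Waker::clone and Context::from_waker; the shape is roughly this (poll_inside_guest is an illustrative name):

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

// Rather than smuggling the executor's Context (and its lifetime) across
// the context switch, clone the Waker, which is owned and 'static, and
// rebuild a Context inside the instance whenever a future must be polled.
fn poll_inside_guest<F: Future>(fut: Pin<&mut F>, waker: &Waker) -> Poll<F::Output> {
    let mut cx = Context::from_waker(waker);
    fut.poll(&mut cx)
}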

alexcrichton commented 3 years ago

Ah ok, that makes sense. While I believe that works well today, the intention of Context was to abstract over possible extensions in the future, so in the limit of time that may not be an equivalent operation. For example, std may one day expand the abilities of Context with new functionality, and certain features of tokio may take advantage of that; that would mean hostcalls using tokio futures may unexpectedly not have access to those features, since we're recreating Context.

I don't think that there's any plans at this time to actually do this so it's a pretty meta-level concern though.

benaubin commented 3 years ago

@alexcrichton That's what I figured. My original implementation passed the context to the instance with a transmute, but also required a lot of unsafe code blocks propagated throughout lucet.

It wouldn't be too hard to change this if necessary, although to do so soundly, RunAsync would need to prevent "yielding" (Lucet instances can yield without blocking on a future and later be resumed from the host). Prior to this PR, that was the case - instances were not allowed to yield when in an async context. After switching the implementation to clone a waker rather than transmute the lifetime of Context, I also relaxed the restriction on yielding from an async context.

For future compatibility's sake, it might be worth reinstating the restriction, or somehow marking yielding as "unstable".

alexcrichton commented 3 years ago

I'm not really sure what it means to prevent yielding, or to be (or not be) in an async context, myself, but I think it's a reasonable state of affairs to just create a new Context for now to avoid a lot of unsafe code. I suspect that any change to Context is years out at the earliest, if ever, so we'd have plenty of runway to update things if needed.