bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0

Ability for Epoch Cancelation to Cancel Imported WASI Functions #9188

Open KennanHunter opened 2 weeks ago

KennanHunter commented 2 weeks ago

I'm still new to this codebase and may not have a perfect understanding or description of how the engine handles imported functions or epoch canceling. Any feedback is appreciated.

Improvement

As it stands, the epoch deadline is checked in two places: on entry to a Wasm function and at the top of Wasm loops. This falls short for imported functions, such as those from WASI: if the epoch is incremented while execution is inside such a function, the program traps only after the function returns. My request is to find some way for these imported functions to respond to epoch updates.

Benefit

It is important for Wasmtime consumers to be able to define some form of timeout, especially when executing untrusted code. One example where the current epoch timer system fails to stop code on time is the following C program.

I compiled this with /opt/wasi-sdk/bin/clang ./main.c -o app.wasm --target=wasm32-wasip2 -v on WASI-SDK version 24.0.

#include <stdio.h>
#include <time.h>

#define TIMER_RELATIVE_FLAG 0

int main()
{
    /* Zero-initialize so the relative timeout is exactly 10 s. */
    struct timespec deadline = {0};

    deadline.tv_sec += 10;

    printf("C code set to sleep for %lld s\n", (long long)deadline.tv_sec);

    clock_nanosleep(CLOCK_MONOTONIC, TIMER_RELATIVE_FLAG, &deadline, NULL);

    printf("Sleep completed\n");

    return 0;
}

The epoch is checked on entry to the clock_nanosleep function and on entry to printf, but clock_nanosleep blocks the VM for the full 10 seconds, even if the epoch deadline is reached partway through the sleep.
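The behavior described above can be modeled without Wasmtime at all. The following is a minimal sketch using only the Rust standard library; `Epoch`, `blocking_host_call`, and `trap_latency` are hypothetical names invented for illustration and are not Wasmtime APIs. It assumes the guest checks the epoch only between host calls, which is a simplification of "function entry and loop heads".

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Toy stand-in for an engine-wide epoch counter (hypothetical, not Wasmtime's type).
pub struct Epoch(AtomicU64);

impl Epoch {
    pub fn new() -> Self {
        Epoch(AtomicU64::new(0))
    }

    /// What the embedder's timer thread would call when the timeout fires.
    pub fn increment(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }

    /// True once the counter has reached the given deadline.
    fn past(&self, deadline: u64) -> bool {
        self.0.load(Ordering::SeqCst) >= deadline
    }
}

/// Stand-in for a synchronous host import such as a sleep.
/// It never looks at the epoch, so nothing can interrupt it.
fn blocking_host_call(block_for: Duration) {
    thread::sleep(block_for);
}

/// Runs a toy "guest" loop and returns how long it took before the
/// epoch check observed the timeout and the guest "trapped".
pub fn trap_latency(timeout: Duration, host_block: Duration) -> Duration {
    let epoch = Arc::new(Epoch::new());
    let ticker = Arc::clone(&epoch);

    // Timer thread: bump the epoch once the timeout elapses.
    let timer = thread::spawn(move || {
        thread::sleep(timeout);
        ticker.increment();
    });

    let start = Instant::now();
    let latency = loop {
        // The only place the guest checks the epoch (loop head).
        if epoch.past(1) {
            break start.elapsed();
        }
        // The host call blocks the whole thread until it returns.
        blocking_host_call(host_block);
    };

    timer.join().unwrap();
    latency
}
```

Calling `trap_latency(Duration::from_millis(50), Duration::from_millis(300))` should report a latency of at least 300 ms even though the timeout was 50 ms: the epoch changed while the host call was blocking, and the check only ran after it returned.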

Implementation

As more of a band-aid solution, is it possible to make the WASI function implementations themselves cancelable? This particular example relies on the WASI-IO poll function, which is itself built on Tokio futures. Tokio has several ways to, if not cancel, at least return early and discard the result, such as tokio::time::timeout.

Is there a way to change the implementation of epoch-interruption to be more robust, in the sense that it could cause a trap outside of a specific location in the Wasm code?

Alternatives

While there are other methods, such as the fuel system, to cancel code execution, there isn't a robust way to ensure the engine kills a program after a certain amount of time.

alexcrichton commented 2 weeks ago

Thanks for the report! This is a good reminder to me at least that our documentation of these configuration options doesn't cover this case so I'll look to expand that. Otherwise though it's intentional that epochs/fuel only work at the wasm-is-running level. Interruption/timeout of host code is left to the embedder and is not a feature provided by Wasmtime (as we can't cancel arbitrary code in Rust).

The primary way to cancel host code is to use async for host imports which may block for a long period of time. That means that a suspended computation blocked on something is "just" a future which can be dropped at any time to cancel it. This is the solution pursued by https://github.com/bytecodealliance/wasmtime/pull/9184, which switches the CLI to using async for host imports instead of sync. Coupled with tokio::time::timeout, this has the desired semantics of cancelling the computation even if it's sleeping. Other embedders will need to be sure they're using async imports and not actually blocking the host thread for similar behavior.
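To illustrate why a suspended async host import is "just" a future that can be dropped, here is a minimal sketch using only the Rust standard library. `HostSleep`, `NoopWake`, and `cancel_by_drop` are hypothetical names for illustration; a real embedding would rely on Wasmtime's async support and a runtime such as Tokio rather than polling by hand.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical async host import that never completes on its own,
/// standing in for a very long sleep.
struct HostSleep {
    cleaned_up: Arc<AtomicBool>,
}

impl Future for HostSleep {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // Always suspended: the thread is not blocked, the future just
        // reports that it has no result yet.
        Poll::Pending
    }
}

impl Drop for HostSleep {
    fn drop(&mut self) {
        // Runs when the embedder discards the future, i.e. on cancellation.
        self.cleaned_up.store(true, Ordering::SeqCst);
    }
}

/// Polls the "host import" once, then cancels it by dropping it.
/// Returns (was still pending, cleanup ran on drop).
pub fn cancel_by_drop() -> (bool, bool) {
    let cleaned_up = Arc::new(AtomicBool::new(false));
    let mut fut = Box::pin(HostSleep {
        cleaned_up: Arc::clone(&cleaned_up),
    });

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);

    // The computation is suspended, not blocking the host thread.
    let was_pending = fut.as_mut().poll(&mut cx).is_pending();

    // Cancellation is simply dropping the future.
    drop(fut);

    (was_pending, cleaned_up.load(Ordering::SeqCst))
}
```

Here `cancel_by_drop()` returns `(true, true)`: the import was still pending when it was discarded, and its cleanup ran immediately. A timeout wrapper such as tokio::time::timeout does essentially this when the deadline passes, dropping the inner future instead of waiting for it to finish.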

Were you primarily interested in the CLI for this issue, or in other embeddings as well? It's a known issue, for example, that async isn't easily usable in other-language embeddings through the C API (technically possible, but not fully realized in all bindings yet).