bytecodealliance / lucet

Lucet, the Sandboxing WebAssembly Compiler.
Apache License 2.0
4.07k stars 164 forks source link

Revert #626 (Refactor run_async custom Future): unsound #643

Closed pchickey closed 3 years ago

pchickey commented 3 years ago

This reverts commit bf24022cc5e9c7e26ae12dd54c16fdbb26a45ddd, reversing changes made to ddaf91e4875adfb4912be47f51620c6ffafd818d.

Unfortunately, in testing, we discovered that #626 has unsound interactions with asynchronous signal handling. We want to find a sound way to get these changes back in, but we need to revert the merge while we figure that out.

626 introduced a change to lucet_runtime_internals::State where a variant now contains an Arc:

    Running {
        async_context: Option<std::sync::Arc<AsyncContext>>,
    }

When a signal handler terminates an instance, https://github.com/BytecodeAlliance/lucet/blob/main/lucet-runtime/lucet-runtime-internals/src/instance/signals.rs#L310 will drop this Arc from a signal context, which will in turn call free on the underlying memory. If the system allocator is not signal safe (e.g. jemalloc) this can result in a segfault.

pchickey commented 3 years ago

cc @benaubin

benaubin commented 3 years ago

Uh oh! I bet it would be an easy fix to put the async context as a field on the instance instead of on the state. Would be happy to put together a fix :)

benaubin commented 3 years ago

I mean, good to know, at least. #644 Should fix it (required minimal changes), and I'd be happy to rebase upon this PR.

In terms of the out-of-gas mechanism, I believe that an effective solution may be allowing calling a user-defined hostcall for the "on_bound_expired"? I'm not sure about the implications of that from a soundness perspective, but I'd imagine doing so would enable things like "check if cpu time limit expired" without the overhead of a context switch.

acfoltzer commented 3 years ago

this is also a good time to just rip out the async signalling mechanism, the fuel-based timeout machinery is easy to keep safe by comparison but might need some thought first.

This is how I would favor fixing this in the long run, after the immediate revert to back out the unsoundness. It does mean that folks will have to use an async runtime to do timeouts, even if they're not doing any other async stuff. But that can be pretty lightweight between futures::executor::block_on() and futures-timer.

Of course, the current KillSwitch mechanism is more general and powerful than just a means to implement timeouts, so removing it could break those use cases. Are we aware of any such use cases? Is this worth a BCA RFC?

pchickey commented 3 years ago

There are several ways we can proceed here, lets take our time figuring out what the best path is over on #644