bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.46k stars 1.31k forks source link

Idea: put VM data structures in arena with relative pointers for security and strict provenance #9060

Open cfallin opened 3 months ago

cfallin commented 3 months ago

As part of the discussion on #9015 / #9026, we discussed handling of VM data structures -- the vmctx struct, tables, function references, and the like -- that are touched both by runtime code (in Rust) and by generated code compiled from the Wasm. There are issues related to strict pointer provenance because pointers to these data structures are exposed to the generated code, and/or the Pulley interpreter, without strict provenance (either through the Pulley bytecode, or through the machine code we invoke that is entirely outside of the domain of Rust's semantics).

It occurs to me that one way to solve this would be to make all VM data structures use relative pointers -- e.g., u32 offsets -- in an arena (per store? per engine?) whose base pointer is a parameter both to the generated code and to the Pulley interpreter. We then trivially have strict provenance because there is only one pointer -- and whatever we need to do to preserve provenance (keep it as a pointer in the Pulley interpreter loop; and "expose" it as we pass it to generated code) is localized and manageable.

If provenance were the only benefit, that may not be so interesting; but there are a few others as well:

So it seems we can get (i) fully strict provenance, (ii) better safety, (iii) other interesting new abstractions like whole-engine snapshotting, if we pay this cost. Something to consider later if any of these needs becomes interesting?

alexcrichton commented 3 months ago

How would this be reflected in CLIF? All loads/stores would have to be derived from a small set of "base pointers" such as the base of each linear memory in a module and the arena holding vmctx/tables/etc. For example Pulley might have a load-from-memory-zero instruction or load-from-arena instruction, but how would it know which to choose from a CLIF load?

(and possibly another area for "stack memory of this function", so how to detect loads/stores to the stack)

cfallin commented 3 months ago

There are at least two ways I think:

I kind of like the first more -- it's less intrusive to Cranelift (in a way that avoids complexity-footguns around e.g. alias analysis with separate address spaces) at the cost of a little more constraint on memory layout (but then we're already saying we're going to put everything in some arena).

cfallin commented 3 months ago

(For clarity on Option 1: the base pointer is implicit and affects all loads/stores; we wouldn't add an extra argument to load/store instructions.)