cfallin opened 3 months ago
How would this be reflected in CLIF? All loads/stores would have to be derived from a small set of "base pointers", such as the base of each linear memory in a module and the arena holding vmctx/tables/etc. For example, Pulley might have a load-from-memory-zero instruction or a load-from-arena instruction, but how would it know which to choose for a given CLIF load? (And there is possibly another region for the "stack memory of this function" -- how would loads/stores to the stack be detected?)
There are at least two ways I think:

1. One base for everything: each load/store becomes `base[ptr]` rather than `*ptr`; from a provenance point of view it's one big array.
2. Multiple address spaces: each load/store indexes a table of base pointers by an address-space ID, i.e. `table[aspace_id][ptr]`.

I kind of like the first more -- it's less intrusive to Cranelift (in a way that avoids complexity footguns around e.g. alias analysis with separate address spaces), at the cost of a little more constraint on memory layout (but then we're already saying we're going to put everything in some arena). A rough sketch of the two at a load site is below.
(For clarity on Option 1: the base pointer is implicit and affects all loads/stores; we wouldn't add an extra argument to load/store instructions.)
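As a minimal sketch of the difference at a load site (the helper names here are invented for illustration; this is not actual Cranelift or Pulley code):

```rust
/// Option 1: a single implicit base. Every "pointer" stored in VM data
/// structures is a u32 offset, and every load adds it to the one arena base.
unsafe fn load_u64_single_base(arena_base: *const u8, off: u32) -> u64 {
    // `base[ptr]`: provenance is rooted in the single arena base.
    arena_base.add(off as usize).cast::<u64>().read_unaligned()
}

/// Option 2: multiple address spaces. Each load names which base it is
/// relative to, and the bases live in a small table.
unsafe fn load_u64_multi_base(bases: &[*const u8], aspace_id: usize, off: u32) -> u64 {
    // `table[aspace_id][ptr]`: provenance is rooted in the selected base.
    bases[aspace_id].add(off as usize).cast::<u64>().read_unaligned()
}
```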
In the discussion on #9015 / #9026, we touched on the handling of VM data structures -- the vmctx struct, tables, function references, and the like -- that are touched both by runtime code (in Rust) and by generated code compiled from the Wasm. There are strict-pointer-provenance issues because pointers to these data structures are exposed to the generated code and/or the Pulley interpreter without strict provenance being maintained (either through the Pulley bytecode, or through the machine code we invoke, which is entirely outside the domain of Rust's semantics).
It occurs to me that one way to solve this would be to make all VM data structures use relative pointers -- e.g., `u32` offsets -- in an arena (per store? per engine?) whose base pointer is a parameter both to the generated code and to the Pulley interpreter. We then trivially have strict provenance because there is only one pointer -- and whatever we need to do to preserve provenance (keep it as a pointer in the Pulley interpreter loop; and "expose" it as we pass it to generated code) is localized and manageable.
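A minimal sketch of the idea, with hypothetical types (this is not Wasmtime's actual layout, just an illustration of "`u32` offsets plus a single arena base passed as a parameter"):

```rust
use core::marker::PhantomData;

/// A typed 32-bit offset into the VM arena: a "relative pointer".
struct Rel<T> {
    offset: u32,
    _marker: PhantomData<T>,
}

/// The one real pointer: the arena base, passed explicitly to generated
/// code and to the Pulley interpreter.
struct Arena {
    base: *mut u8,
    len: usize,
}

impl Arena {
    /// Resolve a relative pointer to a raw pointer. Provenance derives only
    /// from `self.base`, so there is exactly one pointer to keep/"expose".
    fn resolve<T>(&self, rel: &Rel<T>) -> *mut T {
        assert!(rel.offset as usize + core::mem::size_of::<T>() <= self.len);
        // The offset arithmetic stays within the arena allocation; whether
        // the resulting pointer is safe to dereference is up to the caller.
        unsafe { self.base.add(rel.offset as usize).cast::<T>() }
    }
}
```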
If provenance were the only benefit, that may not be so interesting; but there are a few other benefits as well:

Security: maintaining more discipline around raw pointers, and carefully dereferencing offsets into an arena instead (for which we can reserve guard regions just as we do for heaps today), is a layer of mitigation/defense-in-depth against engine bugs. For example, if a miscompile or a bug in the generated CLIF caused a pointer-type confusion in VM data structures today, one could plausibly find a memory-read gadget or control-flow escape more easily. We will still have some raw pointers -- the actual code pointer that we invoke in funcrefs, or the memory base address in an imported or owned memory descriptor -- but fewer of them means less exposure.
This angle is not new: WebKit has the Gigacage, I believe V8 has something similar, and I had suggested we build (and now we are building) our Wasm GC implementation in the same way, with relative-offset pointers; so it's a proven mitigation and the overhead seems to be minimal.
Note also that relative-pointer loads/stores can be implemented with fully safe code in the Pulley interpreter. We almost certainly still need unsafe code for the Wasm heap dereferences and such (though, then again, maybe there's a way around that too, by either externalizing a table of alternative raw-pointer bases, or putting memories in a large gigacage -- teracage? exacage? -- as well).
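For instance, such a load in the interpreter could look roughly like this fully safe sketch (not Pulley's actual code; the function name is invented):

```rust
/// Read a little-endian u32 field at `offset` in the VM arena. The slice
/// indexing is bounds-checked, so an out-of-range offset panics instead of
/// reading out of bounds -- no `unsafe` required.
fn arena_load_u32(arena: &[u8], offset: u32) -> u32 {
    let start = offset as usize;
    let bytes: [u8; 4] = arena[start..start + 4]
        .try_into()
        .expect("4-byte field in bounds");
    u32::from_le_bytes(bytes)
}
```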
Performance: 32-bit relative-offset pointers are half the size of 64-bit raw machine pointers on 64-bit platforms; this is the basis for the performance gain seen with compressed oops (object pointers) in OpenJDK, and also with wasm32 vs. native 64-bit code in some pointer-heavy benchmarks. It's plausible that a 2x shrink in the size of large function tables (for example) might result in slightly better cache residency and performance. Then again, carrying the base pointer and adding it has a slight cost (single-digit percentage, by analogy to studies on Wasm heap strategies), so it's maybe overall neutral.
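To make the size claim concrete, here is a toy comparison (hypothetical layouts, not Wasmtime's actual funcref representation):

```rust
/// A funcref-like table entry built from raw 64-bit pointers.
struct RawEntry {
    code: *const u8,         // raw code pointer: 8 bytes on a 64-bit target
    callee_vmctx: *const u8, // 8 bytes
    type_index: u32,         // padded out to pointer alignment
}

/// The same entry with 32-bit offsets/indices. (Per the note above, the raw
/// code pointer itself might live in a separate externalized table.)
struct RelEntry {
    code: u32,
    callee_vmctx: u32,
    type_index: u32,
}

fn main() {
    // On a 64-bit target this prints 24 vs. 12 bytes: the 2x shrink.
    println!("raw: {}", std::mem::size_of::<RawEntry>());
    println!("rel: {}", std::mem::size_of::<RelEntry>());
}
```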
There may be some other interesting side-effects: for example, if we fully relativize the core Wasm VM data structures, and externalize the "raw pointers" to a table as noted above, it would mean we could snapshot the entire VM state (at the VM level, not the Wasm level).
So it seems we can get (i) fully strict provenance, (ii) better safety, and (iii) other interesting new abstractions like whole-engine snapshotting, if we pay this cost. Something to consider later if any of these becomes a real need?