cfallin opened 3 months ago
How would this be reflected in CLIF? All loads/stores would have to be derived from a small set of "base pointers", such as the base of each linear memory in a module and the arena holding vmctx/tables/etc. For example, Pulley might have a load-from-memory-zero instruction or a load-from-arena instruction, but how would it know which to choose for a given CLIF load? (And there is possibly another region for the "stack memory of this function" -- how would loads/stores to the stack be detected?)
There are at least two ways I think:

1. One base for everything: each load/store becomes `base[ptr]` rather than `*ptr`; from a provenance point of view it's one big array.
2. Multiple address spaces: each load/store indexes a table of base pointers by an address-space ID, i.e. `table[aspace_id][ptr]`.

I kind of like the first more -- it's less intrusive to Cranelift (in a way that avoids complexity footguns around e.g. alias analysis with separate address spaces), at the cost of a little more constraint on memory layout (but then we're already saying we're going to put everything in some arena). A rough sketch of the two at a load site is below.
(For clarity on Option 1: the base pointer is implicit and affects all loads/stores; we wouldn't add an extra argument to load/store instructions.)
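As a minimal sketch of the difference at a load site (the helper names here are invented for illustration; this is not actual Cranelift or Pulley code):

```rust
/// Option 1: a single implicit base. Every "pointer" stored in VM data
/// structures is a u32 offset, and every load adds it to the one arena base.
unsafe fn load_u64_single_base(arena_base: *const u8, off: u32) -> u64 {
    // `base[ptr]`: provenance is rooted in the single arena base.
    arena_base.add(off as usize).cast::<u64>().read_unaligned()
}

/// Option 2: multiple address spaces. Each load names which base it is
/// relative to, and the bases live in a small table.
unsafe fn load_u64_multi_base(bases: &[*const u8], aspace_id: usize, off: u32) -> u64 {
    // `table[aspace_id][ptr]`: provenance is rooted in the selected base.
    bases[aspace_id].add(off as usize).cast::<u64>().read_unaligned()
}
```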
In the discussion on #9015 / #9026, we touched on the handling of VM data structures -- the vmctx struct, tables, function references, and the like -- that are touched both by runtime code (in Rust) and by generated code compiled from the Wasm. There are strict-pointer-provenance issues because pointers to these data structures are exposed to the generated code and/or the Pulley interpreter without strict provenance being maintained (either through the Pulley bytecode, or through the machine code we invoke, which is entirely outside the domain of Rust's semantics).
It occurs to me that one way to solve this would be to make all VM data structures use relative pointers -- e.g., `u32` offsets -- in an arena (per store? per engine?) whose base pointer is a parameter both to the generated code and to the Pulley interpreter. We then trivially have strict provenance because there is only one pointer -- and whatever we need to do to preserve provenance (keep it as a pointer in the Pulley interpreter loop; and "expose" it as we pass it to generated code) is localized and manageable.
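A minimal sketch of the idea, with hypothetical types (this is not Wasmtime's actual layout, just an illustration of "`u32` offsets plus a single arena base passed as a parameter"):

```rust
use core::marker::PhantomData;

/// A typed 32-bit offset into the VM arena: a "relative pointer".
struct Rel<T> {
    offset: u32,
    _marker: PhantomData<T>,
}

/// The one real pointer: the arena base, passed explicitly to generated
/// code and to the Pulley interpreter.
struct Arena {
    base: *mut u8,
    len: usize,
}

impl Arena {
    /// Resolve a relative pointer to a raw pointer. Provenance derives only
    /// from `self.base`, so there is exactly one pointer to keep/"expose".
    fn resolve<T>(&self, rel: &Rel<T>) -> *mut T {
        assert!(rel.offset as usize + core::mem::size_of::<T>() <= self.len);
        // The offset arithmetic stays within the arena allocation; whether
        // the resulting pointer is safe to dereference is up to the caller.
        unsafe { self.base.add(rel.offset as usize).cast::<T>() }
    }
}
```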
If provenance were the only benefit, that may not be so interesting; but there are a few other benefits as well:

Security: maintaining more discipline around raw pointers, and carefully dereferencing offsets into an arena instead (for which we can reserve guard regions just as we do for heaps today), is a layer of mitigation/defense-in-depth against engine bugs. For example, if a miscompile or a bug in the generated CLIF caused a pointer-type confusion in VM data structures today, one could plausibly find a memory-read gadget or control-flow escape more easily. We will still have some raw pointers -- the actual code pointer that we invoke in funcrefs, or the memory base address in an imported or owned memory descriptor -- but fewer of them means less exposure.
This angle is not new: WebKit has the Gigacage, I believe V8 has something similar, and I had suggested we build (and now we are building) our Wasm GC implementation in the same way, with relative-offset pointers; so it's a proven mitigation and the overhead seems to be minimal.
Note also that relative-pointer loads/stores can be implemented with fully safe code in the Pulley interpreter. We almost certainly still need unsafe code for the Wasm heap dereferences and such (though, then again, maybe there's a way around that too, by either externalizing a table of alternative raw-pointer bases, or putting memories in a large gigacage -- teracage? exacage? -- as well).
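For instance, such a load in the interpreter could look roughly like this fully safe sketch (not Pulley's actual code; the function name is invented):

```rust
/// Read a little-endian u32 field at `offset` in the VM arena. The slice
/// indexing is bounds-checked, so an out-of-range offset panics instead of
/// reading out of bounds -- no `unsafe` required.
fn arena_load_u32(arena: &[u8], offset: u32) -> u32 {
    let start = offset as usize;
    let bytes: [u8; 4] = arena[start..start + 4]
        .try_into()
        .expect("4-byte field in bounds");
    u32::from_le_bytes(bytes)
}
```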
Performance: 32-bit relative-offset pointers are half the size of 64-bit raw machine pointers on 64-bit platforms; this is the basis for the performance gain seen with compressed oops (object pointers) in OpenJDK, and also with wasm32 vs. native 64-bit code in some pointer-heavy benchmarks. It's plausible that a 2x shrink in the size of large function tables (for example) might result in slightly better cache residency and performance. Then again, carrying the base pointer and adding it has a slight cost (single-digit percentage, by analogy to studies on Wasm heap strategies), so it's maybe overall neutral.
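To make the size claim concrete, here is a toy comparison (hypothetical layouts, not Wasmtime's actual funcref representation):

```rust
/// A funcref-like table entry built from raw 64-bit pointers.
struct RawEntry {
    code: *const u8,         // raw code pointer: 8 bytes on a 64-bit target
    callee_vmctx: *const u8, // 8 bytes
    type_index: u32,         // padded out to pointer alignment
}

/// The same entry with 32-bit offsets/indices. (Per the note above, the raw
/// code pointer itself might live in a separate externalized table.)
struct RelEntry {
    code: u32,
    callee_vmctx: u32,
    type_index: u32,
}

fn main() {
    // On a 64-bit target this prints 24 vs. 12 bytes: the 2x shrink.
    println!("raw: {}", std::mem::size_of::<RawEntry>());
    println!("rel: {}", std::mem::size_of::<RelEntry>());
}
```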
There may be some other interesting side-effects: for example, if we fully relativize the core Wasm VM data structures, and externalize the "raw pointers" to a table as noted above, it would mean we could snapshot the entire VM state (at the VM level, not the Wasm level).
So it seems we can get (i) fully strict provenance, (ii) better safety, and (iii) other interesting new abstractions like whole-engine snapshotting, if we pay this cost. Something to consider later if any of these becomes a real need?