bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0

Register source locations for instructions that may load in x64 backend #2290

Closed: abrown closed this issue 3 years ago

abrown commented 4 years ago

Many instruction formats in the new x64 backend accept a RegMem parameter, meaning they may load from memory. Such a load can trap if the memory address is out of bounds, and in order to report the correct trap code we should register the source location of the instruction, e.g.:

if let Some(srcloc) = *srcloc {
    sink.add_trap(srcloc, TrapCode::HeapOutOfBounds);
}

Currently, some formats do accept a RegMem parameter but do not correctly register the possible trap location. This passes the spec tests because these formats are not used in their RegMem::Mem variant, but for correctness and completeness we should register the trap locations any time it is possible that an instruction may load from memory.

The current convention for doing so appends a srcloc: Option<SourceLoc> field to the format and expects the programmer to correctly choose between None and Some(ctx.srcloc(insn)) when instantiating a new instruction format. I propose we change this: instead, attach the Option<SourceLoc>, or perhaps even a plain SourceLoc, to SyntheticAmode. This approach has advantages:

If we agree that the above approach is a good one, this issue should be closed only once both of the following hold:

  1. the refactoring above is complete, and
  2. every instruction format that can load/store properly registers traps for its RegMem::Mem variants.
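
To make the contrast concrete, here is a minimal sketch of the two shapes, using simplified stand-in types rather than the real x64 backend definitions (XmmRmR is just a representative format; the field layout is illustrative only):

type Reg = u32;
type SourceLoc = u32; // stand-in for ir::SourceLoc
struct Amode;         // address components elided

// Today: each format that may load carries its own srcloc field, and the
// lowering code must remember to pass Some(ctx.srcloc(insn)).
enum RegMem {
    Reg(Reg),
    Mem(Amode),
}
enum InstToday {
    XmmRmR { src: RegMem, dst: Reg, srcloc: Option<SourceLoc> },
}

// Proposed: the source location rides on the memory operand itself, so every
// format that accepts a memory operand gets trap registration for free.
struct AmodeWithLoc {
    addr: Amode,
    srcloc: SourceLoc, // a reserved "invalid" value can stand in for None
}
enum RegMemProposed {
    Reg(Reg),
    Mem(AmodeWithLoc),
}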
abrown commented 4 years ago

cc: @julian-seward1, @bnjbvr, @cfallin

bnjbvr commented 4 years ago

I agree that there's an issue with the current design, and indeed the RegMem::Mem variant could carry its own source location. (No need for Option<SourceLoc> in fact, since SourceLoc already has a default value meaning "invalid" that we could reuse to mean "undefined".) This will become a bigger issue once input_to_reg_mem can actually embed a load in an address-mode operand.
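
For reference, this relies only on semantics SourceLoc already has (cranelift_codegen::ir::SourceLoc reserves its default value as the "invalid" location), e.g.:

use cranelift_codegen::ir::SourceLoc;

fn main() {
    let unset = SourceLoc::default(); // the reserved "invalid" location
    let real = SourceLoc::new(42);    // an actual source offset
    assert!(unset.is_default());
    assert!(!real.is_default());
    // A RegMem::Mem carrying `unset` would simply skip trap registration.
}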

Unfortunately, this also means that SyntheticAmode would then become even bigger, going against the general plan of making VCode's inst data fit into 32 bytes or fewer. It would be nice to find a way to pack things properly here without inflating RegMem too much.

Note also that these source locations are only useful when we're not using implicit bounds checking (via signal handling and the 4 GB guard-page trick); with implicit bounds checking, this information has no use at all. We could imagine adding a trait to lowering that decides whether explicit bounds checks are in use, and have that trait select a different type for RegMem and/or SyntheticAmode, but I expect that would lead to a lot of binary code duplication in the Cranelift-embedding binary...
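
A rough sketch of that trait idea, with made-up names (nothing below exists in Cranelift); the associated type lets the trap-location field vanish when implicit bounds checks are in use, at the cost of monomorphizing the lowering code twice:

use std::mem::size_of;

trait BoundsCheckMode {
    type TrapLoc: Copy + Default; // stand-in for ir::SourceLoc, or () if unused
}

struct ExplicitChecks;
impl BoundsCheckMode for ExplicitChecks {
    type TrapLoc = u32;
}

struct ImplicitChecks;
impl BoundsCheckMode for ImplicitChecks {
    type TrapLoc = (); // zero-sized: the field costs nothing
}

struct SyntheticAmode<M: BoundsCheckMode> {
    simm32: u32, // address components, heavily simplified
    trap_loc: M::TrapLoc,
}

fn main() {
    println!("explicit: {} bytes", size_of::<SyntheticAmode<ExplicitChecks>>());
    println!("implicit: {} bytes", size_of::<SyntheticAmode<ImplicitChecks>>());
}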

cfallin commented 4 years ago

So at a high level, I think this issue implies several sub-problems to solve:

It occurred to me that we could solve the last two issues simultaneously by keeping source-location / trap info in a side-table, indexed by instruction (i.e., a sparse instruction-index to info map), and emitting this metadata to the sink as we iterate over the insts, outside of the machine backend's particular MachInstEmit method. We already have a notion of metadata outside the instruction, in the form of the "is this a safepoint" bit; perhaps this is another.

(In a sense, both of these bits are orthogonal to, and outside the scope of, the machine instruction itself; the MachInst is some instruction/operation that updates machine state, and the safepoint bit and trap info are metadata carried by the emission pipeline out to side-tables on the machine code.)

So perhaps we have a notion of InsnMetadata records; each instruction has zero or more such records; they are held by the VCode container in a sparse manner, sorted by instruction index (i.e., kept in the same order as the insts); and the VCode container promises to keep the order in sync, if e.g. it reorders or removes insts (e.g., move elision after RA).

Then we have some trait that the InsnMetadata implements that "emits" the metadata to the MachBuffer sink; these emit-calls are interleaved with instruction emit calls as appropriate.
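
A minimal sketch of what such a sparse side-table might look like (hypothetical names; VCode has no such field today):

// Hypothetical metadata record; field names are illustrative.
struct InsnMetadata {
    is_safepoint: bool,
    mem_trap_srcloc: Option<u32>, // stand-in for ir::SourceLoc
}

// Sparse map from instruction index to metadata, kept sorted by index so a
// single cursor can advance in lock-step with emission; the container would
// re-sort or drop entries if it reorders or removes instructions.
struct InsnMetadataTable {
    entries: Vec<(u32, InsnMetadata)>,
}

impl InsnMetadataTable {
    // Before emitting instruction `insn_index`, emit any pending metadata for
    // it, then call the backend's MachInstEmit for the instruction itself.
    fn take_for(&self, insn_index: u32, cursor: &mut usize) -> Option<&InsnMetadata> {
        match self.entries.get(*cursor) {
            Some((idx, md)) if *idx == insn_index => {
                *cursor += 1;
                Some(md)
            }
            _ => None,
        }
    }
}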

Thoughts? I can sketch this in more detail if there's general agreement that we should have a framework something like this.

abrown commented 4 years ago

When we call Inst::foo(...), how would Inst have a way to stuff the metadata into VCode? I don't see how they reach one another.

(As an aside about Inst size, I have wondered why we are not using #[repr(u8)] more liberally to fit things into 32 bytes: in the longest case I can think of, we would have an Inst with the following conceptual fields: {bool, opcode, source loc, dst reg, src from shifted amode}. If we applied #[repr(u8)], we could get this case down to inst tag/1 + bool/1 + opcode/2 + source loc/4 + dst reg/4 + src tag/1 + SyntheticAmode tag/1 + Amode tag/1 + simm32/4 + base reg/4 + index reg/4 + shift/1, which equals 28 bytes. I'm not saying we shouldn't store stuff in a side table, that may still make sense, but why not pack things tighter? Or perhaps Rust won't pack things this way?)

cfallin commented 4 years ago

Re: #[repr(u8)]: my understanding from @bnjbvr's explanation is that there are alignment/padding restrictions that come along with nested enums -- so if every field were in one struct, we could indeed pack more tightly, but we haven't gone there. (It could be an option if we have a way to keep the ergonomics somehow; maybe a "packed" and "unpacked" form; needs more thought.)

When we call Inst::foo(...), how would Inst have a way to stuff the metadata into VCode? I don't see how they reach one another.

Indeed, the idea is that the creator of the Inst would attach the metadata as it passes the instruction to the LowerCtx. Right now we have (slightly hackishly) ctx.emit() and ctx.emit_safepoint(); perhaps instead we could have ctx.emit_with_metadata(), and then have various constructors for the metadata info/flags (InsnMetadata::builder().with_mem_trap_loc(...).build() maybe, and .is_safepoint() as well to set the safepoint bit?).
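
A hedged sketch of what that API surface might look like (all names below are hypothetical; neither InsnMetadata nor emit_with_metadata exists in LowerCtx today):

type SourceLoc = u32; // stand-in for ir::SourceLoc

#[derive(Default)]
struct InsnMetadata {
    is_safepoint: bool,
    mem_trap_srcloc: Option<SourceLoc>,
}

struct InsnMetadataBuilder(InsnMetadata);

impl InsnMetadata {
    fn builder() -> InsnMetadataBuilder {
        InsnMetadataBuilder(InsnMetadata::default())
    }
}

impl InsnMetadataBuilder {
    fn with_mem_trap_loc(mut self, loc: SourceLoc) -> Self {
        self.0.mem_trap_srcloc = Some(loc);
        self
    }
    fn is_safepoint(mut self) -> Self {
        self.0.is_safepoint = true;
        self
    }
    fn build(self) -> InsnMetadata {
        self.0
    }
}

// At lowering time, something like:
//   ctx.emit_with_metadata(
//       inst,
//       InsnMetadata::builder().with_mem_trap_loc(ctx.srcloc(insn)).build(),
//   );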

Anyway, just a thought; perhaps it's too much infrastructure for just a few things; but it at least enables sparsity and gives the ability to only optionally include the data (with zero cost if not), which is good :-)

abrown commented 4 years ago

Makes sense to me. @bnjbvr, @julian-seward1?

bnjbvr commented 4 years ago

Re: #[repr(u8)]: my understanding from @bnjbvr's explanation is that there are alignment/padding restrictions that come along with nested enums -- so if every field were in one struct, we could indeed pack more tightly, but we haven't gone there. (It could be an option if we have a way to keep the ergonomics somehow; maybe a "packed" and "unpacked" form; needs more thought.)

Yes, exactly this: each different enum adds a new word for the discriminant (not sure whether this only applies under certain conditions, like "when the enum may be heap-allocated"; it matches what I've observed with the Inst enum, though). Even if we flattened all the enums into a single one and then used a newtype as suggested by @fitzgen, we'd not have enough space; the ideal packing would need the RegMem SIB shift offset (3 bits) to be on the same byte as the overall emulated discriminant, if I remember my back-of-the-envelope computation correctly.
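
A quick way to see the effect is to measure simplified nested enums directly; these are stand-ins rather than the real x64 types, and the exact numbers depend on the compiler's layout choices:

#![allow(dead_code)]
use std::mem::size_of;

enum Amode {
    ImmReg { simm32: u32, base: u32 },
    ImmRegRegShift { simm32: u32, base: u32, index: u32, shift: u8 },
}

enum SyntheticAmode {
    Real(Amode),
    NominalSPOffset { simm32: u32 },
}

enum RegMem {
    Reg { reg: u32 },
    Mem { addr: SyntheticAmode },
}

enum Inst {
    AluRmR { is_64: bool, op: u8, src: RegMem, dst: u32, srcloc: Option<u32> },
    Nop,
}

fn main() {
    // Each nesting level can contribute its own discriminant plus alignment
    // padding, which is why the hand-flattened 28-byte estimate is hard to
    // reach without packing all the fields into a single struct.
    println!("Amode:          {} bytes", size_of::<Amode>());
    println!("SyntheticAmode: {} bytes", size_of::<SyntheticAmode>());
    println!("RegMem:         {} bytes", size_of::<RegMem>());
    println!("Inst:           {} bytes", size_of::<Inst>());
}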

Indeed, the idea is that the creator of the Inst would attach the metadata as it passes the instruction to the LowerCtx. Right now we have (slightly hackishly) ctx.emit() and ctx.emit_safepoint(); perhaps instead we could have ctx.emit_with_metadata(), and then have various constructors for the metadata info/flags (InsnMetadata::builder().with_mem_trap_loc(...).build() maybe, and .is_safepoint() as well to set the safepoint bit?).

The Metadata struct sounds good to me! If it has only two fields (is_safepoint + the trap's sourceloc), and the two are never needed at the same time, the Builder pattern may be unnecessary overhead, and it may increase rustc compile times for the already-quite-large lower functions (but that's a separate issue that can be handled on its own).