Closed abrown closed 3 years ago
cc: @julian-seward1, @bnjbvr, @cfallin
I agree that there's an issue with the current design, and indeed the RegMem::Mem
variant could have its source location. (No need for Option<SourceLoc>
in fact, since SourceLoc
already has a default value meaning invalid that we could reuse to mean "undefined".) This would become a large issue when input_to_reg_mem
can actually embed a load in an address-mode operand.
Unfortunately, this also means that the SyntheticAmode would then become even much bigger, going against the general plan of making Vcode's inst data fit into 32 bytes or fewer. It would be nice to find a way to properly pack here without inflating the RegMem too much.
To wit: these source locations are also only useful when not using implicit bounds checking (via signal handling and the 4GB memory page trick). When using implicit bounds checking, this data information has no use at all. We could imagine adding a trait to lowering, to decide whether we're using explicit bounds checks or not, and that trait would select a different type for RegMem
and/or SyntheticAmode
, but that lead to a lot of binary code duplication in the Cranelift-embedding binary, I expect...
So at a high level, I think this issue implies several sub-problems to solve:
MachInst
s such that it is not possible to accidentally create an inst that should have a SourceLoc
attached without that SourceLoc
. I think that we can probably achieve this by auditing the helpers that we create.SourceLoc
s efficiently, that is, ideally not inflate the size of every RegMem
if we can help it.It occurred to me that we could solve the last two issues simultaneously by keeping source-location / trap info in a side-table, indexed by instruction (i.e., a sparse instruction-index to info map), and emitting this metadata to the sink as we iterate over the insts, outside of the machine backend's particular MachInstEmit
method. We already have a notion of metadata outside the instruction, in the form of the "is this a safepoint" bit; perhaps this is another.
(In a sense, both of these bits are orthogonal to, and outside the scope of, the machine instruction itself; the MachInst
is some instruction/operation that updates machine state, and the safepoint bit and trap info are metadata carried by the emission pipeline out to side-tables on the machine code.)
So perhaps we have a notion of InsnMetadata
records; each instruction has zero or more such records; they are held by the VCode container in a sparse manner, sorted by instruction index (i.e., kept in the same order as the insts); and the VCode container promises to keep the order in sync, if e.g. it reorders or removes insts (e.g., move elision after RA).
Then we have some trait that the InsnMetadata
implements that "emits" the metadata to the MachBuffer
sink; these emit-calls are interleaved with instruction emit calls as appropriate.
Thoughts? I can sketch this in more detail if there's general agreement that we should have a framework something like this.
When we call Inst::foo(...)
, how would Inst
have a way to stuff the metadata into into VCode
? I don't see how they reach one another.
(As an aside about Inst
size, I have wondered why we are not using #[repr(u8)]
more liberally to fit stuff into 32 bytes: in the longest case I can think of, we would have an Inst
with the following conceptual fields, {bool, opcode, source loc, dst reg, src from shifted amode}
. If we applied #[repr(u8)]
, we could get this case down to inst tag/1 + bool/1 + opcode/2 + source loc/4 + dst reg/4 + src tag/1 + SytheticAMode tag/1 + Amode tag/1 + simm32/4, + base reg/4 + index reg/4 + shift/1
which equals 28 bytes. I'm not saying we shouldn't store stuff in a side table, that may still make sense, but why not pack things tighter? Or perhaps Rust won't pack things this way?)
Re: #[repr(u8)]
: my understanding from @bnjbvr's explanation is that there are alignment/padding restrictions that come along with nested enums -- so if every field were in one struct
, we could indeed pack more tightly, but we haven't gone there. (It could be an option if we have a way to keep the ergonomics somehow; maybe a "packed" and "unpacked" form; needs more thought.)
When we call Inst::foo(...), how would Inst have a way to stuff the metadata into into VCode? I don't see how they reach one another.
Indeed, the idea is that the creator of the Inst
would attach the metadata as it passes the instruction to the LowerCtx
. Right now we have (slightly hackishly) ctx.emit()
and ctx.emit_safepoint()
; perhaps instead we could have ctx.emit_with_metadata()
, and then have various constructors for the metadata info/flags (InsnMetadata::builder().with_mem_trap_loc(...).build()
maybe, and .is_safepoint()
as well to set the safepoint bit?).
Anyway, just a thought; perhaps it's too much infrastructure for just a few things; but it at least enables sparsity and gives the ability to only optionally include the data (with zero cost if not), which is good :-)
Makes sense to me. @bnjbvr, @julian-seward1?
Re: #[repr(u8)]: my understanding from @bnjbvr's explanation is that there are alignment/padding restrictions that come along with nested enums -- so if every field were in one struct, we could indeed pack more tightly, but we haven't gone there. (It could be an option if we have a way to keep the ergonomics somehow; maybe a "packed" and "unpacked" form; needs more thought.)
Yes, exactly this: each different enum adds a new word for the discriminant (not sure if it applies under certain conditions like "when the enum may be heap-allocated" or not; this matches what I've observed with the Inst enum, though). Even if we flattened all the enums into a single one, and then used a newtype as suggested by @fitzgen, we'd not have enough space; the ideal packing would need the RegMem SIB shift offset (3 bits) to be on the same byte as the overall emulated discriminant, if I remember correctly my back-of-the-envelope computation.
Indeed, the idea is that the creator of the Inst would attach the metadata as it passes the instruction to the LowerCtx. Right now we have (slightly hackishly) ctx.emit() and ctx.emit_safepoint(); perhaps instead we could have ctx.emit_with_metadata(), and then have various constructors for the metadata info/flags (InsnMetadata::builder().with_mem_trap_loc(...).build() maybe, and .is_safepoint() as well to set the safepoint bit?).
The Metadata struct sounds good to me! If it has only two fields (is_safepoint + trap's sourceloc), and in no cases both are needed at the same time, the Builder pattern may be overhead, and it may increase Rustc compile times for the already-quite-large lower
functions (but that's still another issue, that can be handled separately).
Many instruction formats in the new x64 backend accept a
RegMem
parameter, implying that they may load from memory. This may cause a trap if the memory address is out-of-bounds and, in order to return the correct trap code, we should be registering the source location of the instruction, e.g.:Currently, some formats do accept a
RegMem
parameter but do not correctly register the possible trap location. This passes the spec tests because these formats are not used in theirRegMem::Mem
variant, but for correctness and completeness we should register the trap locations any time it is possible that an instruction may load from memory.The current convention for doing so appends a
srcloc: Option<SourceLoc>
field to the format and expects the programmer to correctly choose betweenNone
andSome(ctx.srcloc(insn))
when instatiating a new instruction format. I propose we change this: instead, attach theOption<SourceLoc>
, or perhaps even a plainSourceLoc
, toSyntheticAmode
. This approach has advantages:lower.rs
can automatically attach aSourceLoc
to anySyntheticAmode
s createdInst
formats no longer need an extra field filled withNone
in many casesIf we agree that the above approach is a good one, this issue should be closed not only
RegMem::Mem
variants