Closed frank-emrich closed 3 months ago
You are supposed to remove the set of defined registers from the set of clobbered registers. Regalloc2 has the invariant that each register is either unchanged, marked as defined or marked as clobbered.
@frank-emrich I think this may "just" be an issue with the way the allocation constraints are specified to regalloc2. My mental model for an instruction's semantics and use- and def- points is roughly:
All within a single instruction, so no spills/reloads or moves can be inserted between steps 1-2 or 2-3.
Given that, it seems like it should be a "too many live registers" error if we have a def at step 1, and step 2 clobbers (blocks) all registers in a class; there is no register remaining that can make it past the clobbers and carry the live value downward.
A solution might be to remove the registers that are def'd from the clobber set. The effect is the same anyway -- it blocks the register from being used by other values live across this instruction.
(Yep, exactly what @bjorn3 said; we raced to answer!)
Thanks for your responses!
@cfallin
A solution might be to remove the registers that are def'd from the clobber set.
I understand how to do that if my early defs were fixed, but I'm not sure how to do it if they are non-fixed Reg
s, or if it's even possible in that case. After all, the clobber set is specified as a set of PReg
s. Does that simply mean that what I'm trying to do can only be achieved with fixed early defs, or am I missing something?
@frank-emrich I guess there are two options: use fixed early defs, or use unconstrained early defs and leave enough "holes" in the clobbers for them to fit. Otherwise the problem as specified to regalloc is impossible to solve because the clobber-set leaves no space for any live values in the requested register class.
@cfallin
[...] or use unconstrained early defs and leave enough "holes" in the clobbers for them to fit
Ah, I hadn't thought of that option. Thanks for clarifying!
In any case, closing this issue since regalloc is behaving correctly. I just didn't have a good enough mental model of how it works to realize that I was asking for the impossible.
As part of bytecodealliance/wasmtime#9078, I added a new x64
MInst
calledStackSwitchBasic
which clobbers all registers. In the process, I noticed the following: TheMInst
requires two temporary registers, which I tried to get by adding two additionalWritable<Gpr>
fields toStackSwitchBasic
and register them asreg_early_defs
inx64_get_operands
.However, this makes register allocation always fail for
StackSwitchBasic
withTooManyLiveRegs
.As a workaround, I'm simply picking two unused registers at emission time , which is safe just because the instruction is known to clobber all registers. That means that this issue isn't really urgent, the workaround works as expected.
This commit on top of Wasmtime main shows a patch that makes the problem easily reproducible: It's undoing my workaround and restoring what I originally had in mind for obtaining the temporary regs. On that branch, running the cranelift filetests fails on
stack_switch.clif
. (Sorry for not having a standalone repro without relying on Wasmtime, but I've not used regalloc2 as a standalone library before)As another workaround on top of the commit mentioned above, hard-coding the two temporary registers by setting them to fixed early defs also makes the problem go away. But I don't understand regalloc well enough to judge if that may reduce the quality of the generated code by imposing additional constraints.
What I'm seeing here may just be a variation of the problem already described in #145. Please let me know if it is, otherwise I'll cross reference this issue in my workaround in
emit.rs
.