bytecodealliance / regalloc2

A new register allocator
Apache License 2.0
217 stars 38 forks source link

Support instructions that clobber all registers and have non-fixed early defs #182

Closed frank-emrich closed 3 months ago

frank-emrich commented 3 months ago

As part of bytecodealliance/wasmtime#9078, I added a new x64 MInst called StackSwitchBasic which clobbers all registers. In the process, I noticed the following: The MInst requires two temporary registers, which I tried to get by adding two additional Writable<Gpr> fields to StackSwitchBasic and register them as reg_early_defs in x64_get_operands.

However, this makes register allocation always fail for StackSwitchBasic with TooManyLiveRegs.

As a workaround, I'm simply picking two unused registers at emission time , which is safe just because the instruction is known to clobber all registers. That means that this issue isn't really urgent, the workaround works as expected.

This commit on top of Wasmtime main shows a patch that makes the problem easily reproducible: It's undoing my workaround and restoring what I originally had in mind for obtaining the temporary regs. On that branch, running the cranelift filetests fails on stack_switch.clif. (Sorry for not having a standalone repro without relying on Wasmtime, but I've not used regalloc2 as a standalone library before)

As another workaround on top of the commit mentioned above, hard-coding the two temporary registers by setting them to fixed early defs also makes the problem go away. But I don't understand regalloc well enough to judge if that may reduce the quality of the generated code by imposing additional constraints.

What I'm seeing here may just be a variation of the problem already described in #145. Please let me know if it is, otherwise I'll cross reference this issue in my workaround in emit.rs.

bjorn3 commented 3 months ago

You are supposed to remove the set of defined registers from the set of clobbered registers. Regalloc2 has the invariant that each register is either unchanged, marked as defined or marked as clobbered.

cfallin commented 3 months ago

@frank-emrich I think this may "just" be an issue with the way the allocation constraints are specified to regalloc2. My mental model for an instruction's semantics and use- and def- points is roughly:

  1. Early uses are read and early defs are written (in parallel, simultaneously).
  2. Clobbers occur.
  3. Late uses are read and late defs are written (in parallel, simultaneously).

All within a single instruction, so no spills/reloads or moves can be inserted between steps 1-2 or 2-3.

Given that, it seems like it should be a "too many live registers" error if we have a def at step 1, and step 2 clobbers (blocks) all registers in a class; there is no register remaining that can make it past the clobbers and carry the live value downward.

A solution might be to remove the registers that are def'd from the clobber set. The effect is the same anyway -- it blocks the register from being used by other values live across this instruction.

cfallin commented 3 months ago

(Yep, exactly what @bjorn3 said; we raced to answer!)

frank-emrich commented 3 months ago

Thanks for your responses!

@cfallin

A solution might be to remove the registers that are def'd from the clobber set.

I understand how to do that if my early defs were fixed, but I'm not sure how to do it if they are non-fixed Regs, or if it's even possible in that case. After all, the clobber set is specified as a set of PRegs. Does that simply mean that what I'm trying to do can only be achieved with fixed early defs, or am I missing something?

cfallin commented 3 months ago

@frank-emrich I guess there are two options: use fixed early defs, or use unconstrained early defs and leave enough "holes" in the clobbers for them to fit. Otherwise the problem as specified to regalloc is impossible to solve because the clobber-set leaves no space for any live values in the requested register class.

frank-emrich commented 3 months ago

@cfallin

[...] or use unconstrained early defs and leave enough "holes" in the clobbers for them to fit

Ah, I hadn't thought of that option. Thanks for clarifying!

In any case, closing this issue since regalloc is behaving correctly. I just didn't have a good enough mental model of how it works to realize that I was asking for the impossible.