Bug in register allocation when the last use of a register is in an OOL code section

fjeremic commented 4 years ago

Consider the following instruction level CFG:

   Block 0
+-----------+
|           |
|           |
|           |
|  Use X    |  [Location 5]
|           |
|           |
|           |
+-----+-----+
      |
      |
      |
      v
   Block 1              OOL 1
+-----------+ +----> +----------+
|           |        |          |
|           |        |          |
|           |        |          |
|           |        |          |
|           |        |  Use X   |  [Location 4]
|           |        |  Spill X |  [Location 3]
|           |        |          |
+-----+-----+ <----+ +----------+
      |
      |
      |
      v
   Block 2              OOL 2
+-----------+ +----> +----------+
|           |        |          |
|           |        |          |
|           |        |  Fill X  |  [Location 2]
|           |        |          |
|           |        |          |
|           |        |  Use X   |  [Location 1]
|           |        |          |
+-----+-----+ <----+ +----------+
      |
      |
      |
      v

In a backwards register allocation pass of this CFG (explained in [1]) we would proceed by register allocating Block 2 first, followed by OOL 2. When we encounter [Location 1], suppose this is the first occurrence of virtual register X. In such a case the future use count == total use count of this register during RA.

Suppose we have a free register at that point, so we associated X with some real register. Now suppose due to high register pressure, inside of OOL 2 at [Location 2] we choose to evict X from the set of live registers. We would then generate a fill of register X which would be a load instruction. At this point in the register state, register X should have a backing storage object and it should be spilled.

Now we reach the beginning of OOL 2 which would be the OOL label. Prior to register allocating OOL 2 we would have collected the set of live/spilled registers at the branch point to the OOL and generated register dependencies for the OOL 2 entry label, which will ensure the register state at both the fallthrough and the branch taken path to the OOL would match. An example of this on Power can be found in [2]. However because register X was never seen during the allocation of Block 2 (because it's last occurrence is in the OOL 2 path) it is never added to this dependency list.

Now, register allocation proceeds to allocate Block 1 and we again generate the register dependencies for the entry label of OOL 1. Because we've now lost track of register X it is never added to the register spill list which we keep track of during RA, and as such no register dependency would be created for register X at the entry label of OOL 1 either.

We then proceed to allocate OOL 1. We encounter a use of register X and we would perform the typical "is this register spilled check", which would mean checking if register X is associated with a real register (it would not be), and then checking if the future use count equals the total use count (it does not), hence the register must have been spilled. So at [Location 3] we would generate a spill, which would be a store instruction. Now we can associate register X with some real register and use it within [Location 4].

Now suppose we continue allocation until the entry label of OOL 1. At this point register X is not in the dependency list because we have lost track of it. If we are unlucky enough, the real register we picked at [Location 4] to hold register X will not be touched during the satisfaction of the dependency list.

We then continue allocation and we eventually find the first (in execution order) use of register X in the mainline code. The virtual register is already assigned to a real register so we can use it without any fill/spill at [Location 1].

Finally to describe the problem, at runtime if we were to take the following path:

Block 1 -> Block 2 -> OOL 2

We would have a path on which we load from a stack slot to which we never stored (because we never executed the store at [Location 3]), and the value of register X is corrupted at [Location 1].

[1] https://github.com/eclipse/omr/blob/892a0fef24c7bc781cda8aab530b4cd18cea6814/compiler/codegen/OutOfLineCodeSection.hpp#L36-L106 [2] https://github.com/eclipse/omr/blob/892a0fef24c7bc781cda8aab530b4cd18cea6814/compiler/p/codegen/PPCOutOfLineCodeSection.cpp#L102-L105

fjeremic commented 4 years ago

I did not make up this hypothetical scenario. I thought about it after seeing #4866 and it reminded me of this particularly nasty bug we fixed on the Z register allocator. This bug was exposed by a change in OpenJ9 to start using OOL code sections to handle zero length array allocations.

In OpenJ9 the tree representing a primitive array allocation has a constant node which denotes the type of the primitive allocation, for example:

newarray
  iload <length>
  iconst 2 ; denotes char[] allocation

This iconst value can be commoned with other nodes, including other newarray nodes as well. The CFG example described in the previous comment can result from a tree sequence such as:

n1n  istore <someVariable>
n2n    iconst 2
...
n3n  newarray
n4n    iload<length1>
n2n    ==> iconst 2
n5n  newarray
n6n    iload <length2>
n2n    ==> iconst 2

The value of the iconst 2 is not needed in the mainline code because it's just a metadata varaible which tells us the type of the array. So no need to evaluate it into a register. However if we need to call a VM helper (which we do in the OOL code section) we need to store the type into a parameter register to dispatch the C call, so we would make use of that iconst 2 value only in the OOL section.

Attempting to evaluate all the children of a newarray node before branching the OOL code section will not help in this case, because n2n would have been associated with a virtual register (register X), so this doesn't solve the problem.

fjeremic commented 4 years ago

This issue was fixed on Z by introducing another list which we called firstTimeLiveOOLRegisterList. During register allocation, if the following is true, we add the virtual register to this list:

Register total use count == future use count (register is seen for the first time)
We are allocating the OOL code section

Then inside assignRegisters function for the OOL code section we process this list. We loop through all registers in the list and check if they have a backing storage. If they do, that must mean the register was first seen in an OOL section and was subsequently spilled somewhere. We simply add such virtual registers to the spilled register list, such that when we eventually come to the branch from Block 1 to OOL 1 we would correctly create a register dependency which would ensure register X stays spilled at the entry of OOL 1 (which currently did not happen today on non-Z platforms).

This fixes the problem because it is now impossible to encounter a path on which the value of register X is corrupted. See [3] for how we process this list.

FYI @0xdaryl @andrewcraik (x86) @gita-omr @zl-wang (Power) @knn-k (AArch64) @janvrany (RISC-V)

[3] https://github.com/eclipse/omr/blob/892a0fef24c7bc781cda8aab530b4cd18cea6814/compiler/z/codegen/S390OutOfLineCodeSection.cpp#L197-L206

fjeremic commented 4 years ago

Needless to say we should really consider investing into commoning up the register allocation code across platforms, since most of it can be shared.

eclipse / omr

Bug in register allocation when the last use of a register is in an OOL code section #4961