This simple program led to the bug discovery, after fixing another bug...
print 42
Greedy coloring is blind to temp interference later in the program, e.g.,
let f x = ... in
...
in
f ...
Here f will be stored in some temp, and later during closure application, we
will call that temp. What if f was assigned RDI at original let? We could get
stuck although there were many other choices for f at its definition. Thus we keep
spilling f, keep assigning it to RDI, ...
~Similar case can be made for temps that live across call and are pre-colored to
callee-saved regs.~
~So the issue seems to be pre-coloring?~
~Nope. Even without call instruction, greedy coloring could result in
non-termination in which we keep spilling and making the same incorrect
greedy coloring.~
Ok, it's just a terribly incorrect algorithm. Cheers to all the time I spent
scratching my head on figuring it out and debugging.
Update 04/24
This issue deserves more documentation.
Fundamentally, why or when does spill happen?
Values (temps) must be saved somewhere (registers or memory), if they are
needed later. However, the # of registers is finite (memory is finite too,
but practically infinite for our purposes here), meaning if there are too
many values (temps) that are needed later, we must save some to memory
(spill).
In one instruction (ignore separation of read/write stages for now), we might
need to use multiple temps at the same time, i.e., we need to assign them
distinct registers. However, for a greedy allocator who's unaware of such
interference later in the program, these temps might have been assigned to
the same register earlier. (This would be a non-issue if our allocator
interface doesn't output a fixed mapping from temps to registers, but that
seems to require code emitting during register allocation, which I want to
avoid for now, in order to separate responsibilities)
Ok, now how do we guarantee termination, with the existing register allocation
interface? (i.e., output global temp-to-reg mapping or set of temps to spill)
Think backwards. If all we have is (2), that is easy as long as pre-coloring
doesn't assign distinct regs to temps used in the same instruction. If we can't
find available color in this case, it literally means the backend doesn't have
enough registers (think 1 register for a binary op).
Ok, but we have (1), how would spilling affect (1)? Imagine we are spilling T1
due to (1), and T2 (without loss of generality) is some other temp who is
simultatenously alive with T1 in some parts of the program. Since the greedy
algorithm assigns color linearly, we always spill the temp defined earlier. That
leaves us with 2 cases based on how T1 and T2 interferes (assume T*-end is the
last usage, it could be a new write too, but that doesn't really affect our
analysis -- we just replace the restore before T*-end with a spill after it).
We don't allow Rbp to be used for register allocation
Spill and Restore only requires the target temp and Rbp
We have reduced the interference due to (1). If we keep spilling, eventually all
(1) would go away, and all temps will be stored immediately to stack upon
definition, and loaded into regs upon usage. As long as we have enough register
for the max # of regs a single instruction requires, we will find a coloring.
If you feel suspicious right now, you are right. We never solved (2)! The
spilled temp is broken up into smaller live-ranges, but it's still the same
temp, meaning we must assign the same register to it, which may result in (2).
What do we do? Use a new Temp for each "restore", so all the original long
live-ranges of spilled temp is broken into smaller live-ranges of distinct new
temps. Now we have much more flexibility in register assignment for them. So we
really have this:
This simple program led to the bug discovery, after fixing another bug...
Greedy coloring is blind to temp interference later in the program, e.g.,
Here
f
will be stored in some temp, and later during closure application, we willcall
that temp. What iff
was assignedRDI
at originallet
? We could get stuck although there were many other choices forf
at its definition. Thus we keep spillingf
, keep assigning it toRDI
, ...~Similar case can be made for temps that live across call and are pre-colored to callee-saved regs.~
~So the issue seems to be pre-coloring?~
~Nope. Even without call instruction, greedy coloring could result in non-termination in which we keep spilling and making the same incorrect greedy coloring.~
Ok, it's just a terribly incorrect algorithm. Cheers to all the time I spent scratching my head on figuring it out and debugging.
Update 04/24
This issue deserves more documentation.
Fundamentally, why or when does spill happen?
Values (temps) must be saved somewhere (registers or memory), if they are needed later. However, the # of registers is finite (memory is finite too, but practically infinite for our purposes here), meaning if there are too many values (temps) that are needed later, we must save some to memory (spill).
In one instruction (ignore separation of read/write stages for now), we might need to use multiple temps at the same time, i.e., we need to assign them distinct registers. However, for a greedy allocator who's unaware of such interference later in the program, these temps might have been assigned to the same register earlier. (This would be a non-issue if our allocator interface doesn't output a fixed mapping from temps to registers, but that seems to require code emitting during register allocation, which I want to avoid for now, in order to separate responsibilities)
Ok, now how do we guarantee termination, with the existing register allocation interface? (i.e., output global temp-to-reg mapping or set of temps to spill)
Think backwards. If all we have is (2), that is easy as long as pre-coloring doesn't assign distinct regs to temps used in the same instruction. If we can't find available color in this case, it literally means the backend doesn't have enough registers (think 1 register for a binary op).
Ok, but we have (1), how would spilling affect (1)? Imagine we are spilling
T1
due to (1), andT2
(without loss of generality) is some other temp who is simultatenously alive withT1
in some parts of the program. Since the greedy algorithm assigns color linearly, we always spill the temp defined earlier. That leaves us with 2 cases based on how T1 and T2 interferes (assumeT*-end
is the last usage, it could be a new write too, but that doesn't really affect our analysis -- we just replace the restore beforeT*-end
with a spill after it).Note that since
We have reduced the interference due to (1). If we keep spilling, eventually all (1) would go away, and all temps will be stored immediately to stack upon definition, and loaded into regs upon usage. As long as we have enough register for the max # of regs a single instruction requires, we will find a coloring.
If you feel suspicious right now, you are right. We never solved (2)! The spilled temp is broken up into smaller live-ranges, but it's still the same temp, meaning we must assign the same register to it, which may result in (2).
What do we do? Use a new Temp for each "restore", so all the original long live-ranges of spilled temp is broken into smaller live-ranges of distinct new temps. Now we have much more flexibility in register assignment for them. So we really have this: