Greedy register allocator might not terminate

This simple program led to the bug discovery, after fixing another bug...

print 42

Greedy coloring is blind to temp interference later in the program, e.g.,

let f x = ... in
...
in
f ...

Here f will be stored in some temp, and later during closure application, we will call that temp. What if f was assigned RDI at original let? We could get stuck although there were many other choices for f at its definition. Thus we keep spilling f, keep assigning it to RDI, ...

~Similar case can be made for temps that live across call and are pre-colored to callee-saved regs.~

~So the issue seems to be pre-coloring?~

~Nope. Even without call instruction, greedy coloring could result in non-termination in which we keep spilling and making the same incorrect greedy coloring.~

Ok, it's just a terribly incorrect algorithm. Cheers to all the time I spent scratching my head on figuring it out and debugging.

Update 04/24

This issue deserves more documentation.

Fundamentally, why or when does spill happen?

Values (temps) must be saved somewhere (registers or memory), if they are needed later. However, the # of registers is finite (memory is finite too, but practically infinite for our purposes here), meaning if there are too many values (temps) that are needed later, we must save some to memory (spill).
In one instruction (ignore separation of read/write stages for now), we might need to use multiple temps at the same time, i.e., we need to assign them distinct registers. However, for a greedy allocator who's unaware of such interference later in the program, these temps might have been assigned to the same register earlier. (This would be a non-issue if our allocator interface doesn't output a fixed mapping from temps to registers, but that seems to require code emitting during register allocation, which I want to avoid for now, in order to separate responsibilities)

Ok, now how do we guarantee termination, with the existing register allocation interface? (i.e., output global temp-to-reg mapping or set of temps to spill)

Think backwards. If all we have is (2), that is easy as long as pre-coloring doesn't assign distinct regs to temps used in the same instruction. If we can't find available color in this case, it literally means the backend doesn't have enough registers (think 1 register for a binary op).

Ok, but we have (1), how would spilling affect (1)? Imagine we are spilling T1 due to (1), and T2 (without loss of generality) is some other temp who is simultatenously alive with T1 in some parts of the program. Since the greedy algorithm assigns color linearly, we always spill the temp defined earlier. That leaves us with 2 cases based on how T1 and T2 interferes (assume T*-end is the last usage, it could be a new write too, but that doesn't really affect our analysis -- we just replace the restore before T*-end with a spill after it).

T1-def
...
...        T2-def
...        ...
T1-end     ...
...        ...
...        T2-end
...        ...

--->

T1-def
T1-spill
...
...        T2-def
...        ...
T1-restore ...
T1-end     ...
...        ...
...        T2-end
...        ...

===

T1-def
T1-use
...
...        T2-def
...        ...
T1-def     ...
T1-end     ...
...        ...
...        T2-end
...        ...

Note that since

We don't allow Rbp to be used for register allocation
Spill and Restore only requires the target temp and Rbp

We have reduced the interference due to (1). If we keep spilling, eventually all (1) would go away, and all temps will be stored immediately to stack upon definition, and loaded into regs upon usage. As long as we have enough register for the max # of regs a single instruction requires, we will find a coloring.

If you feel suspicious right now, you are right. We never solved (2)! The spilled temp is broken up into smaller live-ranges, but it's still the same temp, meaning we must assign the same register to it, which may result in (2).

What do we do? Use a new Temp for each "restore", so all the original long live-ranges of spilled temp is broken into smaller live-ranges of distinct new temps. Now we have much more flexibility in register assignment for them. So we really have this:

T1-def
...
...        T2-def
...        ...
T1-end     ...
...        ...
...        T2-end
...        ...

--->

T1-def
T1-spill
...
...        T2-def
...        ...
T3-restore ...
T3-end     ...
...        ...
...        T2-end
...        ...

===

T1-def
T1-use
...
...        T2-def
...        ...
T3-def     ...
T3-end     ...
...        ...
...        T2-end
...        ...

StrongerXi / soc

Greedy register allocator might not terminate #29

Update 04/24