Although an arithmetic shift is enough to implement multiplication, I implemented signed division anyway: we'll need it eventually, and shifting is more of an optimization than a necessity.
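To make the shift-versus-division trade-off concrete, here is a minimal Python sketch. It assumes a tagged fixnum representation (the value shifted left by two zero tag bits), which is an assumption for illustration and not necessarily the layout used in this compiler. `TAG_BITS`, `sar`, `idiv`, `tag`, and the `mul_*` helpers are all hypothetical names. Under that assumption, untagging one operand with a shift or with a truncating signed division gives the same product, because a tagged value is always an exact multiple of `2**TAG_BITS`.

```python
TAG_BITS = 2  # assumption: fixnums are stored as n << 2 with a zero tag


def sar(x: int, k: int) -> int:
    """Arithmetic right shift: rounds toward negative infinity."""
    return x >> k


def idiv(x: int, d: int) -> int:
    """Signed division with the quotient truncated toward zero, like x86 idiv."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def tag(n: int) -> int:
    """Encode a plain integer as a (hypothetical) tagged fixnum."""
    return n << TAG_BITS


def mul_with_shift(a: int, b: int) -> int:
    """Multiply two tagged fixnums, untagging one operand with a shift."""
    return sar(a, TAG_BITS) * b


def mul_with_idiv(a: int, b: int) -> int:
    """Same product, untagging one operand with signed division instead."""
    return idiv(a, 1 << TAG_BITS) * b


# Both routes agree on tagged values, including negative ones.
product_shift = mul_with_shift(tag(-3), tag(5))
product_idiv = mul_with_idiv(tag(-3), tag(5))

# On values that are not exact multiples, the two operations round differently.
shift_rounding = sar(-7, 1)   # floors
idiv_rounding = idiv(-7, 2)   # truncates toward zero
```

The two only diverge on negative values that are not exact multiples of the divisor, which is why the shift works as an optimization for untagging but cannot stand in for general signed division.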
The earlier design paid off: pre-coloring easily supports the special-purpose registers (rax and rdx) that signed division requires on x86. The design could still be improved by grouping the special-purpose registers together, so that we know where to modify things when adding a new one, ideally with compiler hints (I almost missed some places, such as the fixed temps in spilling).
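As a rough illustration of the idea, the sketch below models pre-coloring in a toy greedy allocator. It is not the allocator from this project: `REGS`, `SPECIAL_PURPOSE`, `color`, and the temp names are hypothetical, and the interference graph is written by hand. The point is that the temps tied to `idiv` start out pinned to rax and rdx, every other temp is colored around them, and the pinned registers live in a single table, so adding another special-purpose instruction means editing one place.

```python
# Registers available to the toy allocator (a deliberately small set).
REGS = ["rax", "rcx", "rdx", "rbx"]

# Hypothetical single table of special-purpose register constraints.
# x86 idiv takes its dividend in rdx:rax and leaves the quotient in rax
# and the remainder in rdx.
SPECIAL_PURPOSE = {
    "idiv": {"dividend": "rax", "remainder": "rdx"},
}


def color(interference: dict, precolored: dict) -> dict:
    """Greedy coloring in which pre-colored nodes keep their fixed register."""
    assignment = dict(precolored)
    for node in sorted(interference):
        if node in assignment:
            continue  # already pinned by pre-coloring
        taken = {assignment[n] for n in interference[node] if n in assignment}
        free = [r for r in REGS if r not in taken]
        if not free:
            raise RuntimeError(f"{node} would have to be spilled")
        assignment[node] = free[0]
    return assignment


# Hand-written interference graph for one division: the divisor `b` and an
# unrelated temp `c` are both live across the idiv, so they conflict with
# the pinned temps and with each other.
interference = {
    "dividend": {"remainder", "b", "c"},
    "remainder": {"dividend", "b", "c"},
    "b": {"dividend", "remainder", "c"},
    "c": {"dividend", "remainder", "b"},
}

assignment = color(interference, SPECIAL_PURPOSE["idiv"])
```

Here `b` and `c` are pushed out of rax and rdx without any division-specific logic in `color` itself; the constraint is expressed entirely through the pre-colored entries.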
Takeaways: factorial :)