Closed Fidget-Spinner closed 3 months ago
These seems rather vague. Do you have an algorithm? How would the work be split between the code generator and optimizer?
Why 5 arguments? The optimum number will vary from platform to platform.
How does this differ from top of stack caching as proposed in https://github.com/python/cpython/issues/115802?
It's the same as stack caching for now. closing this
This proposal is less of "register allocation" and more of "register passing". However, it should achieve the same effect. Inspired partly by Brandt's proposal.
The optimizations
There are two optimizations proposed here:
How this will look like
The template will be changed to this:
The tail/call continuation will thus look like this:
How this will be generated
At build time, anything that uses the top 5 stack operands will not push/pop from the CPython operand stack. Instead we rewrite the stack input/output effects in the case generator to access directly from those args.
Overall, these should have a significant speedup. Register allocation from the paper is IIRC the second most worthwhile optimization after zero-length jumps. Not just that, but we eliminate a lot of stack traffic except for some cases.
How to handle deopt, side exit, and error
Thankfully not complex. For a uop, push all their inputs
(reg0 - reg4)
to the stack before exiting to the interpreter.Concerns