Executable superblocks must conform to the Executor interface, and should be able to jump into other superblocks.
These are quite distinct requirements from region selection and optimization.
Therefore, region selection and optimization should use a fixed per-interpreter buffer (maybe per-thread with nogil) from which the data is copied to create the executor.
The optimized region produced by the optimizer consists of a tree, zero or more of leaves of which may jump back to the root.
This format is hard execute efficiently as either all the branches need to compiled or none of them, and as the exits need to efficiently branch to code that does not exist yet.
We should convert the tree into a linear trace with exits to other linear traces. Jumps to the top can be compiled as direct jumps where possible, otherwise as indirect jumps to the primary executor.
Executable superblocks must conform to the
Executor
interface, and should be able to jump into other superblocks. These are quite distinct requirements from region selection and optimization.Therefore, region selection and optimization should use a fixed per-interpreter buffer (maybe per-thread with nogil) from which the data is copied to create the executor.
The optimized region produced by the optimizer consists of a tree, zero or more of leaves of which may jump back to the root. This format is hard execute efficiently as either all the branches need to compiled or none of them, and as the exits need to efficiently branch to code that does not exist yet.
We should convert the tree into a linear trace with exits to other linear traces. Jumps to the top can be compiled as direct jumps where possible, otherwise as indirect jumps to the primary executor.
See https://github.com/faster-cpython/ideas/issues/621 for a design of executors that allows back-patching without self modifying code.