Opened by markshannon 1 year ago (status: Open)
We need a way to create the above superblock.
Creation of the superblock involves abstract interpretation over the (hopefully specialized) bytecodes. During interpretation we should maintain five values:

- `trace`: The sequence of micro-ops generated so far
- `code`: The current code object
- `offset`: The current offset into the code object of the next instruction
- `call_stack`: A stack of `code`, `offset` pairs (for tracking calls, sends, etc.)
- `on_target`: The estimated likelihood that execution will reach this point in the superblock.

The initial values are:

- `trace`: Empty
- `code` and `offset`: As provided by the optimizer API call.
- `call_stack`: Empty
- `on_target`: 100%

The algorithm is something like:

    while (on_target > on_target_threshold) {
        inst = read_instruction(code, offset);
        switch (inst.opcode) {
            case NOP:
                /* no micro-ops to add */
                /* code object is unchanged */
                offset += 1;
                on_target = on_target * 1.0;
                break;
            /* etc, etc... */
        }
    }

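To make the state handling concrete, here is a minimal Python sketch of the same loop. It is only an illustration: `Inst`, `State`, and `build_superblock` are hypothetical names, a list of `Inst` stands in for a real code object, treating `LOAD_FAST` as a one-to-one micro-op is an assumption, and the 0.4 threshold is just an example value.

```python
from dataclasses import dataclass, field

# Hypothetical instruction record; real instructions are code units inside
# a code object, not Python objects like this.
@dataclass
class Inst:
    opcode: str
    oparg: int = 0

@dataclass
class State:
    code: list                                      # stand-in for a code object
    offset: int                                     # index of the next instruction
    trace: list = field(default_factory=list)       # micro-ops emitted so far
    call_stack: list = field(default_factory=list)  # (code, offset) pairs
    on_target: float = 1.0                          # starts at 100%

ON_TARGET_THRESHOLD = 0.4  # example value only

def build_superblock(code, offset):
    """Abstractly interpret `code` from `offset`, returning the final state."""
    st = State(code=code, offset=offset)
    while st.on_target > ON_TARGET_THRESHOLD and st.offset < len(st.code):
        inst = st.code[st.offset]
        if inst.opcode == "NOP":
            # No micro-ops to add; the code object is unchanged.
            st.offset += 1
        elif inst.opcode == "LOAD_FAST":
            # Assumed to map to a single micro-op of the same name.
            st.trace.append(("LOAD_FAST", inst.oparg))
            st.offset += 1
        else:
            # Opcode not handled by this sketch: stop extending the trace.
            break
    return st
```

Every case updates `offset`, and only cases that can leave the superblock would scale `on_target` by something less than 1.0.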
This would be incredibly tedious and error-prone to write by hand. So we need to extend the interpreter generator.
With sufficient metadata we can generate the above loop. We need to know which instructions are jumps, branches, contain guards, etc.
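As a rough sketch of what "generating the loop from metadata" could look like, the Python below turns per-instruction flags into the text of a `case`. The `OpInfo` fields and the emitted helper names (`jump_target`, `add_exit`, `add_uop`) are all hypothetical, not the generator's actual schema or output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpInfo:
    # Hypothetical metadata flags for one instruction.
    name: str
    is_jump: bool = False
    is_branch: bool = False
    has_guard: bool = False

def emit_case(info: OpInfo) -> str:
    """Return C-like text for one case of the superblock-building switch."""
    lines = [f"case {info.name}:"]
    if info.is_jump:
        # Unconditional jump: no micro-op, only the offset changes.
        lines.append("    offset = jump_target(inst);")
    elif info.is_branch:
        # Branch: emit a conditional exit, then continue on one path.
        lines.append("    add_exit(trace, inst);")
        lines.append("    offset += 1;")
    else:
        kind = "guard" if info.has_guard else "plain"
        lines.append(f"    add_uop(trace, inst);  /* {kind} */")
        lines.append("    offset += 1;")
    lines.append("    break;")
    return "\n".join(lines)

print(emit_case(OpInfo("NOP")))
```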
We will need to transform some micro-ops.
Non-local jumps like calls, yields, etc. should be explicit in the bytecode. We may need to understand things like `RESUME_FRAME` to handle these. We need to recognize frame pops and pushes, not only so that we can emit the right micro-op, but so that we push to the `call_stack`.
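A small sketch of the `call_stack` bookkeeping, assuming the state is a plain dict and using made-up micro-op names (`PUSH_FRAME`, `POP_FRAME`). It also assumes the resume point is simply `offset + 1`, which ignores inline caches and anything else that affects real instruction sizes.

```python
def enter_call(state, callee_code):
    """Follow a call into the callee, remembering where the caller resumes."""
    # Assumption: the caller resumes at the instruction right after the call.
    state["call_stack"].append((state["code"], state["offset"] + 1))
    state["trace"].append("PUSH_FRAME")  # hypothetical micro-op name
    state["code"], state["offset"] = callee_code, 0

def leave_call(state):
    """Follow a return back to the caller; False if there is no caller."""
    if not state["call_stack"]:
        # Returning out of the frame the superblock started in.
        return False
    state["trace"].append("POP_FRAME")   # hypothetical micro-op name
    state["code"], state["offset"] = state["call_stack"].pop()
    return True
```

When `leave_call` returns `False` the trace cannot continue without knowing the caller, so ending the superblock there seems the natural choice.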
Local jumps (jumps and branches) need explicit handling. Jumps become no-ops (as all instructions need to update `offset`), and branches become conditional exits.
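Roughly, the two kinds of local jump could be handled as below. This is a sketch under assumptions: the exit micro-op names are invented, `p_taken` is an estimated probability that the branch is taken (from wherever that estimate comes), and the trace follows whichever direction is likelier.

```python
def handle_jump(state, target):
    """Unconditional jump: no micro-op is emitted, only `offset` changes."""
    state["offset"] = target

def handle_branch(state, target, p_taken):
    """Branch: emit a conditional exit for the less likely direction."""
    if p_taken >= 0.5:
        # Stay on the taken path; exit to the fall-through instruction.
        state["trace"].append(("EXIT_IF_NOT_TAKEN", state["offset"] + 1))
        state["offset"] = target
        state["on_target"] *= p_taken
    else:
        # Stay on the fall-through path; exit to the branch target.
        state["trace"].append(("EXIT_IF_TAKEN", target))
        state["offset"] += 1
        state["on_target"] *= 1.0 - p_taken
```

For example, two successive branches that each stay on the trace 75% of the time leave `on_target` at 0.5625.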
Guards remain as guards (the behavior differs, but that is dealt with by the tier2 interpreter/compiler).
An incremental way of doing this using the code generator might be to add a default case to the switch that bails out of the loop when an unsupported opcode is encountered, and then to extend bytecodes.c and/or the generator so that more and more opcodes are supported.
Some metadata can be recovered by scanning the C code for an instruction (@iritkatriel is starting to do this in https://github.com/python/cpython/pull/105482 already); presumably there's also metadata that will require us to mark certain instructions explicitly.
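For illustration only, scanning an instruction body for metadata could be as simple as a few regular expressions. The mapping from macro names (`DEOPT_IF`, `ERROR_IF`, `JUMPBY`) to flags is an assumption made for this sketch and is not taken from the linked PR.

```python
import re

# Assumed markers: which macro in an instruction's C body implies which flag.
MARKERS = {
    "has_guard": re.compile(r"\bDEOPT_IF\s*\("),
    "can_error": re.compile(r"\bERROR_IF\s*\("),
    "is_jump": re.compile(r"\bJUMPBY\s*\("),
}

def scan_body(c_source: str) -> dict:
    """Return one boolean flag per marker found in the C source text."""
    return {flag: bool(rx.search(c_source)) for flag, rx in MARKERS.items()}

body = """
    DEOPT_IF(!PyLong_CheckExact(left), BINARY_OP);
    ERROR_IF(res == NULL, error);
"""
print(scan_body(body))
```

Properties that cannot be inferred this way would still need explicit annotations on the instruction definitions.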
A superblock will be a linear sequence of "micro ops", that may have multiple side exits, but will only have one entry point.
The start of the superblock will be determined by "hotspots" in the code. The tricky bit is creating the rest of the superblock. For non-branching code, extending the superblock is easy enough. For branches we either need the base interpreter to record which way the branch goes, or to make an estimate based on static analysis and the value being tested, if it is available. For non-local jumps, like calls and returns, we need to rely on the information recorded by the specializing adaptive interpreter.
We should end the superblock when either the estimated likelihood of execution staying in the superblock drops below a threshold, say 40%, or when we cannot estimate that likelihood.
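A quick worked sketch of that stopping rule, assuming each step comes with an estimated probability of staying in the superblock, or `None` when no estimate is available. The function name and the input format are hypothetical.

```python
def superblock_length(step_probabilities, threshold=0.4):
    """Return (steps kept, final likelihood) under the stopping rule."""
    on_target = 1.0
    length = 0
    for p in step_probabilities:
        if on_target <= threshold:
            break  # too unlikely that execution is still in the superblock
        if p is None:
            break  # likelihood cannot be estimated
        on_target *= p
        length += 1
    return length, on_target

# Straight-line step, two 50/50 branches, then another straight-line step.
print(superblock_length([1.0, 0.5, 0.5, 1.0]))
```

With these numbers the likelihood goes 1.0, 0.5, 0.25, so the fourth step is not added.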
An example:
The bytecode for the above snippet is:

and the bytecode for `typing.cast` is:

Starting at the `LOAD_GLOBAL 0 (typing)`, the resulting superblock might look something like:

All the `SAVE_IP` instructions exist to make sure that the `frame->prev_instr` field gets updated correctly. Don't worry, optimization should remove almost all of them.