Closed gvanrossum closed 1 year ago
Tentative decision reached offline: expand uops with conditional stack effects in the generator.
Currently the generator disallows conditional stack effects in uops (it fails an assert not effect.cond, effect
in get_stack_effect_info
). The plan is to just support this when generating the bytecode interpreter (possibly only in the final uop of a macro?).
But for the superblock generator we need to generate two separate uops, let's give them _0
and _1
suffixes, corresponding to the condition being 0 or 1 -- we can just replace oparg
with a constant (relying on the C compiler to elide the dead code). The metadata for macros will encode the condition so that the superblock generator can decide which of the two (similar) uops to put in the superblock. The rest of the optimization and machine code generation pipeline then doesn't need to worry about uops with conditional stack effects.
The failing assert
is only part of the issue -- that code is only used to generate the pushed/popped metadata. There's significant additional tech debt around the actual STACK_GROW
(etc.) macros generated at the end of a macro instruction -- this currently only supports unconditional effects. This is a TODO in wrap_super_or_macro
.
This is done. Conditional output stack effects in the last op composing a macro are now supported.
In #592 (and https://github.com/python/cpython/issues/104909) we're trying to split bytecode ops into micro ops (uops) with the constraint of one "data item" per uop, where a data item is either
oparg
or a cache item.This runs into some issues with conditional stack effects. For example, take the specializations for
LOAD_GLOBAL
(LOAD_GLOBAL_MODULE
andLOAD_GLOBAL_BUILTIN
) and forLOAD_ATTR
(e.g.LOAD_ATTR_INSTANCE_VALUE
). These opcodes must end in something that pushes either aNULL
and a result, or just a result, depending on the low bit ofoparg
. But the result being pushed also depends on a cached value. We also can't push theNULL
before the lastDEOPT_IF
orERROR_IF
call.Maybe we can add the conditionality to the macro itself, e.g.
Here
B
would be the uop that pushesNULL
.Or maybe the
null if (oparg & 1)
phrasing of the output stack effect in the uop definition is enough?For reference:
LOAD_GLOBAL
: https://github.com/python/cpython/compare/main...faster-cpython:cpython:split-load-global?expand=1LOAD_ATTR
: https://github.com/python/cpython/compare/main...faster-cpython:cpython:split-load-attr?expand=1#diff-729a985b0cb8b431cb291f1edb561bbbfea22e3f8c262451cd83328a0936a342R1787