Closed markshannon closed 1 year ago
Just a note from our previous discussions for anyone else reading: this is only legal if both instructions are on the same line. When I tried a similar idea out before, this caused almost all of the STORE_FAST__LOAD_FAST superinstructions to disappear (since they basically always span more than one line).
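To check how often a store and the following load land on different lines, one can ask dis which source line each instruction is attributed to. The following is a minimal sketch, not part of the original discussion: the helper name instructions_with_lines and the sample function f are made up for illustration, and the exact opcode names printed depend on the CPython version.

```python
import dis

# Hypothetical two-line function body: the store and the load sit on
# different source lines.
SOURCE = "def f(b):\n    a = 1\n    return b\n"


def instructions_with_lines(code):
    """Return (line, opname) pairs, carrying the last seen line forward."""
    line_starts = dict(dis.findlinestarts(code))
    current = None
    result = []
    for instr in dis.get_instructions(code):
        if instr.offset in line_starts:
            current = line_starts[instr.offset]
        result.append((current, instr.opname))
    return result


namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
listing = instructions_with_lines(namespace["f"].__code__)
for line, opname in listing:
    print(line, opname)
```

In the printed listing the store to a is attributed to line 2 of SOURCE and the load of b to line 3, so a pairing rule that requires equal line numbers would not combine them.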
Instructions don't really have lines, effects do. Since LOAD_FAST has no observable side effects, it can be moved from line to line.
Consider
a = 1
return b
which compiles to:
1 LOAD_CONST 1 (1)
  STORE_FAST 0 (a)
2 LOAD_FAST 1 (b)
  RETURN_VALUE
Which implies that we cannot merge the STORE_FAST and LOAD_FAST. But as LOAD_FAST has no observable side effects, this is a legal translation:
1 LOAD_CONST 1 (1)
  STORE_FAST 0 (a)
  LOAD_FAST 1 (b)
2 RETURN_VALUE
Which can be transformed to:
1 LOAD_CONST 1 (1)
  STORE_FAST_LOAD_FAST_COMPACT 0 (a), 1 (b)
2 RETURN_VALUE
Unfortunately, having superinstructions straddling lines prevents jumping to the line in question in a debugger, as the stack is inconsistent, which rules out the transformation above.
However, it appears that the reduction in code size more than compensates.
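The re-attribution argument above can be mimicked with a toy peephole pass. This is only a sketch over (line, opname, arg) tuples, not CPython's compiler: merge_same_line and hoist_load_fast are hypothetical helpers, and the merged instruction name is taken from the listing in the comment above.

```python
def merge_same_line(instructions):
    """Fuse STORE_FAST + LOAD_FAST pairs, but only when both share a line."""
    merged = []
    i = 0
    while i < len(instructions):
        line, opname, arg = instructions[i]
        if i + 1 < len(instructions):
            next_line, next_opname, next_arg = instructions[i + 1]
            if (opname == "STORE_FAST" and next_opname == "LOAD_FAST"
                    and line == next_line):
                merged.append(
                    (line, "STORE_FAST_LOAD_FAST_COMPACT", (arg, next_arg)))
                i += 2
                continue
        merged.append(instructions[i])
        i += 1
    return merged


def hoist_load_fast(instructions):
    """Re-attribute a LOAD_FAST that directly follows a STORE_FAST to the
    store's line, on the assumption that the load has no observable effect."""
    result = []
    for line, opname, arg in instructions:
        if opname == "LOAD_FAST" and result and result[-1][1] == "STORE_FAST":
            line = result[-1][0]
        result.append((line, opname, arg))
    return result


original = [
    (1, "LOAD_CONST", 1),
    (1, "STORE_FAST", 0),
    (2, "LOAD_FAST", 1),
    (2, "RETURN_VALUE", None),
]

print(merge_same_line(original))                   # nothing to fuse
print(merge_same_line(hoist_load_fast(original)))  # pair is fused on line 1
```

Without the hoisting step the pass leaves the sequence unchanged; with it, the pair is fused and only RETURN_VALUE remains on line 2, which is exactly the shape that makes jumping to line 2 in a debugger problematic.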
> which rules out the transformation above.
Does this mean that in general we have to consider the state of the value stack as an "observable side effect" when making optimizations in the compiler? That seems new (relative to my previous understanding of what was safe to optimize), and unfortunate.
Well, I would assume that jumping to a line would jump just prior to execution of every instruction on that line. It would be weird, I think, if a local variable load or store didn't happen after jumping to a line and stepping through it.
In the example above, I would expect that if I had two lines foo = 42 and return foo, jumping over the first line would return the prior value of foo (or None if unbound). You think that frame_setlineno should be allowed to refuse the jump?
I'm not proposing anything in particular; frame_setlineno is code I hadn't looked at before and didn't realize existed :) I've never used the capability to jump directly to a line in a debugger. So I'm just newly wrapping my head around the implications for optimization (and thinking through whether I know of any existing compiler optimizations that could already break this.)
It seems this does mean that LOAD_FAST (or anything that pushes to stack) isn't generally safe to shift to a different line. So for example this seems to invalidate the proposed superoptimizer design in https://github.com/python/cpython/issues/102869
> frame_setlineno is code I hadn't looked at before and didn't realize existed :)
Lucky! ;)
Do we know anyone who uses this evil function?
Here's a third-party debugger that uses it: https://github.com/web2py/web2py/blob/7685d373474378e93132f8916145fb11f84cec71/gluon/contrib/dbg.py#L307. pdb also supports it (https://github.com/python/cpython/blob/e01b04c9075c6468ed57bc883693ec2a06a6dd8e/Lib/pdb.py#L1202).
Seems like we should seriously consider deprecating support for this feature though; it feels very rarely useful and very hard to support.
I mean, I'm pretty sure every Python debugger has a jump command, so we'd probably hear from them first if we removed it.
If we convinced them to drop the functionality too, then maybe we'd find out how many users are really relying on it. There are at least a handful that have filed various bug reports for CPython over the years.
I'm stuck somewhere between "I can see how it could be useful" and "it's too much pain to maintain and doesn't always work"... though, I have admittedly used jump commands in pdb a few times since learning about them (mostly if I mess up my debugging session somehow and don't want to re-run everything, but also sometimes to take a different branch or something).
I know debuggers support it. But do we know of any people who use it?
Not counting people who use the feature via debuggers?
I mean, you can only do it from within a trace function, so probably just a handful of people writing toys like this:
import sys

def goto(line: int) -> None:
    def jumper(frame, event, arg) -> None:
        if event == "line":
            frame.f_lineno = line
            frame.f_trace = None
    frame = sys._getframe(1)
    frame.f_trace = jumper
    sys.settrace(jumper)

# Prints "B" in an infinite loop:
goto(15)    # 13
print("A")  # 14
print("B")  # 15
goto(13)    # 16
print("C")  # 17
There are already many complicated prohibitions in the f_lineno setattr -- there is no guarantee that it will work. So I don't believe it ought to be used as a reason to skip line-fusing optimizations.
Additionally, stepping through code already frequently jumps back and forth in ways that are hard to grasp intuitively. Again, if optimizations cause the sequence of observed "line events" to change, that feels like a minor inconvenience at best. (And even though PEP 626 claims otherwise, its motivation feels arbitrary to me and its specification is rather vague: "Line events and the f_lineno attribute should act as an experienced Python user would expect.")
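The point that assigning to f_lineno is not guaranteed to work can be observed from a trace function. This is a sketch under assumptions: target and try_jump are hypothetical names, line positions are expressed as offsets from the def line, and the wording of the error message differs between CPython versions. Every local is bound before the jump so that the only thing being tested is the jump target.

```python
import sys


def target():
    x = 0               # offset 1
    i = 0               # offset 2: bind every local before the jump point
    x += 10             # offset 3: the jump is attempted just before this runs
    for i in range(2):  # offset 4
        x += i          # offset 5: inside the loop body
    return x            # offset 6


def try_jump(func, from_offset, to_offset):
    """Run func under a tracer that tries one jump; report what happened."""
    first = func.__code__.co_firstlineno
    outcome = []

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None  # do not trace unrelated frames
        if (event == "line" and not outcome
                and frame.f_lineno == first + from_offset):
            try:
                frame.f_lineno = first + to_offset
                outcome.append("jumped")
            except ValueError as exc:
                outcome.append("refused: %s" % exc)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        value = func()
    finally:
        sys.settrace(previous)
    return (outcome[0] if outcome else None), value


print(try_jump(target, 3, 6))  # jump straight to the return statement
print(try_jump(target, 3, 5))  # attempt to jump into the for loop body
```

The first call skips both the increment and the loop. The second is rejected with a ValueError, because the loop body expects an iterator on the value stack that the jumping frame does not have, and the function then runs normally.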
Let's not remove jumping in debuggers, or other features, that make our work more ~difficult~ interesting.
Many features that prevent optimizations in the bytecode compiler can be handled in the tier 1 optimizer.
Many features that prevent optimizations in the tier 1 optimizer can be handled in the tier 2 optimizer.
In the tier 2 optimizer any call to a builtin function is a potential deoptimization. A single call to PyObject_GetAttr can do anything, so it makes almost no difference what features are available.
Superinstructions can be problematic, as they break the one-instruction-at-a-time model that instrumentation relies on, and are likely to cause problems in optimizers that make the same one-instruction-at-a-time assumption.
Instrumentation rewrites all superinstructions back to the simpler form. Superinstructions also prevent specialization, so superinstructions cannot include any specializable instruction. https://github.com/faster-cpython/ideas/issues/584 proposes a specialization of LOAD_CONST, so that leaves only LOAD_FAST and STORE_FAST to be combined.
If we remove LOAD_CONST we have only:
That is still quite a lot of instructions, dynamically, so we don't want to just remove them. Instead of removing them, we can combine them into a single instruction. Given that most locals will have an index in range(16), we combine the operations into a single instruction.
There will be fewer LOAD_FAST_LOAD_FAST_COMPACT than LOAD_FAST__LOAD_FAST, but they should be a little faster (fewer memory reads). I would expect the performance to be about the same.
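One way to read the range(16) remark is that two local indices fit into a single 8-bit operand, four bits each. The sketch below assumes that layout (first index in the high nibble, second in the low nibble); the proposal above does not spell out the encoding, and pack_compact and unpack_compact are hypothetical names.

```python
def pack_compact(first, second):
    """Pack two local-variable indices (each in range(16)) into one byte."""
    if not (0 <= first < 16 and 0 <= second < 16):
        raise ValueError("both indices must be in range(16)")
    return (first << 4) | second


def unpack_compact(oparg):
    """Recover the two indices from a packed operand."""
    return oparg >> 4, oparg & 15


oparg = pack_compact(0, 1)    # e.g. locals a (index 0) and b (index 1)
print(oparg, unpack_compact(oparg))
```

Pairs in which either index is 16 or larger cannot be packed this way, which is consistent with the expectation that there would be fewer compact instructions than superinstructions.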