Open carljm opened 1 year ago
For context, for a long time there was a similar hook (added for the sake of Psyco way back in 2002!), but it was removed in Python 3.9 (https://github.com/python/cpython/issues/84127). But the implementation of frames has changed a lot since then.
That hook allowed overriding PyThreadState_GetFrame
entirely. That's more capable (because it allows the hook setter to track their frame stack entirely separately from the interpreter; I don't think Cinder needs that capability). But with today's implementation of frames, that form of hook wouldn't be sufficient, because e.g. sys._getframe
doesn't use PyThreadState_GetFrame
. If it did, it would have to walk the chain of PyFrameObject
to the one requested, materializing every frame along the way, which is a lot less efficient than what it does today, which is walk the stack of _PyInterpreterFrame
and only create a single PyFrameObject
for the requested frame.
The whole point of the _PyInterpreterFrame
is that it is cheap to create. So, in theory, Cinder should be able to use it and not worry about frame objects at all, as the VM would take care of them.
How could we make _PyInterpreterFrame
even cheaper, so that you can do that?
You'll need a link, so previous
has to remain.
The f_globals
and f_builtins
fields could be dropped, but that might hurt the performance of LOAD_GLOBAL
.
Alternatively we could restrict their use to the interpreter, so that Cinder could ignore them.
We do need a way to access the globals and builtins, though, so f_funcobj
seems useful.
f_locals
is the "slow" locals, so just needs to be NULL
ed.
prev_instr
is needed to find the line number. I guess we could allow it to be NULL, but setting it to NULL isn't much cheaper than setting to the start of the function.
There's a word of offsets and flags. They are introspected, so can be ignored.
That leaves the larger part of the frame: the locals and stack. The stack isn't visible outside of the interpreter, so there is no need to set the stack. That leaves the local variables. You'll need to store the values somewhere across calls (and there aren't many callee save registers left after the VM state), so why not put them in the frame? They can be kept in register between calls.
Yep, I'd already come to the same conclusion that a) we should just use _PyInterpreterFrame
in 3.12, and b) populating most of it shouldn't be a big issue for us. The locals are indeed the sticking point.
That leaves the local variables. You'll need to store the values somewhere across calls (and there aren't many callee save registers left after the VM state), so why not put them in the frame? They can be kept in register between calls.
This is an interesting suggestion. We may need to do some experimentation to see how much worse it would be than our current approach (we spill to C stack only values stored in caller-saved registers.) Off the top of my head there's a few components that could potentially make it worse: a) locality and code size effects of spilling to Python stack instead of C stack, b) needing to always spill all locals, not just those stored in caller-saved registers, c) needing to consult runtime metadata that maps registers to local indices, which we currently only have to consult when we deopt.
After looking at this a bit, I think it's clear that spilling locals to the _PyInterpreterFrame
around all calls is going to be too expensive to be feasible. It would be much better to find a solution that doesn't constrain an optimizer to always (on calls) maintaining locals (or ideally, anything else it doesn't need) in the _PyInterpreterFrame
.
I also think a global hook maybe isn't a great idea; it makes it hard for multiple optimizers to coexist.
A better idea might be to reuse f_funcobj
, since it's currently only needed internally to the interpreter, for fetching the function closure. So we could set that instead to a vector-callable object that knows how to fully update the _PyInterpreterFrame
, and then we'd just need _PyFrame_GetFrameObject
to do something like if (!PyFunction_Check(frame->f_funcobj)) { PyObject_Vectorcall(frame->f_funcobj, frame, 1, NULL); }
before going ahead and making the full frame object. Or if the PyFunction_Check
is too implicit, there's room for a few new bitflags on _PyInterpreterFrame
without actually making it bigger.
After looking more into this, it seems like we could be OK with having f_locals
on a JITted frame just be an empty dict. But there is one other problem: prev_instr
. That ends up responsible for f_lineno
on a frame, and we do need these to be correct. Updating prev_instr
on the frame before every call (and every instruction that can trigger arbitrary code execution) would not be ideal for performance. So I think we still don't see a good way around having some kind of "frame reification" hook :/
A JIT will likely not want to maintain the full state of a
_PyInterpreterFrame
(most importantly, stack/locals; maybe also prev_instr) during execution of JITted code. In general, it should be the JIT's responsibility to ensure the interpreter frame is in the correct state before it returns control to the interpreter.However, the functions
PyEval_GetFrame
andPyThreadState_GetFrame
, which return a fullPyFrameObject
for the current interpreter frame, are public C API.PyThreadState_GetFrame
is used internally by thecoroutine_origin_tracking_depth
feature and byPyErr_WriteUnraisable
.sys._getframe
,sys._getframemodulename
,sys._current_frames
, and construction of tracebacks also reify one or morePyFrameObject
from interpreter frames.All of the above ultimately use
_PyFrame_GetFrameObject
, which takes a_PyInterpreterFrame
and returns aPyFrameObject
.A JIT that does not constantly maintain the full state of the executing
_PyInterpreterFrame
will need some way to be notified when a fullPyFrameObject
is needed for one of its frames, so that it can populate the frame object appropriately based on its own internal execution state.I'd ideally like to find a solution here that will work both for Cinder JIT and for future faster-cpython JIT, even given their differing designs.
So, questions:
_PyFrame_GetFrameObject
.