Handling unlikely events with de-optimization in tier 2.

markshannon commented 5 months ago

In order to optimize tier 2 superblocks, we want to be able to use existing techniques and use straightforward reasoning. Unfortunately this is impossible due to a number of Python language, and CPython VM, features that make optimization very difficult, if not impossible.

For example

def f(y):
    x = y
    ...

One might presume that from ... onward, x and y would refer to the same object, but that might not be the case if x were modified in a debugger in another thread. This is extremely unlikely to occur, but it is possible.

We need a scheme that allows us to perform useful optimizations and retain correctness when these unlikely events occur.

The general idea

We assume no "unlikely" event can occur when optimizing, and optimize freely. If such an event does occur, we throw away our optimizations.

Local events

Most classes and modules are not mutated during their lifetime, but some are. We would like to optimize as if there were immutable, but retain correctness if they are mutated. Throwing away all our optimizations whenever a class or module was mutated would be very expensive, so we need a compromise. What we can do, is track which executors depend on a given class or module and only throw those away.

The events

Global events

Setting an object's class, obj.__class__ = ...
Setting the bases or mro of a class, cls.__bases__ = ...
Setting the PEP 523 frame eval function, _PyInterpreterState_SetEvalFrameFunc().
Changing a module's dictionary, mod.__dict__ = ...
Function modification, setting the defaults, closure, etc. func.__closure__ = (None, None)
Modification of locals in a debugger
staticmethod, classmethod and property can be mutated by calling __init__ again

Local events

Setting a class attribute, cls.attr = ...
Setting a global variable, global x; x = ...
Instrumentation, sys.monitoring.set_events()

The above classification is not fixed, there is no hard rules about what is global, or locals. Globals events are cheaper to optimize, but more expensive to de-optimize, so they must be very rare.

Handling de-optimization and re-optimization

If we naively throw away all executors after a global event, and then carry on as normal, we could easily find ourselves optimizing, then de-optimizing repeatedly with a dire effects on performance.

So we need to avoid that. For each event we need a way to avoid that. First of all we keep a global (per-interpreter) set of flags to indicate how many times the event has occurred. If an event has occurred N or more times we change our optimization strategy to assume the event may happen, and do not de-optimize if it does.

Event	N	Note
`obj.__class__ = ...`	1
`cls.__bases__ = ...`	1
PEP 523	1
Modifying a builtin	4	(builtins can get modified during startup, so we need to tolerate that)
Function modification	1
Setting a local in debugger	infinite	(this is a slow event, so the cost of re-optimization is acceptable)
`cls.attr = ...`	8	per-class
Setting a global	8	per-module
Instrumentation	infinite	(instrumentation is set infrequently and prevents optimization locally, so we want to optimize around it)
staticmethod/classmethod/property mutation	1

The values for N above are provisional and are likely to change.

Free threading

To support free-threading, most of the events above will be stop-the-world events. Once the modification threshold is reached, then those events will no longer stop the world, as the event will no longer invalidate optimizations.

markshannon commented 5 months ago

@cfbolz I'd be curious to know what you think. How does PyPy handle these things?

gvanrossum commented 5 months ago

Changing a module's dictionary, mod.__dict__ = ...

AFAICT this is not possible. See the top of moduleobject.c:

static PyMemberDef module_members[] = {
    {"__dict__", _Py_T_OBJECT, offsetof(PyModuleObject, md_dict), Py_READONLY},
    {0}
};

mdboom commented 5 months ago

Changing a module's dictionary, mod.__dict__ = ...

AFAICT this is not possible. See the top of moduleobject.c:

I wonder if @markshannon meant mutating the module dictionary instead. That is probably relatively rare after import and initialization, but of course determining exactly when that phase ends is tricky.

markshannon commented 5 months ago

I did mean mod.__dict__ = ... That's one less thing to worry about. I'm sure we'll think of more.

cfbolz commented 5 months ago

Hi @markshannon,

yep, I think all this makes sense. I'll provide some details below, happy to answer further questions (also in a call, if that would be helpful).

What we can do, is track which executors depend on a given class or module and only throw those away.

That's exactly what we do, yes. Every trace depends on the state of some classes, modules, functions and will be optimized based on the assumption that these objects are immutable. In turn, classes/modules/functions keep a list of traces that need to be invalidated once they get mutated.

We have turned this into a very general mechanism built directly into RPython (called 'quasi-immutability'). Therefore it's very cheap for us to add it to a bunch of other random places in the interpreter (but I think you can be very incremental about that, the gains of these are really marginal). Two examples for rarer language features:

staticmethod/classmethod/property, they are mutable by calling __init__ again)
whether we have ever started a second thread. a lot of scripts never do and in that case we don't have to do tick counter manipulation at all, which helps really tight loops

The events

I'm not exactly sure what the difference between "global" and "local" events are?

If we naively throw away all executors after a global event, and then carry on as normal, we could easily find ourselves optimizing, then de-optimizing repeatedly with a dire effects on performance.

This has happened to PyPy a bunch of times, so it's good to watch out for this problem. We basically had/have N=infinity for all of these, and then a bunch of safety mechanisms to catch the worst problems of that approach (and sometimes we miss one). Getting the balance right is hard though, because there are always programs that do really weird meta-programming at startup time and would like to execute quickly afterwards.

One difference that I wanted to mention is that for modules and class dicts we track mutations on a per-key basis. It's quite common to have a counter in a class variable or in a global. But the remaining class attributes/globals should still be optimized. What we do is give up on an attribute on the first mutation. Afterwards, that attribute will not be considered to be constant any more by the JIT (but other attributes in the same class/module will be, until they are mutated). Once an individual class attribute/global has been mutated, we still remember where in the dict it is found in the trace, so that looking it up still does not require a full hashmap lookup.

markshannon commented 5 months ago

Thanks.

I've added staticmethod mutation to the list.

I'm not exactly sure what the difference between "global" and "local" events are?

By "global" events, I mean that we de-optimize everything. For "local" events we de-optimize only those executors that are invalidated.

One difference that I wanted to mention is that for modules and class dicts we track mutations on a per-key basis.

Tracking modifications per key would be nice, but it is not obvious how to do it without some major surgery on dicts.

cfbolz commented 5 months ago

By "global" events, I mean that we de-optimize everything. For "local" events we de-optimize only those executors that are invalidated.

Right. The only event that is currently listed as global that I would be would be somewhat worried about is obj.__class__ = .... People seem to do that with some regularity :woman_shrugging:. Wouldn't it be enough to treat it like a mutation of the old class of obj? Or maybe both involved classes?

Tracking modifications per key would be nice, but it is not obvious how to do it without some major surgery on dicts.

Yep, we did major surgery on dicts to support it.

faster-cpython / ideas