Deoptimization of superblocks

markshannon commented 1 year ago

Optimized superblocks will depend on two types of assumptions.

Local ones that are checked by inline guards.
Global assumptions that are not checked locally.

For example, we might assume that the class of an object has version 212. We would insert an inline guard to check that. We might also assume the builtin len has not be reassigned. It is expensive to check this all the time, so we don't want to have to put a guard in the optimized code. In which case we need a mechanism to invalidate the optimized dynamic superblock should len actually get changed.

We can use the mechanism of watchers to install a watcher, but then we need find all the superblocks to deoptimize. Designing an efficient and compact data structure that can perform this task will be important.

One possibility is to use bloom filters to avoid having to store all possible deoptimization events on the superblock. Deoptimizing a superblocks unnecessarily will impact performance, but not correctness. So, a few false positives should be fine.

The actual deoptimization is simple enough: just replace the start of the code with code that returns to the base interpreter.

Here's an initial list of possible deoptimization events: We won't need to worry about events until we have optimizations that assume those event won't happen.

Local events (will only deopt a few superblocks)

Instrumentation of a single code object
Function version change
Type version change (including cached keys)
Watched dict value changes
__class__ of object changes (We might be able to merge this with type version change)

Global events (will deopt all superblocks)

Global instrumentation
Changes to the builtins dict.

@carljm Anything I've missed here?

carljm commented 1 year ago

That list looks pretty reasonable. Some other reasons we might deopt from the Cinder JIT that come to mind:

anytime an exception is raised
type guard fails (value isn't of expected type) -- this is currently hardcoded in some specific places (e.g. we'll deopt if UNPACK_SEQUENCE sees something that isn't a list or tuple, because in practice that's rare enough that it's not worth generating code for it), but we're moving towards type profiling and optimistic codegen, so there'll be more of these handled in a generic way
probably more that I'm not thinking of right now :)

FWIW in Cinder we don't treat "change to builtins" as a global deopt event. (In fact we don't even have any concept of a global deopt event.) A load of a builtin is treated like any other global load, which results in installing watcher on both module globals dict and builtins dict. So if a key in builtins changes, it won't deopt everything everywhere, it'll just deopt at actual uses of that specific name.

The actual deoptimization is simple enough: just replace the start of the code with code that returns to the base interpreter.

It gets more complicated than that if there's any possibility of a deopt-triggering event occurring while an optimized superblock is already executing. So many things in Python can trigger arbitrary code execution, it's pretty hard to avoid this possibility.

In Cinder we currently don't ever invalidate an entire compiled function AFAIK; all our deopts occur at a specific point within generated code (which solves the "what if something happens while the generated code is already executing" problem.) Generated code may reference cached values outside of it and then deopt if that cached value has been invalidated. E.g. that's how LOAD_GLOBAL is currently handled, dict watcher clears a cached pointer that the generated code references, which will cause the generated code to deopt when it hits a load from that cache. In future we'd like to move to code patching instead so the cache is really inline.

tekknolagi commented 1 year ago

@carljm I think entire functions are invalidated by __code__ reassignment and maybe one other thing

markshannon commented 1 year ago

It gets more complicated than that if there's any possibility of a deopt-triggering event occurring while an optimized superblock is already executing. So many things in Python can trigger arbitrary code execution, it's pretty hard to avoid this possibility.

Yes. Which is why we need to throw away all inferred type information and knowledge of the VM if arbitrary code can be called. So, we really want to avoid that happening. Meaning that we need to do the following:

Don't let decref's run arbitrary code. Deallocation is fine. Finalizers are not, and because C code is opaque, it means we need to treat most C deallocation functions as potential finalizers.
Mark as many common builtin functions as possible as being pure or, at least, having no global side effects.
Inline calls to Python functions, so that we don't lose context.

In cases where we cannot do the above, we have to treat the part of the superbock after the call as having no prior context, which reduces our ability to optimize it a lot. It also means we need to check to see if the superblock has been invalidated.

faster-cpython / ideas

Deoptimization of superblocks #563

Local events (will only deopt a few superblocks)

Global events (will deopt all superblocks)