faster-cpython / ideas

1.67k stars 49 forks source link

Deoptimization of superblocks #563

Open markshannon opened 1 year ago

markshannon commented 1 year ago

Optimized superblocks will depend on two types of assumptions.

  1. Local ones that are checked by inline guards.
  2. Global assumptions that are not checked locally.

For example, we might assume that the class of an object has version 212. We would insert an inline guard to check that. We might also assume the builtin len has not be reassigned. It is expensive to check this all the time, so we don't want to have to put a guard in the optimized code. In which case we need a mechanism to invalidate the optimized dynamic superblock should len actually get changed.

We can use the mechanism of watchers to install a watcher, but then we need find all the superblocks to deoptimize. Designing an efficient and compact data structure that can perform this task will be important.

One possibility is to use bloom filters to avoid having to store all possible deoptimization events on the superblock. Deoptimizing a superblocks unnecessarily will impact performance, but not correctness. So, a few false positives should be fine.

The actual deoptimization is simple enough: just replace the start of the code with code that returns to the base interpreter.

Here's an initial list of possible deoptimization events: We won't need to worry about events until we have optimizations that assume those event won't happen.

Local events (will only deopt a few superblocks)

Global events (will deopt all superblocks)

@carljm Anything I've missed here?

carljm commented 1 year ago

That list looks pretty reasonable. Some other reasons we might deopt from the Cinder JIT that come to mind:

FWIW in Cinder we don't treat "change to builtins" as a global deopt event. (In fact we don't even have any concept of a global deopt event.) A load of a builtin is treated like any other global load, which results in installing watcher on both module globals dict and builtins dict. So if a key in builtins changes, it won't deopt everything everywhere, it'll just deopt at actual uses of that specific name.

The actual deoptimization is simple enough: just replace the start of the code with code that returns to the base interpreter.

It gets more complicated than that if there's any possibility of a deopt-triggering event occurring while an optimized superblock is already executing. So many things in Python can trigger arbitrary code execution, it's pretty hard to avoid this possibility.

In Cinder we currently don't ever invalidate an entire compiled function AFAIK; all our deopts occur at a specific point within generated code (which solves the "what if something happens while the generated code is already executing" problem.) Generated code may reference cached values outside of it and then deopt if that cached value has been invalidated. E.g. that's how LOAD_GLOBAL is currently handled, dict watcher clears a cached pointer that the generated code references, which will cause the generated code to deopt when it hits a load from that cache. In future we'd like to move to code patching instead so the cache is really inline.

tekknolagi commented 1 year ago

@carljm I think entire functions are invalidated by __code__ reassignment and maybe one other thing

markshannon commented 1 year ago

It gets more complicated than that if there's any possibility of a deopt-triggering event occurring while an optimized superblock is already executing. So many things in Python can trigger arbitrary code execution, it's pretty hard to avoid this possibility.

Yes. Which is why we need to throw away all inferred type information and knowledge of the VM if arbitrary code can be called. So, we really want to avoid that happening. Meaning that we need to do the following:

In cases where we cannot do the above, we have to treat the part of the superbock after the call as having no prior context, which reduces our ability to optimize it a lot. It also means we need to check to see if the superblock has been invalidated.