faster-cpython / ideas


Performing optimizations in the presence of `Py_DECREF()` #582

Open markshannon opened 1 year ago

markshannon commented 1 year ago

We want to be able to optimize larger regions of code than a single instruction, but we can't (at least not as effectively) if arbitrary code can run in the middle of a region. Py_DECREF() can run arbitrary code.

We should change Py_DECREF() so that it cannot run arbitrary code. Currently Py_DECREF(op) calls _Py_Dealloc(op) which calls Py_TYPE(op)->tp_dealloc(op).

We can modify _Py_Dealloc(op) to check a flag indicating whether the object's dealloc is safe. If it is, call it; otherwise, add the object to a queue to be deallocated later, presumably at the eval breaker check.
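A minimal sketch of the proposed check, assuming a toy object model rather than CPython's real structs. The names `Type`, `dealloc_is_safe`, `sketch_decref`, `pending` and `drain_pending` are hypothetical; the queue is a fixed-size array for brevity (a real one would need to grow), and `count_free` stands in for actually freeing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model: not CPython's real structs. */
typedef struct Obj Obj;

typedef struct {
    const char *name;
    bool dealloc_is_safe;          /* hypothetical flag checked on dealloc */
    void (*dealloc)(Obj *);
} Type;

struct Obj {
    long refcnt;
    const Type *type;
};

/* Hypothetical queue of objects whose dealloc was deferred. */
#define QUEUE_CAP 64
static Obj *pending[QUEUE_CAP];
static size_t n_pending = 0;

/* Stands in for freeing memory; counts calls so the effect is visible. */
static int freed_count = 0;
static void count_free(Obj *op) { (void)op; freed_count++; }

static const Type SafeType = {"int", true, count_free};
static const Type UnsafeType = {"instance-with-__del__", false, count_free};

static void sketch_dealloc(Obj *op) {
    if (op->type->dealloc_is_safe) {
        op->type->dealloc(op);         /* cannot run arbitrary code: do it now */
    } else {
        assert(n_pending < QUEUE_CAP); /* a real queue would grow instead */
        pending[n_pending++] = op;     /* defer until a safe point */
    }
}

static void sketch_decref(Obj *op) {
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        sketch_dealloc(op);
    }
}

/* Assumed to be called where the interpreter already checks the eval
   breaker. Deallocs run here may decref (and enqueue) more objects;
   the loop keeps going until the queue is empty. */
static void drain_pending(void) {
    while (n_pending > 0) {
        Obj *op = pending[--n_pending];
        op->type->dealloc(op);         /* arbitrary code may run here */
    }
}
```

The added cost on the hot path is one flag test per dealloc; anything that could run arbitrary code is pushed to a point where the interpreter already expects it.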

This will slow down _Py_Dealloc, but hopefully not by much, and certainly by less than the speedups it will enable.


An alternative approach is to make all the decrefs explicit in the IR and move them to the end of the superblock. This is IMO more complex and also prevents carrying optimizations across superblock boundaries.
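For comparison, the core of the alternative can be sketched as a stable partition over a toy IR, assuming a hypothetical `Op` format that is not CPython's micro-op representation. The sketch ignores the part that makes this approach harder in practice: every side exit out of the superblock would also need to perform the decrefs deferred up to that point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: not CPython's micro-op format. */
typedef enum { OP_LOAD, OP_ADD, OP_STORE, OP_DECREF } OpKind;

typedef struct {
    OpKind kind;
    int operand;
} Op;

/* Stable partition: non-DECREF ops keep their relative order at the
   front, DECREF ops keep their relative order at the end. Delaying a
   decref only keeps an object alive longer, so no dealloc (and thus no
   arbitrary code) can run in the middle of the block.
   `scratch` must hold at least n ops. Returns the index where the
   trailing DECREFs begin. */
static size_t sink_decrefs(Op *ops, size_t n, Op *scratch) {
    size_t front = 0, back = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_DECREF) {
            scratch[back++] = ops[i];
        } else {
            ops[front++] = ops[i];   /* front <= i, so this is safe in place */
        }
    }
    for (size_t i = 0; i < back; i++) {
        ops[front + i] = scratch[i];
    }
    return front;
}
```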

Original discussion: https://github.com/faster-cpython/ideas/discussions/402

gvanrossum commented 1 year ago

What's a safe dealloc though? Freeing a string or int, sure. But freeing a tuple has to ask the question recursively, because the tuple might contain something with a refcount equal to 1 whose dealloc is not safe; same for all containers (e.g. essentially all class instances).

markshannon commented 1 year ago

Freeing a tuple is safe because it only calls Py_XDECREF on its items before freeing the memory. Py_DECREF is safe because it only calls safe functions, deferring unsafe ones.
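The argument that safety composes can be checked in a small toy model (hypothetical names, not CPython's API): the tuple's dealloc only decrefs its items, so when one item's dealloc is unsafe, that item is queued and the tuple is still freed without running arbitrary code. Here `freed` counts calls that stand in for freeing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not CPython's real layout. */
typedef struct Obj Obj;

typedef struct {
    bool dealloc_is_safe;
    void (*dealloc)(Obj *);
} Type;

struct Obj {
    long refcnt;
    const Type *type;
    Obj *items[2];   /* used only by the toy tuple type; NULL = empty slot */
};

static Obj *pending[16];   /* deferred (unsafe) deallocs */
static size_t n_pending = 0;
static int freed = 0;      /* counts stand-in "frees" */

static void decref(Obj *op);

static void xdecref(Obj *op) {
    if (op != NULL) decref(op);
}

/* Safe: only decrefs its items (which may defer) and then frees itself. */
static void tuple_dealloc(Obj *op) {
    for (size_t i = 0; i < 2; i++) xdecref(op->items[i]);
    freed++;   /* stands in for freeing the tuple's memory */
}

static void plain_free(Obj *op) { (void)op; freed++; }

static const Type TupleType = {true, tuple_dealloc};
static const Type IntType = {true, plain_free};
static const Type DelType = {false, plain_free};   /* e.g. a class with __del__ */

static void decref(Obj *op) {
    assert(op->refcnt > 0);
    if (--op->refcnt != 0) return;
    if (op->type->dealloc_is_safe) {
        op->type->dealloc(op);
    } else {
        assert(n_pending < 16);
        pending[n_pending++] = op;   /* never run inline */
    }
}
```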

gvanrossum commented 1 year ago

So the only unsafe deallocs are ones that can directly invoke Python code (not via DECREF)? I guess that would include all instances of Python classes, or perhaps only instances of Python classes that define or inherit a __del__ method.

brandtbucher commented 1 year ago

...or C extension types with tp_finalize, or tp_del, or a non-trivial tp_dealloc, or anything that supports weakrefs...

gvanrossum commented 1 year ago

Couldn't weakrefs be separately put on a queue that's processed later?

brandtbucher commented 1 year ago

Probably, yeah.

markshannon commented 1 year ago

Anything that's not "safe" goes on a queue for later. That's anything that needs finalizing, has any weakrefs, or just has an opaque tp_dealloc that can do anything. The tricky part is to check all that very quickly.
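One way the check could be made cheap, sketched under assumptions: the expensive classification (finalizer, legacy `__del__`-style slot, unvetted dealloc) is folded into a single hypothetical type flag once, when the type is created, leaving one bit test plus one pointer test per dealloc. The fields only loosely echo CPython's `tp_finalize`, `tp_del` and `tp_weaklistoffset`; this is not its real layout. An object with live weakrefs is simply treated as unsafe here; queueing just its weakrefs, as suggested above, would be a refinement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type/object model, not CPython's API. */
typedef struct Obj Obj;
typedef void (*destructor_fn)(Obj *);

#define TYPE_FLAG_SAFE_DEALLOC (1UL << 0)   /* hypothetical flag bit */

typedef struct {
    unsigned long flags;
    destructor_fn finalize;     /* like tp_finalize: may run Python code */
    destructor_fn legacy_del;   /* like tp_del */
    bool dealloc_is_known;      /* dealloc is a vetted, trivial one */
    bool supports_weakrefs;
} Type;

struct Obj {
    long refcnt;
    const Type *type;
    void *weakrefs;             /* non-NULL if any weakref points here */
};

/* Example finalizer; a real one might call __del__. */
static void example_finalize(Obj *op) { (void)op; }

/* Slow path, done once when the type is created. */
static void compute_safe_flag(Type *tp) {
    bool safe = tp->finalize == NULL && tp->legacy_del == NULL
                && tp->dealloc_is_known;
    if (safe) tp->flags |= TYPE_FLAG_SAFE_DEALLOC;
    else      tp->flags &= ~TYPE_FLAG_SAFE_DEALLOC;
}

/* Fast path, done on every dealloc. */
static bool can_dealloc_now(const Obj *op) {
    if (!(op->type->flags & TYPE_FLAG_SAFE_DEALLOC)) return false;
    /* Weakref callbacks can run arbitrary code, so an object that
       currently has weakrefs is deferred as well. */
    return !(op->type->supports_weakrefs && op->weakrefs != NULL);
}
```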

markshannon commented 9 months ago

Looking at the new stats for micro-ops, it looks like the micro-ops that contain no _Py_DECREF, or only apply it to objects known to be safe (like int), account for roughly 80% of the execution count.

So this may not be so urgent, although still worth doing.