faster-cpython / ideas

Store both function and code object in the function version cache #665

Open gvanrossum opened 3 months ago

gvanrossum commented 3 months ago

The idea here is to avoid function version cache misses for generator expressions. (See https://github.com/faster-cpython/ideas/issues/664#issuecomment-2000948111.)

We have a complicated mechanism to reset the function version whenever __code__, __defaults__ and a few other function attributes are mutated. (BTW: nothing is affected by changes to __annotations__, and yet that is also considered a mutation.)
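For illustration, these are the kinds of attribute mutations in question. This is a plain-Python sketch with made-up function names (`greet`, `shout`); the version counter itself is internal to the interpreter and is not observable from Python code, so the example only shows the mutations and their visible effect.

```python
def greet(name, punctuation="!"):
    return "hello " + name + punctuation


def shout(name, punctuation="!"):
    return "HELLO " + name.upper() + punctuation


before = greet("world")

# Replacing the defaults changes what a call without the second argument does.
greet.__defaults__ = ("?",)
after_defaults = greet("world")

# Replacing the code object changes the function body entirely;
# the mutated defaults from above are kept.
greet.__code__ = shout.__code__
after_code = greet("world")

# Per the description above, this is also treated as a mutation,
# even though the function's behavior does not change.
greet.__annotations__ = {"name": str}
after_annotations = greet("world")
```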

Why not instead just reset the function version to zero on any such mutation and leave it at zero from then on? We would then guarantee that the function version is either zero or matches the code object version.

Nothing changes for specialization except that _PyFunction_GetVersionForCurrentState() returns 0 for mutated functions. This is unlikely to affect any benchmark or other perf-critical real-world code.

The function version cache would double in size, and store both the function and the code object. When a function is deallocated or its version is reset to zero, it evicts itself from the cache, but keeps the code object. Code objects remove themselves from the cache when deallocated (and probably also evict the function object).
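A minimal sketch of the proposed eviction rules, as a toy Python model. The class and method names are hypothetical, a dict stands in for whatever storage the real cache uses, and strings stand in for function and code objects; it is not CPython's implementation.

```python
class VersionCache:
    """Toy model of a cache mapping version -> [function, code]."""

    def __init__(self):
        self._slots = {}

    def insert(self, version, func, code):
        # Version 0 means "no usable version", so it is never cached.
        if version != 0:
            self._slots[version] = [func, code]

    def evict_function(self, version):
        # Function deallocated or its version reset to zero:
        # drop the function but keep the code object.
        slot = self._slots.get(version)
        if slot is not None:
            slot[0] = None

    def evict_code(self, version):
        # Code object deallocated: drop the whole entry,
        # which also evicts the function.
        self._slots.pop(version, None)

    def lookup(self, version):
        slot = self._slots.get(version)
        return (None, None) if slot is None else (slot[0], slot[1])


cache = VersionCache()
cache.insert(7, "func", "code")
both = cache.lookup(7)

cache.evict_function(7)
code_only = cache.lookup(7)

cache.evict_code(7)
empty = cache.lookup(7)
```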

For Tier 2, when generating _PUSH_FRAME or _POP_FRAME, we can handle the case where the function version maps to a pair (NULL, some_code_object) -- we store NULL in the operand, but we use some_code_object to trace through. Globals removal will no longer work (boo hoo), but at least we still have a trace. Presumably at least some generator expressions don't use globals (they can still use builtins, which can be reached without the function object).
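The lookup described above could be sketched roughly as follows. `FramePlan` and `plan_frame_push` are hypothetical names invented for this illustration, and a plain dict stands in for the version cache; the real Tier 2 trace projection is written in C and looks different.

```python
from typing import NamedTuple, Optional


class FramePlan(NamedTuple):
    operand: Optional[object]   # function object, or None if only code is known
    code: object                # code object used to keep tracing
    globals_removal: bool       # only possible when the function is known


def plan_frame_push(cache, version):
    """Decide what to do for a frame push given a function version."""
    func, code = cache.get(version, (None, None))
    if code is None:
        # Nothing cached for this version: the trace cannot continue.
        return None
    # With (None, code) we can still trace through the code object,
    # but we lose optimizations that need the function object.
    return FramePlan(operand=func, code=code, globals_removal=func is not None)


cache = {
    11: ("gen_func", "gen_code"),      # function and code both cached
    12: (None, "genexpr_code"),        # function evicted, code kept
}

full = plan_frame_push(cache, 11)
code_only = plan_frame_push(cache, 12)
missing = plan_frame_push(cache, 13)
```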

gvanrossum commented 3 months ago

I will attempt a prototype implementation to see whether this is feasible, and if it is, I will create a cpython issue and PR.

JelleZijlstra commented 3 months ago

Linking to python/cpython#109998 where I found that we're inconsistent about whether writing to certain function attributes bumps the function version. I didn't do anything about it because I don't understand this code well and it didn't seem likely to matter in practice, but if you're redoing the way function versions are handled, it might be worth also fixing this inconsistency.

gvanrossum commented 3 months ago

It looks feasible, see https://github.com/python/cpython/pull/117028. I have created https://github.com/python/cpython/issues/117045 to debate a better internal API.