capi-workgroup / decisions

Discussion and voting on specific issues
5 stars 1 forks source link

Add unstable C API function for enabling deferred reference counting (`PyUnstable_Object_SetDeferredRefcount`) #42

Open colesbury opened 2 days ago

colesbury commented 2 days ago

In the free-threaded build, marking an object as supporting deferred reference counting allows the interpreter to skip reference count modifications for values pushed and popped from the interpreter's evaluation stack and local variables. This is important for avoiding contention on reference count modifications that inhibit scaling. See also PEP 703: "Deferred Reference Counting". The tradeoff is that these objects are only collected when the tracing GC runs.

In the free-threaded build, we currently have an internal-only function _PyObject_SetDeferredRefcount that allows an object to have deferred references to it. We use this on methods, top-level functions, modules, heap types, and a few other types of objects.

I think we should expose this as an unstable API for C extensions that "hints" the runtime to make this tradeoff:

By "hint", I mean that the runtime is free to ignore this suggestion. In particular, the the function is a no-op in the default build.

Motivation

Some C API extensions like nanobind create their own function and method-like objects. The nanobind.nb_func and nanobind.nb_method behave like functions and methods, but they are not subclasses of PyFunction_Type, PyCFunction_Type, or PyMethod_Type.

Without deferred reference counting (or immortalization) enabled, calling nanobind functions from multiple threads doesn't scale at all -- nanobind would not be very useful in the free-threaded build.

I think this is likely to come up in other extensions as well -- Peter's PR was created in response to a separate issue by @Yiling-J, for example.

Alternatives

Nanobind's support for free-threading currently relies on making nb_func and nb_method objects immortal in the free-threaded build immortal by basically copy-pasting _Py_SetImmortal. That's obviously not great - I'd much rather that nanobind is able to use an official, unstable API than to modify internals in a way that's likely to break in new CPython versions.

Note that in 3.13 deferred reference counting and immortalization are entangled due to time limitations, but this is no longer the case in main/3.14.

Python API?

I don't think we should expose this as a Python API, at least for now.

cc @wjakob, @ngoldbaum, @ZeroIntensity

Yiling-J commented 2 days ago

My motivation is to make my cache package scalable, but it's still unclear to me how method is expected to perform under free threading. Does this mean cache.get(key) is contended by default, and that I need to use the new C API to ensure scalability? I’ve already shared my concerns in the issue, and let me paste them here:

Python users might not realize the potential bottleneck when using free-threading Python. If they don't consult the C API documentation, they may be unaware that methods could become a scalability issue and need to be manually optimized using the C API.

ZeroIntensity commented 2 days ago

FWIW, your case is somewhat specific. A bound method object (client.get) is not something that's generally accessed across threads, so there's no DRC on them by default.

encukou commented 1 day ago

I suggest PyUnstable_Object_EnableDeferredRefcount; it's not setting the refcount number. (For a more stable function I'd suggest naming it after the intent rather than implementation, like _HintShared, but that's not too important at this point.)


I think it should be easier to add/remove reasons why deferred refcounting can't be enabled. Even for PyUnstable_ functions, it's nice if they survive intact for a few releases :)

Suggestion:


Keep in mind PyUnstable_ functions need docs and tests :)

vstinner commented 1 day ago

I think that it's reasonable addition, but I'm not sure about the API. I'm waiting to see a concrete API :-) I don't understand well when such call can fail.

colesbury commented 1 day ago

@encukou:

@vstinner: the linked PR (https://github.com/python/cpython/pull/123635) has a concrete API. The API only fails if the pre-conditions are not met:

1) Is the object tracked by the GC? The deferred reference counting implementation relies on the GC being able to find all deferred references, so it's important for correctness that the object is tracked by the GC. 2) Is it it thread-safe to modify the reference count? In the PR, this amounts to a check that the object is owned by the current thread and that the shared reference count is zero. That's basically a heuristic. As I'm writing this, I think that maybe we should instead try to make the API thread-safe in more cases by using atomic operations instead of trying to enforce this precondition.

@encukou: