capi-workgroup / decisions

Discussion and voting on specific issues
4 stars 1 forks source link

Make critical section API public as part of non-limited C API #26

Closed colesbury closed 1 week ago

colesbury commented 1 month ago

The critical section API provides macros for locking one or two Python objects at a time. I propose making the following macros public as part of Include/cpython (i.e., the non-limited C API):

These macros are defined, but no-ops, in the default build.

PR draft: https://github.com/python/cpython/pull/119353 Public header: Include/cpython/critical_section.h PEP 703 section: https://peps.python.org/pep-0703/#python-critical-sections

Note that the PR draft doesn't currently have user facing documentation. I'll add that, but I'm hoping to get a decision on the proposed API first.

Why is this useful?

C API extensions may need to add synchronization to be thread-safe without the GIL. In some cases, straightforward per-object locking can introduce deadlocks that were not present when running with the GIL. This API avoids that by implicitly suspending critical sections in places where the GIL would be released.

This makes it easier to add locking to extensions without introducing deadlocks.In the "nogil" fork, we used this to make https://github.com/gaogaotiantian/viztracer thread-safe, and we've also used this extensively within CPython.

Note that in some cases plain mutexes are still more appropriate: the Py_BEGIN/END_CRITICAL_SECTION API makes it easier to avoid deadlocks, but plain mutexes make it easier to reason about the invariant that the mutex ensures.

Why is this part of CPython?

The underlying implementation hooks in to the PyThreadState management, so it would not be possible to implement this outside of CPython.

encukou commented 1 month ago

Macros are only usable in C/C++. To lift that limitation, the underlying type & functions would need to be made public as well, and since we expect that non-C language wrappers re-implement the macros, the documentation should show what they expand to -- similar to Py_BEGIN_ALLOW_THREADS & Py_END_ALLOW_THREADS.

That complicates things, of course -- the current implementation requires the size of the PyCriticalSection* structs, which runs into the same issues as exposing PyMutex :(

colesbury commented 1 month ago

The functions are callable from Rust and other languages. The structs are two pointers (_PyCriticalSection) or three pointers (_PyCriticalSection2). I'm happy to designate the underlying functions as public.

Note that C API types are not really usable outside of C/C++ either. PyO3, for example, reimplements all the C structs in Rust.

The complaint with PyMutex was that it was one byte. These structs are larger than a single byte.

vstinner commented 1 month ago

I'm fine with these 4 macros.

It's ok that macros cannot be used in Rust and other programming languages, if _PyCriticalSection structure, _PyCriticalSection_Begin() and _PyCriticalSection_End() can be used (same for the _PyCriticalSection2 variant). Should we make these structures and functions public for these languages?

vstinner commented 1 month ago

These macros look similar to Py_TRASHCAN_BEGIN and Py_TRASHCAN_END. I like the fact that proposed "critical section" macros don't leak implementation details, but call opaque functions :-)

encukou commented 4 weeks ago

So, we want to expose:

(The actual expansion might be different, e.g. no-op in the default build, but these should be tested.)

Is that right?


Note that C API types are not really usable outside of C/C++ either. PyO3, for example, reimplements all the C structs in Rust.

Yup, that's why we want to avoid them where possible.

vstinner commented 4 weeks ago

The structs PyCriticalSection and PyCriticalSection2, whose contents are opaque but whose size/alignment is part of the ABI and may change in a new feature release

IMO members should be public. Trying to provide a stable ABI is too early, and trying to hide members would only make the implementation more complicated and fragile.

The issue title clearly excludes the limited C API:

Make critical section API public as part of non-limited C API

encukou commented 4 weeks ago

Right, I meant private rather than opaque (edited). The compiler will see them, users should not touch them directly.

colesbury commented 4 weeks ago

A few differences between what @encukou wrote and the current PR:

encukou commented 4 weeks ago

Yup, I do want to make them public, so that non-C languages can use them. I'd also like the functions to be exposed as actual DLL symbols (and shadowed with static inline functions for performance in C; I'm happy to implement that little dance). For non-C, I don't think we should make them opaque.

I edited the post to add the casts.

colesbury commented 4 weeks ago

I'd also like the functions to be exposed as actual DLL symbols...

The functions are exposed as actual DLL symbols in the linked PR. The inline functions aren't public (i.e., they are in pycore_critical_section.h).

There's a dance with redefining the Py_BEGIN_CRITICAL_SECTION, etc. macros for internal CPython usage where we want more things to be inlined.

https://github.com/python/cpython/pull/119353/files#diff-ecf78b1af8cff87b3ae3899029f7ba1223286415cc64ef2f0d5bcf58bb540cecR77-R83

vstinner commented 2 weeks ago

There's a dance with redefining the Py_BEGIN_CRITICAL_SECTION, etc. macros for internal CPython usage where we want more things to be inlined.

I suggest you to use a different name for the "fast inline" flavor. I did something similar with PyThreadState_GET():

vstinner commented 2 weeks ago

The https://github.com/capi-workgroup/decisions/issues/26#issuecomment-2142474336 API LGTM.

encukou commented 2 weeks ago

It's fine for me for non-limited C API as well.

But unlike PyMutex, I think we want to add this one to the limited API rather soon, and we should plan for that. So let me look at the aspects that tend to prevent evolution.

Exposed struct sizes

The usual solution for avoiding struct sizes is adding API to allocate the struct. But since there's already space for the pointer, we can (if it's ever needed) make PyCriticalSection_Begin/PyCriticalSection_End allocate/free without breaking users. (And then we can introduce a PyCriticalSection_Begin_v2 that takes a bigger struct and doesn't allocate, and switch the macros to use it.)

Or perhaps we can do what was proposed for PyMutex: inflate the struct size to give some room for expansion. Here, the struct is only used on the stack, rather than as part of a bigger structure that might need ABI compatibility with other extensions. If we prescribe that the structs must only be used as in the macros, the struct can be “just big enough” in version-specific ABI, but have some extra space when compiling for the stable ABI. (Possibly lots more space -- stack space is only expensive in recursion, but if you're nesting critical sections you're doing it wrong. But an extra pointer is probably enough.)

Error reporting

It seems that failure to enter/exit a critical section is not really recoverable, but generally it's good if functions are able to raise runtime deprecation warnings. In this case, I can live with the API being difficult to deprecate.

vstinner commented 2 weeks ago

The usual solution for avoiding struct sizes is adding API to allocate the struct. But since there's already space for the pointer, we can (if it's ever needed) make PyCriticalSection_Begin/PyCriticalSection_End allocate/free without breaking users. (And then we can introduce a PyCriticalSection_Begin_v2 that takes a bigger struct and doesn't allocate, and switch the macros to use it.)

Allocating memory on the heap means introducing the risk of a memory allocation failure. I don't think that we want that for a critical section.

vstinner commented 2 weeks ago

(Possibly lots more space -- stack space is only expensive in recursion, but if you're nesting critical sections you're doing it wrong. But an extra pointer is probably enough.)

We are trying to reduce the stack memory usage in Python. I don't think that adding an unused member "just in case" (for future usage) to the structure is worth it.

gvanrossum commented 2 weeks ago

To me, the possibility of PyMutex needing to expand in size seems too uncertain to worry about it. The basic lock functionality will not require extra space, ever. (If they ever do, they will have to invent a new type.)

The only reason to reserve extra space seems to be the idea of possibly having the lock contain a pointer to the "real" location of the lock, in case an object containing a lock is moved by a hypothetical future GC. But this feels like the wrong solution for that problem: It would make more sense for the GC to declare locked objects unmovable (there will always be reasons why a particular object can't be moved, so this would just be another reason). Also, the Stable ABI would have many other problems when GC starts moving objects, so it's also a straw man.

colesbury commented 1 week ago

It would be helpful to get a decision on this API.

It seems to me that people are supportive of the API @encukou described in https://github.com/capi-workgroup/decisions/issues/26#issuecomment-2142474336. There is still some uncertainty about future stable ABI considerations, but the current proposal does not affect the stable ABI/limited C API.

encukou commented 1 week ago

Yeah, let's do a formal vote for the API in https://github.com/capi-workgroup/decisions/issues/26#issuecomment-2142474336

zooba commented 1 week ago

This looks like an absolute nightmare to debug (it's the first time I've seen how these work), and unfortunately it's not going to solve the existing race conditions/vulnerabilities we have that I'd hoped per-object locks would help with, but the API (the macro names) are fine.

I don't see how this gets into the stable ABI at all in the future, though, nor can this be used reliably from other languages (for an arbitrary value of "other language"). If we want to go there, I think we'll need a design that doesn't rely on pointers to user-defined variables on the stack, and probably need to store it all in our thread state and return a key to the caller to store (this will be hidden in the macro for C devs, so their sources don't change).

vstinner commented 1 week ago

@colesbury: Congrats, your API got approved, I close the issue.

@zooba: Maybe some tooling can be added later to help debugging dead locks and race conditions. And yeah, for now, this API doesn't belong to the stable ABI, and I agree that a redesign may be needed if we want to make it happen later.