Make `PyMutex` public in the non-limited API

colesbury commented 2 months ago

Links

CPython Issue: https://github.com/python/cpython/issues/117511
PR draft: https://github.com/python/cpython/pull/117731
Additional discussion: https://discuss.python.org/t/make-some-pymutex-apis-public/50203

Overview

The PyMutex APIs are currently internal-only in pycore_lock.h. This proposes making the type and two functions public as part of the general, non-limited API in Include/cpython/lock.h.

The APIs to be included in the public header are:

typedef struct PyMutex { ... } PyMutex;
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }

Additionally, the out-of-line "slow path" functions are exposed in the header, but are not part of the public API. (For context, these are called when PyMutex_Lock needs to block and when PyMutex_Unlock needs to wake up another waiting thread.)

// (private) slow path for locking the mutex
PyAPI_FUNC(void) _PyMutex_LockSlow(PyMutex *m);

// (private) slow path for unlocking the mutex
PyAPI_FUNC(void) _PyMutex_UnlockSlow(PyMutex *m);
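
To illustrate the split between an inline fast path and an out-of-line slow path, here is a minimal sketch using C11 atomics. It is not the code from the PR: the `toy_mutex` names are hypothetical, the slow path simply retries instead of blocking, and because no thread ever parks there is no unlock slow path to show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for PyMutex: one byte, 0 = unlocked, 1 = locked. */
typedef struct { _Atomic uint8_t _bits; } toy_mutex;

/* Out-of-line slow path. A real implementation would block the thread;
   this toy version just retries the compare-exchange until it succeeds. */
void toy_mutex_lock_slow(toy_mutex *m)
{
    uint8_t expected = 0;
    while (!atomic_compare_exchange_weak(&m->_bits, &expected, 1)) {
        expected = 0;  /* a failed exchange overwrote it with the observed value */
    }
}

/* Inline fast path: a single compare-exchange when uncontended. */
static inline void toy_mutex_lock(toy_mutex *m)
{
    uint8_t expected = 0;
    if (!atomic_compare_exchange_strong(&m->_bits, &expected, 1)) {
        toy_mutex_lock_slow(m);
    }
}

static inline void toy_mutex_unlock(toy_mutex *m)
{
    uint8_t old = atomic_exchange(&m->_bits, 0);
    assert(old == 1);  /* unlocking an unlocked mutex is a caller bug */
    (void)old;
}
```

Keeping only the compare-exchange inline means the common uncontended case costs one atomic instruction and no function call.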

encukou commented 2 months ago

+1 from me.

This does introduce a non-opaque struct, which is a red flag worth focusing on. The practical issue with such a struct is that it's part of the ABI, so

we can't change it in a bugfix/security release
if this later makes it to the stable ABI (a possibility I don't want to exclude), then we can't ever change it: the whole API would need to be deprecated and replaced with a new one. But with such minimal API, even that would be bearable.

There's some mind-blowingly genius engineering behind the "one-byte struct and two functions" API surface. But the idea is proven; it's been used in WebKit since 2015. I don't think it'll need to change any time soon.

The struct member name should not appear in user code, so please mark it as private using a leading underscore:

typedef struct {
    uint8_t _v;
} PyMutex;

Note that this requires atomics -- the C11 optional feature, or the MSVC flavour. (The GCC flavour is an optional speedup.) We need to (belatedly) update PEP-7 to allow ourselves to use atomics.

gvanrossum commented 2 months ago

Yeah, makes sense.

vstinner commented 2 months ago

The static inline function calls the private _Py_atomic_compare_exchange_uint8() function. I would prefer to clarify _Py_atomic_compare_exchange_uint8() visibility first: I would like it to be public.

I would prefer to make the whole big Py_atomic C API public in Python 3.14, and then add PyMutex. IMO it's really hard to write efficient C code for the Free Threading build (PEP 703) without having this C API available. And I would prefer that people don't start to consume the private API, otherwise, once again, "private" will mean "public" :-(

Before Py_atomic C API is public, maybe we can start by having an opaque PyMutex implementation in the stable ABI:

function to allocate/release a PyMutex on the heap memory: I don't want to expose the structure. Every time I added a structure to the C API, I regret :-(
opaque function working on a PyMutex*: Lock/Unlock.

Later, once the static inline will be added, if Py_LIMITED_API macro is not defined, you would use the same API, but use the faster static inline functions.

encukou commented 2 months ago

Current C compilers support C11/C++ atomics; other languages have their own locking. I don't think we want to add the wrappers as any kind of long-term-support API; definitely not as dozens of type-specific functions. Third-party code should simply #include <atomic> and use atomic_compare_exchange_strong((_Atomic(uint8_t)*)obj, expected, desired). If their compiler doesn't support that, they should update and/or use experimental:c11atomics for MSVC. If that turns out to be problematic, let's spin the wrappers off into their own library. (Steve had that idea for PyTime_*Raw, but here it would make much more sense to me: it'd be much more than 3 functions, and it'd be expected to eventually go away as compiler C11 support improves.)
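
For readers unfamiliar with the standard call, a minimal sketch of a one-byte compare-exchange in C follows. Note that in C the header is `<stdatomic.h>` (`<atomic>` is the C++ header). The helper name `flag_transition` is hypothetical, and the object is declared `_Atomic` up front so no pointer cast is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Try to move a one-byte flag from `from` to `to`.
   Returns true if this call made the change. */
static bool flag_transition(_Atomic uint8_t *flag, uint8_t from, uint8_t to)
{
    /* On failure the call stores the observed value in `from`,
       which is discarded here because `from` is a local copy. */
    return atomic_compare_exchange_strong(flag, &from, to);
}
```

A caller would use it as `flag_transition(&flag, 0, 1)`; a second identical call returns false because the flag no longer holds 0.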

function to allocate/release a PyMutex on the heap memory

The point here is to avoid memory allocation overhead (both time and space) for the one byte.

If it helps: I see PyMutex as a name for uint8_t -- a stronger typedef, so to say. Effectively, this proposal is adding:

static inline void PyMutex_Lock(uint8_t *v) { ... }
static inline void PyMutex_Unlock(uint8_t *v) { ... }

except we want to prevent users from doing arithmetic on v.
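
The "stronger typedef" idea can be sketched as follows, assuming a hypothetical `toy_mutex` wrapper. The struct costs nothing in size, but the compiler now rejects arithmetic on the value and flags a plain `uint8_t *` passed where the wrapper is expected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-byte wrapper, mirroring the shape discussed above. */
typedef struct { uint8_t _v; } toy_mutex;

/* The wrapper adds no size beyond the byte it holds. */
static_assert(sizeof(toy_mutex) == sizeof(uint8_t), "wrapper must stay one byte");

/* Accepts only the wrapper type. Passing a plain uint8_t* draws a
   compiler diagnostic, and arithmetic on a toy_mutex value does not compile. */
static inline int toy_mutex_is_locked(const toy_mutex *m)
{
    return (m->_v & 1) != 0;
}
```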

vstinner commented 2 months ago

Current C compilers support C11/C++ atomics; other languages have their own locking. I don't think we want to add the wrappers as any kind of long-term-support API; definitely not as dozens of type-specific functions.

How does someone use PyMutex in C++, Rust or any other language if _Py_atomic_compare_exchange_uint8() is not available for them? Knowing that it's implemented as a static inline function.

Third-party code should simply #include <atomic> and use atomic_compare_exchange_strong((_Atomic(uint8_t)*)obj, expected, desired).

I don't think that atomic_compare_exchange_strong((_Atomic(uint8_t)*)obj, expected, desired) is code which is easy to use for C beginners.

I'm not sure that I understand. Do you suggest that people don't use the PyMutex API if _Py_atomic_compare_exchange_uint8() is not available for them? So PyMutex would only be usable in C with a C compiler supporting _Py_atomic_compare_exchange_uint8()?

If that turns out to be problematic, let's spin the wrappers off into their own library.

This is something new. I would prefer to not go this way. It looks very complicated.

davidhewitt commented 2 months ago

How does someone use PyMutex in [...] Rust

Unless there's a case where Rust code needs to lock / unlock a PyMutex defined in other objects, I don't think Rust code needs to use it. We've got mutexes and atomics in the Rust standard library, and there's also the portable_atomic crate for platforms not supported by the standard library. So I think Rust code will almost always have atomics available, and will use Rust mutex types most (all?) of the time.

colesbury commented 2 months ago

For C++, you just call the same code as C: PyMutex_Lock(&m). It works in both C and C++.

I think the Rust situation is pretty different. In addition to what @davidhewitt wrote, mutexes in Rust protect specific data, so a mutex with no associated data is an awkward fit. If a mutex that releases the GIL when blocking makes sense, then I think PyO3 could implement it without too much difficulty.

vstinner commented 2 months ago

@davidhewitt:

Unless there's a case where Rust code needs to lock / unlock a PyMutex defined in other objects, I don't think Rust code needs to use it.

If Rust accesses an object which has a lock implemented as PyMutex, it should use the PyMutex_Lock() API while accessing the object, no?

For example, PyWeakReference has a struct _PyMutex *weakrefs_lock; member and the implementation uses PyMutex_LockFlags(wr->weakrefs_lock, _Py_LOCK_DONT_DETACH) and PyMutex_Unlock(wr->weakrefs_lock) to access the object. I'm not sure if it's the best example, since lock/unlock is currently hidden in public C API to access weak references.

davidhewitt commented 2 months ago

Is that weakref access implemented entirely in static inline functions? If so, then yes I think Rust code will need to reproduce the implementation (probably using Rust atomics).

encukou commented 2 months ago

We can additionally provide PyMutex_Lock as an actual exported function; non-C languages get some function call overhead. (We do that a lot in the limited API, which is more friendly to non-C languages at the expense of performance, and it's working out rather well. We can do it in the non-limited API as well. See a recipe here.)
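
The idea of pairing an inline function with an exported one can be sketched like this. It is not the linked recipe: all names are hypothetical, and the body only marks where the real locking would go.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t _v; } toy_mutex;  /* hypothetical type */

/* Header side: inline version for C and C++ callers.
   (Not thread-safe; it only marks where the real locking would go.) */
static inline void toy_mutex_lock_inline(toy_mutex *m)
{
    m->_v = 1;
}

/* Library side: an ordinary exported symbol with the same behaviour,
   so a foreign-function interface can link against it by name. */
void toy_mutex_lock_exported(toy_mutex *m)
{
    toy_mutex_lock_inline(m);
}
```

Callers that can include the header pay no call overhead, while other languages bind to the exported symbol and never need to know the function body.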

If we do expect people to re-implement these in other languages, rather than call them, then their body needs to be documented and stable. That's a can of worms, but, if people will do it anyway we might want to set reasonable rules and stability expectations.

I'm not sure that I understand. Do you suggest that people don't use the PyMutex API if _Py_atomic_compare_exchange_uint8() is not available for them? So PyMutex would only be usable in C with a C compiler supporting _Py_atomic_compare_exchange_uint8()?

No, there I wasn't talking about the PyMutex API, but about your proposal to make _Py_atomic_compare_exchange_uint8 public. Since the function is static inline, it is only usable in C, even if we make it public. (Unless you want to export _Py_atomic_compare_exchange_uint8 as a regular function too? I'm also not sure that I understand your proposal.)

colesbury commented 2 months ago

I would really like for a decision to be made on this proposal

gvanrossum commented 2 months ago

I'm okay with whatever @encukou says.

encukou commented 2 months ago

And the others?

[x] @gvanrossum (see above comment)
[x] @encukou (+1)
[ ] @vstinner
[ ] @zooba
[x] @erlend-aasland

zooba commented 2 months ago

Main change I'd like to see is replacing the explicit initialization ((PyMutex){0}) with a function call - probably a PyMutex_Init(&mutex)[^1] and PyMutex_Destroy(&mutex) - just to make sure we have an API that can be updated later without affecting anyone already using it. Destroying a mutex has also proven to be valuable as a debugging aid to make anything currently waiting fail.

And unless there's an important reason otherwise, specify the public PyMutex as being pointer sized rather than a single byte. (I can imagine wanting to fill a compact array with a lot of mutexes, but that doesn't actually seem likely?) That allows us some freedom for future backwards-compatible changes that may involve more state. Having an internal interface that is only one byte is fine - it's just the public definition I'm concerned about, and as this is statically allocated (potentially cross-language), knowing that we've always got a void * available without callers recompiling is a nice thing to have when change happens.

[^1]: Rather than an assignment, since we can't really be assured a copy will always be possible.

colesbury commented 2 months ago

I am not in favor of any of these suggestions. The pointer size means a completely different type and different API. In other words, a different proposal.

Regarding PyMutex_Init, you need a way to initialize static mutexes. The equivalent of PTHREAD_MUTEX_INITIALIZER. So you're still limited to constant initialization. Why would you want to switch to any constant initializer other than zero?

Practically, most mutexes are either global (i.e., statically initialized to zero) or instances of some struct. For subtypes of PyObject, the typical initialization in extensions already zero initialize fields so any PyMutex field does not need any explicit initialization.

We should not make the API more complicated just for some unlikely hypothetical changes. Note that PyThread_type_lock already supports these hypotheticals, but it's not at all suitable for these use cases.
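
The two common cases described above can be sketched as follows, assuming a hypothetical `toy_mutex` whose unlocked state is all zero bits. Neither case needs an explicit initialization call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint8_t _v; } toy_mutex;  /* hypothetical; 0 means unlocked */

/* Static storage duration: zero-initialized by the C language itself. */
static toy_mutex global_lock;

typedef struct {
    toy_mutex lock;
    long counter;
} toy_object;

/* calloc returns zeroed memory, so the embedded mutex starts unlocked.
   Memory from malloc would need the field cleared by hand. */
static toy_object *toy_object_new(void)
{
    return calloc(1, sizeof(toy_object));
}
```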

zooba commented 2 months ago

Okay, I'll concede zero initialization (and no destruction), but I still worry about having a public, non-opaque API based around a single byte. The API doesn't change in that case - we only have to use a single byte in the struct for what we currently want - in fact, I'd settle for the struct remaining at one byte but the specification saying to ensure you have sizeof(void *) available at the address of your PyMutex.[^1] Unless you happen to be carefully packing things, there's a good chance of ending up with that much padding anyway.

Can you expand a little on why PyThread_type_lock is not at all suitable? Is that due to being a larger size than 1? Or some other reason?

[^1]: I'm thinking of non-C callers here, who aren't using our PyMutex definition directly when they allocate it.

vstinner commented 2 months ago

Regarding PyMutex_Init, you need a way to initialize static mutexes. The equivalent of PTHREAD_MUTEX_INITIALIZER.

So there is an API for that? Can you add an API for static init?

colesbury commented 2 months ago

Can you expand a little on why PyThread_type_lock is not at all suitable?

To be clear, I mean that it would not be suitable for some of the current uses of PyMutex in CPython. We have a PyMutex per-PyObject. The API of PyThread_type_lock requires memory allocation (it returns a pointer). I think an extra memory allocation and deallocation for every PyObject created would be prohibitively expensive. The 1 byte vs. pointer size is important, but not as important as avoiding the allocation.

I'm thinking of non-C callers here, who aren't using our PyMutex definition directly when they allocate it

I'm not opposed to providing extra guidance for something like Rust bindings, but I don't know what concretely we can say or what actions we expect them to take. PyO3 could make their PyMutex definition pointer sized, but I don't see how that would help us.

I wrote this in the other issue, but the primary way of extending PyMutex is in the private wait entry API. We can add (or remove) as many fields as we want to that in the future without backwards compatibility concerns. Internally, there's a hash table that maintains a mapping of PyMutex address to struct mutex_entry* for mutexes with threads waiting on them.
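
A much-simplified sketch of an address-keyed wait table is shown below. It is not CPython's implementation: the names, the fixed table size, and the linear probing are assumptions, and the code is single-threaded. It shows why per-mutex bookkeeping can grow without touching the mutex itself, and why the entry belongs to the address rather than to the byte's value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 64

/* Hypothetical wait entry; fields can be added without touching the mutex. */
struct toy_wait_entry {
    const void *addr;   /* address of the mutex being waited on */
    int waiters;        /* number of threads waiting on it */
};

static struct toy_wait_entry toy_table[TOY_BUCKETS];

/* Find the slot for an address, using linear probing. Not thread-safe.
   Returns NULL only if the table is full. */
static struct toy_wait_entry *toy_slot(const void *addr)
{
    size_t start = (uintptr_t)addr % TOY_BUCKETS;
    for (size_t i = 0; i < TOY_BUCKETS; i++) {
        struct toy_wait_entry *e = &toy_table[(start + i) % TOY_BUCKETS];
        if (e->addr == addr || e->addr == NULL) {
            return e;
        }
    }
    return NULL;
}

static void toy_add_waiter(const void *addr)
{
    struct toy_wait_entry *e = toy_slot(addr);
    assert(e != NULL);
    e->addr = addr;
    e->waiters++;
}

static int toy_waiters(const void *addr)
{
    struct toy_wait_entry *e = toy_slot(addr);
    return (e != NULL && e->addr == addr) ? e->waiters : 0;
}
```

A mutex copied to a different address would have no entry in the table, so its waiters would never be found.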

Unless you happen to be carefully packing things, there's a good chance of ending up with that much padding anyway.

Yes, that's true -- the bigger problem is that having more than one size or definition of PyMutex is problematic. (And in a few cases the 1 byte size matters.)

colesbury commented 2 months ago

So there is an API for that? Can you add an API for static init?

No, I was not suggesting an API for static init. I was trying to point out that if you want an API for initialization you need at least two functions/macros.

zooba commented 2 months ago

Internally, there's a hash table that maintains a mapping of PyMutex address to struct mutex_entry* for mutexes with threads waiting on them.

This is an important point to highlight in the API - the address has to be stable (in effect, the mutex is the address of the mutex, not merely the value).

the bigger problem is that having more than one size or definition of PyMutex is problematic. (And in a few cases the 1 byte size matters.)

What would be wrong with a definition like this:

struct PyMutex {
    union {
        uint8_t _v;
        void *_reserved;
    };
};

We still get our single byte, it's going to be as stably in the same position as anything else we could define, and the ABI is now immune to us adding up to three additional bytes for every supported platform, or changing the value of the mutex to a pointer (e.g. pointing directly into our own table in order to make the mutex value-based rather than address-based).
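
The layout properties of such a definition can be checked at compile time. The sketch below uses hypothetical names for the one-byte shape and the padded shape; it shows that the byte stays at offset 0 while size (and typically alignment) differ between the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The one-byte shape and the padded shape from the comment above,
   under hypothetical names. Anonymous unions require C11. */
struct toy_mutex_small { uint8_t _v; };

struct toy_mutex_padded {
    union {
        uint8_t _v;
        void *_reserved;
    };
};

/* The byte stays at offset 0 in both shapes ... */
static_assert(offsetof(struct toy_mutex_small, _v) == 0, "byte at offset 0");
static_assert(offsetof(struct toy_mutex_padded, _v) == 0, "byte at offset 0");
/* ... but the padded shape always has room for a pointer. */
static_assert(sizeof(struct toy_mutex_padded) >= sizeof(void *), "room for a pointer");
```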

colesbury commented 2 months ago

This is an important point to highlight in the API - the address has to be stable...

This is the same as other locking APIs, the type is just smaller: the functions take a PyMutex* not a PyMutex. Similarly, pthreads has pthread_mutex_t (a struct) and pthread_mutex_lock takes pthread_mutex_t*. You can't copy a pthread_mutex_t or a Windows CRITICAL_SECTION and expect things to work.

The pthread_mutex implementation in glibc has the same behavior because internally there's some int whose address is passed to the Linux futex system calls.

What would be wrong with a definition like this?

It's incompatible with the one-byte internal definition and doesn't adhere to the "one definition rule." I'd expect things to probably work out okay, but it doesn't seem to me like a good enough reason to deviate from standard-compliant C.

I don't think we will want a pointer/table based mutex, but if, hypothetically we wanted that, we should just add a new API instead of trying to repurpose existing APIs.

EDIT: The "one definition rule" might not be the relevant rule, but trying to use the same functions on differently defined types seems fishy even if it works in practice.

zooba commented 2 months ago

This is the same as other locking APIs ...

The point is that it's different from most of our APIs, so it should be highlighted (or avoided, but I assume making it an explicit part of the API is preferable).

I don't think we will want a pointer/table based mutex, but if, hypothetically we wanted that, we should just add a new API instead of trying to repurpose existing APIs.

Sure, whatever.

I'm getting sick of arguing in favour of API stability on every issue. If nobody else from the WG wants to work for it at the early stages, then consider me abstained from this vote.

vstinner commented 2 months ago

Practically, most mutexes are either global (i.e., statically initialized to zero) or instances of some struct. For subtypes of PyObject, the typical initialization in extensions already zero initialize fields so any PyMutex field does not need any explicit initialization.

_PyObject_GC_New() does not initialize allocated memory to zeros. Many types use _PyObject_GC_New().

I would prefer to have a macro and a function to initialize a mutex. Something like:

#define PyMutex_STATIC_INIT (PyMutex){0}

void PyMutex_Init(PyMutex *mutex)
{
    *mutex = PyMutex_STATIC_INIT;
}

static PyMutex mutex = PyMutex_STATIC_INIT;

PyObject* new_object(void)
{
    MyObject *obj = PyObject_GC_New(MyObject, MyType);
    // obj->mutex = PyMutex_STATIC_INIT;
    PyMutex_Init(&obj->mutex);
    return (PyObject*)obj;
}

vstinner commented 2 months ago

typedef struct PyMutex { ... } PyMutex;
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }

Can you please expand the ...? I also would like to decide on the implementation. For example, if the only PyMutex member is public or not.

colesbury commented 1 month ago

Please see the PR linked above for the actual code:

https://github.com/python/cpython/pull/117731/files#diff-dda8837a4a7c6d719395237e17bca7e6928a75df70b8b1717686c35d1dd80349

colesbury commented 1 month ago

The below suggested pattern will not compile in MSVC. It relies on a GCC extension.

#define PyMutex_STATIC_INIT (PyMutex){0}
static PyMutex mutex = PyMutex_STATIC_INIT;

encukou commented 1 month ago

I don't think API stability is an issue. Changing the PyMutex size will break ABI, not API (i.e. we can do it in a new feature release).

AFAIK, the dynamic-allocation API for locking is PyThread_allocate_lock etc. (Sadly undocumented, but public & stable.)

Perhaps we'll get another non-C-language-friendly, ABI-stable mutex as free-threading matures. (Maybe as a VM-managed part of a PyObject to make dynamic allocation worthwhile, with lock and unlock API on the object.)

This proposal is exposing the low-level machinery, for cases where you're counting bits and instructions. Would making it PyUnstable_ work for everyone?

erlend-aasland commented 1 month ago

Note also that the size of the mutex is included in the specification of the accepted PEP 703.

zooba commented 1 month ago

This proposal is exposing the low-level machinery, for cases where you're counting bits and instructions.

Last time I discussed this point with Sam, it sounded like there's no interaction between our internal locking and this API. So it's a fundamentally separate API, and the only reason to expose low level machinery is for our own convenience.

I'd rather it be a stable and more robust API[^1] and not mark it "unstable", and if we need something more efficient internally then we can have our own version.

If external users need to directly interact with our internal locks (e.g. by allocating a PyMutex themselves which will be handled by our internal machinery that isn't aware whether it is ours or theirs), this all changes. But as I said, that doesn't appear to be the case here.

[^1]: Or ABI, if you don't consider "initialize this public, non-opaque struct manually" to be API.

colesbury commented 1 month ago

Last time I discussed this point with Sam, it sounded like there's no interaction between our internal locking and this API.

Yes -- the per-object locking is intended to be done by the critical section API[^1] and other internal locks are not currently exposed. The caveat is that PyMutex ob_mutex is exposed in the PyObject header in the free-threaded build.

So it's a fundamentally separate API...

There's a logical leap here that I don't agree with. The proposal is that we expose the same PyMutex API, not write a separate public mutex API.

The concern seems to be that if we needed to change the size of PyMutex then we'd have to introduce another mutex API, but I think you are suggesting introducing another mutex API immediately (instead of making PyMutex public), which seems like something we could just do later in the unlikely situation that the size of PyMutex is not appropriate.

... the only reason to expose low level machinery is for our own convenience.

I think it's a lot more likely that we would want to expose some internal lock in the future than that we would need to change the size of a PyMutex.

[^1]: This API is not currently public. I was hoping to figure out the PyMutex before discussing the critical section API.

zooba commented 1 month ago

we'd have to introduce another mutex API

Yeah, this is the thing I want to avoid. An API should be the abstraction, not the implementation. The current proposed API is so close to the implementation that there's no way to preserve the abstraction at all if the implementation ever changes.

The proposal is that we expose the same PyMutex API, not write a separate public mutex API.

I get this, but it's not a motivation. The proposal ought to be "changes to CPython will force our users to need a mutex API that is as portable and consistent as the rest of the CPython API, so that they don't need to locate another portable library" - in other words, user-centric, rather than implementation-centric.

"It's a good time to add this because we have an existing implementation" helps, but we don't allow it to dictate the design. So I get that you want to take something suitable for internal use and just make it public, but that's an irresponsible way to design an API that we want users to be able to trust.

vstinner commented 1 month ago

The below suggested pattern will not compile in MSVC. It relies on a GCC extension.

In that case, can we implement it in a portable way? Does the following code work on most C11 compilers?

#define PyMutex_STATIC_INIT (PyMutex){._bits = 0}
static PyMutex mutex = PyMutex_STATIC_INIT;

vstinner commented 1 month ago

Please see the PR linked above for the actual code: https://github.com/python/cpython/pull/117731/files#diff-dda8837a4a7c6d719395237e17bca7e6928a75df70b8b1717686c35d1dd80349

The structure member is private, good:

typedef struct PyMutex {
    uint8_t _bits;  // (private)
} PyMutex;

PyObject members names start with "ob", PyTypeObject members names start with "tp", etc. In my experience, it's convenient for code navigation and other stuff. I suggest to rename the member to add a prefix. For example:

typedef struct PyMutex {
    uint8_t _mutex_bits;  // (private)
} PyMutex;

colesbury commented 1 month ago

Does the following code work on most C11 compilers?

It works in GCC and Clang but not MSVC. I think you would need something like the following:

#define PyMutex_STATIC_INIT { 0 }
static PyMutex mutex = PyMutex_STATIC_INIT;

We shouldn't do (PyMutex){._bits = 0} because it's (weirdly) treated like it's not a constant in C11. (It's also not C++11 compatible)
We shouldn't do { ._bits = 0 } either because it's not compatible with C++11 and our public headers should be C++ compatible. (I think it will work in C++20, which added designated initializers)
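
A short sketch of the brace-only initializer in use, with hypothetical `toy_` names. Because a braced list is not an expression, a runtime init function has to assign the member rather than assign the macro.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t _bits; } toy_mutex;  /* hypothetical type */

/* A brace-enclosed initializer with no cast and no designator. */
#define TOY_MUTEX_STATIC_INIT { 0 }

static toy_mutex global_lock = TOY_MUTEX_STATIC_INIT;

struct toy_registry {
    toy_mutex lock;
    int entries;
};

/* The macro also works for a mutex nested in an aggregate; the inner
   braces come from the macro expansion. */
static struct toy_registry registry = { TOY_MUTEX_STATIC_INIT, 0 };

/* A braced list is not an expression, so runtime initialization
   assigns the member instead of assigning the macro. */
static inline void toy_mutex_init(toy_mutex *m)
{
    m->_bits = 0;
}
```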

colesbury commented 1 month ago

The current proposed API is so close to the implementation that there's no way to preserve the abstraction at all if the implementation ever changes.

I don't agree with this characterization -- we would only need to change the API if we need to change the size of PyMutex, but we can make many changes to implementation without modifying the size of PyMutex or even the bitwise representation. For example:

You can change whether the lock spins, allows barging, performs direct handoff, or some other combination
The current proposed public API is minimal. You might want to extend it with locking functions that support timeouts, process asynchronous signals (ctrl-c), or take flags to control whether the GIL is released when blocking. These features are implemented internally and none of them depend on the size of the mutex or require changes to the bitwise states.

The size of the PyMutex struct affects the relative performance of features and implementation choices, but it doesn't overly constrain us on the implementation any more than PyThread_type_lock.

colesbury commented 1 month ago

The proposal ought to be "changes to CPython will force our users to need a mutex API that is as portable and consistent as the rest of the CPython API, so that they don't need to locate another portable library"

FWIW, I agree with this and I think it's the primary motivation. I listed the motivations in the linked CPython issue, and I think my first bullet point there is similar to what you wrote.

vstinner commented 1 month ago

@zooba: This discussion is quite long. Would you mind summarizing which changes you want for the proposed API?

About reusing PyThread_allocate_lock(), IMO it's too different from PyMutex so that PyMutex deserves to be a separated API. PyThread_allocate_lock() would remain the platform-specific implementation (pthread and Windows). I'm afraid that adding an indirection in PyThread_allocate_lock() would have a high cost on PyMutex performance, whereas PyMutex is designed to be very efficient. So I prefer to have two separated APIs.

On my side, having PyMutex_STATIC_INIT (macro) and PyMutex_Init(&mutex); (function) would make me feel better. @colesbury: Do you agree with that?

I gave up on requesting a design which would fit with the stable ABI right now. We can design a stable ABI later, once the proposed API will be battle tested by users.

zooba commented 1 month ago

Would you mind summarizing which changes you want for the proposed API?

I want the public PyMutex struct to have a size of sizeof(void *), and require initialize/destroy functions like PyMutex_Init(&mutex) and PyMutex_Destroy(&mutex). I also want the fact that both the address and the value of the mutex matters to be clearly specified, along with whatever constraints that implies (for languages other than C).

Allowing more space in any externally defined mutex allows us flexibility when changes are required without having to define an entirely new API. Requiring an initialization function also allows us flexibility to make changes without adding a new API.

I think we can allow static initialization to zero - if ever we need more complex initialization, then we can do it on first use. But having an explicit API to initialize before first use is a safe API design.

vstinner commented 1 month ago

Allowing more space in any externally defined mutex allows us flexibility when changes are required without having to define an entirely new API.

@colesbury: Does your PyMutex design come from WebKit? If yes, do you have an idea on how old is their implementation, and if their structure ever changed?

zooba commented 1 month ago

If you mean this lock from WebKit then yes, I believe so, but their public API is C++ and so it inherently covers my API concerns (including not telling non-C++ consumers "allocate 1 byte and that'll be enough", though only because it's near impossible to consume C++ APIs from anything but C++).

colesbury commented 1 month ago

Yes, the design comes from the WebKit lock linked to by @zooba. It was added in Aug 2015. The major change since then was in July 2016 when it became "eventually fair", which entailed going from three states (two bits in one byte) to four states (still two bits in one byte); the major implementation changes happen at lower levels of the API.
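
To make "two bits, four states" concrete, here is a small sketch. Only the bit budget comes from the discussion; the flag names and the meaning assigned to each bit are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names; only the "two bits in one byte" layout
   comes from the discussion. */
#define TOY_LOCKED      0x1u  /* some thread holds the mutex */
#define TOY_HAS_PARKED  0x2u  /* at least one thread is waiting */

static_assert((TOY_LOCKED | TOY_HAS_PARKED) <= UINT8_MAX, "flags fit in one byte");

static inline int toy_is_locked(uint8_t bits)
{
    return (bits & TOY_LOCKED) != 0;
}

static inline int toy_has_parked(uint8_t bits)
{
    return (bits & TOY_HAS_PARKED) != 0;
}

/* Unlock can stay on the inline fast path only when nobody is waiting. */
static inline int toy_unlock_needs_slow_path(uint8_t bits)
{
    return toy_has_parked(bits);
}
```

The two independent bits give the four combinations: unlocked, locked, unlocked with waiters, and locked with waiters.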

I'm still not exactly sure how the C API WG is intended to operate. I think it'd be helpful if the WG as a whole decided what it wanted from this API.

zooba commented 1 month ago

I'm still not exactly sure how the C API WG is intended to operate.

We go for consensus, and when we give up on that we vote.

In general, when I'm pushing for future-proof APIs like this, I eventually get outvoted 😉 So if you insist on sticking to the current design, you probably just have to hold on until the other members give up and call for a vote. This discussion has gone on pretty long (and I've repeated my position 3-4 times already), so I expect that won't be long, if only because nobody wants to re-read the whole thing anymore.

encukou commented 4 weeks ago

Let's say that the struct can grow to up to sizeof(void *) in future CPython feature releases. Then non-C languages can either reserve pointer-sized space, or commit to checking/updating the size for each Python version.

AFAICS, PyMutex_Init and PyMutex_Destroy are easy enough to add now as they'd be no-ops, but it's a bit tricky to specify future-proof semantics. I assume they'd be thread-safe and idempotent, and calling Init after Destroy would be allowed? PyMutex_Destroy sounds doable for dynamically allocated mutexes, but we shouldn't require it for static ones (i.e. we'd promise future Pythons won't start showing warnings if you leave it out). Asserting that a mutex is unlocked when you deallocate it might be good for debug builds?
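
One possible reading of these semantics is sketched below, with hypothetical `toy_` names. Init is idempotent on an unlocked mutex, Init after Destroy is allowed, and Destroy does nothing except check, in debug builds only, that the mutex is not in use.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t _bits; } toy_mutex;  /* hypothetical; 0 = unlocked */

/* Idempotent on an unlocked mutex: calling it twice, or again after
   destroy, leaves the mutex in the same unlocked state. */
static inline void toy_mutex_init(toy_mutex *m)
{
    m->_bits = 0;
}

/* A no-op apart from a debug-build check that the mutex is not held
   or waited on. With NDEBUG defined, the assert compiles away. */
static inline void toy_mutex_destroy(toy_mutex *m)
{
    assert(m->_bits == 0 && "destroying a mutex that is still in use");
    (void)m;
}
```

Since Destroy has no required effect here, omitting it for static mutexes would stay harmless, which matches the promise suggested above.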

Docs for “both the address and the value” can be added. Some non-C languages call this “pinning” (Rust, Go).

erlend-aasland commented 4 weeks ago

I'm fine with the 10 year old, battle tested one-byte design of WTF locks. The size of PyMutex is already given by PEP-703, which is approved by the SC. As I see it, the size is not up for debate.

Quoting Petr, from early in the discussion:

There's some mind-blowingly genius engineering behind the "one-byte struct and two functions" API surface. But the idea is proven; it's been used in WebKit since 2015. I don't think it'll need to change any time soon.

Consider me +1 for adding the API.

Perhaps there is a naming issue? Would PyStaticMutex be better?

gvanrossum commented 4 weeks ago

I'm fine with the 10 year old, battle tested one-byte design of WTF locks. The size of PyMutex is already given by PEP-703, which is approved by the SC. As I see it, the size is not up for debate.

Agreed.

zooba commented 4 weeks ago

The size of PyMutex is already given by PEP-703, which is approved by the SC. As I see it, the size is not up for debate.

The PyMutex shown in PEP 703 is not the one being discussed here, because nowhere does PEP 703 specify this public API.

If this API is required for users to directly use the mutex embedded in the object structure, then they need to be the same. I asked earlier whether this was the case and was told no: they are independent, but making the private API public is just convenient. So PEP 703 doesn't force us into anything here.

Let's say that the struct can grow to up to sizeof(void *) in future CPython feature releases. Then non-C languages can either reserve pointer-sized space, or commit to checking/updating the size for each Python version.

I'm okay with this.

erlend-aasland commented 3 weeks ago

If this API is required for users to directly use the mutex embedded in the object structure, then they need to be the same. I asked earlier whether this was the case and was told no: they are independent, but making the private API public is just convenient. So PEP 703 doesn't force us into anything here.

This is not how I interpret any of Sam's responses; I won't reiterate them, as they are all easily found in this discussion, and I think each of them is a good (re)read. I'm 100% aligned with Sam's argumentation, and I think it would be a really bad idea to change the API in any way, including making it dynamic, and/or changing the size of the mutex. I'm fine with:

making it an unstable API (suggested by Petr in an earlier post)
somehow add Static to the name, to make it clear what kind of API this is
add an init macro a la the one Sam suggested: #define PyMutex_STATIC_INIT { 0 }

vstinner commented 3 weeks ago

I'm fine with adding a PyMutex_Destroy() function as a static inline function which does nothing.

zooba commented 3 weeks ago

making it an unstable API (suggested by Petr in an earlier post)

somehow add Static to the name, to make it clear what kind of API this is

add an init macro a la the one Sam suggested: #define PyMutex_STATIC_INIT { 0 }

I don't think any of these are necessary. I have no problem committing to support the API over multiple releases, provided it gets a bit of buffer space, adding "Static" to names doesn't make anything clearer in my opinion, and saying "initialize to zero" is sufficiently future-proof (provided we can detect an uninitialised mutex on first use, which it sounds like we can, because our internal hash set won't have an entry).

a PyMutex_Destroy() function as a static inline function which does nothing

I think there's a valuable use for a Destroy function to release and fail anyone still waiting on the mutex, which also helps ensure you are safely destroying the mutex (or alternatively, fail to destroy if there are any waiters, which is probably almost as good). But I'm more concerned that we have the API available so it can be implemented without breaking existing users than that it does anything helpful on first release.

vstinner commented 3 weeks ago

If I understood correctly, the proposed API would look like:

typedef struct PyMutex { ... } PyMutex;
#define PyMutex_STATIC_INIT ...
static inline void PyMutex_Init(PyMutex *m) { ... }
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }
static inline void PyMutex_Destroy(PyMutex *m) { ... }

Did I miss anything? I'm fine with this API. I don't worry much about the ... ("implementation details").

capi-workgroup / decisions

Make `PyMutex` public in the non-limited API #22