Make `PyMutex` public in the non-limited API

colesbury commented 2 months ago

Links

CPython Issue: https://github.com/python/cpython/issues/117511 PR draft: https://github.com/python/cpython/pull/117731 Additional discussion: https://discuss.python.org/t/make-some-pymutex-apis-public/50203

Overview

The PyMutex APIs are currently internal-only in pycore_lock.h. This proposes making the type and two functions public as part of the general, non-limited API in Include/cpython/lock.h.

The APIs to be included in the public header are:

typedef struct PyMutex { ... } PyMutex;
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }

Additionally, the out-of line of "slow path" functions are exposed in the header, but not part of the public API. (For context, these are called when PyMutex_Lock needs to block and when PyMutex_Unlock needs to wake up another waiting thread).

// (private) slow path for locking the mutex
PyAPI_FUNC(void) _PyMutex_LockSlow(PyMutex *m);

// (private) slow path for unlocking the mutex
PyAPI_FUNC(void) _PyMutex_UnlockSlow(PyMutex *m);

colesbury commented 3 weeks ago

If you want to let users of the API assert that no one is waiting on the mutex, we should make int PyMutex_IsLocked(PyMutex *m) public. We use this in a number of places internally. The choice of when to check (debug vs. release) should be up to the user of the API.

I think that PyMutex_Destroy is not a good addition:

It doesn't do what the name suggests -- there is nothing to destroy and the primary suggested purpose is something else (asserting that there are no waiters).
It does not let you implement a non-trivial destroy in the future without breaking users -- you would almost certainly break existing users in weird edge cases, like after fork.

zooba commented 3 weeks ago

It does not let you implement a non-trivial destroy in the future without breaking users

Yeah, I realised this as well.

IsLocked is generally considered a bad API for synchronisation primitives, isn't it? I'm sure I recall hearing that somewhere, though I can't be sure now. I guess what we ought to do is assert that a mutex is held when releasing it (even if that "release" is practically just deallocating the memory around it).

I'm sure we get that right for our internal uses, but it's far more likely that users will use this in different ways and likely get it wrong. A Destroy that requires the mutex be held and makes it invalid for future use is definitely going to help users figure out their bugs.

the proposed API would look like

I'd like to see error results as well, not just void. We haven't settled on our guidelines yet, but one of the ones we all liked was that we should start with error codes on APIs, even if they "can never fail". They can all be return 0, and if they're inlined then there'll be no cost at runtime, but it also reserves options.

vstinner commented 3 weeks ago

we all liked was that we should start with error codes on APIs, even if they "can never fail"

I dislike adding an error code path when it's possible to avoid it. I prefer to return void if the function "cannot fail".

I'm not sure which errors you're referring to. Which functions can/should be able to fail?

colesbury commented 3 weeks ago

IsLocked is generally considered a bad API for synchronisation primitives, isn't it?

It's not useful for making control flow decisions, but it's useful for assertions. Both "assert that the mutex is not locked" and "assert that this mutex is already locked" (i.e., by a caller higher up the stack), which is particularly helpful for catching bugs when adding locking to make existing code thread-safe.

I'd like to see error results as well, not just void.

Adding error codes complicates the use of the API to the point where callers are best served by just ignoring them. How are you supposed to handle a failure of PyMutex_Unlock()? There are many contexts in which PyMutex_Lock() failures cannot be handled either, like if you need to lock a mutex in some clean-up path.

The only possible failures are due to incorrect usage, like unlocking a mutex that is not locked, but returning an error code in that case isn't a good design because they are likely to be ignored.

if they're inlined then there'll be no cost at runtime...

There's still a runtime cost if callers feel the need to write something like: https://github.com/python/cpython/blob/153b118b78588209850cc2a4cbc977f193a3ab6e/Python/ceval_gil.c#L122-L124

zooba commented 3 weeks ago

I prefer to return void if the function "cannot fail".

Yes, we had that discussion. I guess in our next in-person we'll argue this one out some more, but I'm very much with @encukou on this. APIs that can have error results, should have error results.

There are many contexts in which PyMutex_Lock() failures cannot be handled either, like if you need to lock a mutex in some clean-up path.

The fact that some situations cannot be handled doesn't invalidate all the ones where they can be handled. A public API is going to be used in a lot of ways we can't predict - we should leave it up to users to know whether they can or can't handle it in whatever context they're in. All we can really predict is that people will indeed use it incorrectly, and so failures are possible, and it will help people if we report them.

Also, handling doesn't necessarily mean recovering. It might mean asserting with some application-specific assert functionality. How do you intend for users to find bugs in their PyMutex_Unlock calls when there's no feedback available?

There's still a runtime cost if callers feel the need to write something like:

Any optimiser that can't figure out that a literal return 0 means that code will never run and simply remove it isn't going to optimise well enough to worry about runtime cost. If we can't use a literal return 0, then it means an error is actually possible and the code is not wasted. We can't stop people ignoring the error code, but if they decide that it's worth it to ignore the error and assume it'll always succeed, at least we gave them the option. If we decide that for them, we're taking away options.

colesbury commented 3 weeks ago

How do you intend for users to find bugs in their PyMutex_Unlock calls when there's no feedback available?

There is feedback -- it fails with a PyFatal_Error() because it's clearly an error in usage that's not recoverable: https://github.com/python/cpython/blob/4e8aa32245e2d72bf558b711ccdbcee594347615/Python/lock.c#L189

Any optimiser that can't figure out that a literal return 0

That only works if the slow path functions return void. If they return an error code, even if it's always zero, than the condition can't be optimized away.

colesbury commented 3 weeks ago

we should leave it up to users to know whether they can or can't handle it in whatever context they're in...

Users cannot handle errors if you do not describe the error conditions. What are the error conditions for PyMutex_Lock() or PyMutex_Unlock()?

We should not try to make the APIs that are as general as possible. We should aim for APIs that are easy to use and hard to misuse. Vaguely described error conditions that never happen in practice make the API harder to use correctly.

zooba commented 3 weeks ago

That only works if the slow path functions return void.

If there's genuinely no errors possible (as claimed for the current implementation), then these can continue to return void. My concern is about what we tell consumers to write, so that if/when the implementations change, their code still works without them having to rewrite it (but possibly they have to recompile).

What are the error conditions for PyMutex_Lock() or PyMutex_Unlock()?

"Not a mutex" or "not locked", though I appreciate that the first is not distinguishable with this implementation. But errors are not difficult to identify.

We should not try to make the APIs that are as general as possible. We should aim for APIs that are easy to use and hard to misuse.

Clearly we just disagree on how to interpret this. There's a lot of generality I'm not asking for, and what I am asking is not expanding the scope of the APIs at all. I just want to ensure that users who use our public APIs won't feel screwed over if we later change the implementation. We've jumped through so many hoops to avoid changing APIs like Py_INCREF and PyMem_Malloc while changing their implementations that I don't think it's unreasonable to predict that one day we might need to make changes to this API as well.

erlend-aasland commented 3 weeks ago

We've jumped through so many hoops to avoid changing APIs like Py_INCREF and PyMem_Malloc while changing their implementations that I don't think it's unreasonable to predict that one day we might need to make changes to this API as well.

We're talking about a dead simple static one-byte mutex API; comparing it with something as complex as Py_INCREF, which is tightly coupled to intricate parts of the VM including the GC, is comparing apples with a beef stroganoff. API design is not about applying a blanket of rigid rules; API design is weighing a lot of different factors. For this field proven API, I am totally fine with discarding the low probability that any static mutex API will ever need to change. This implies I'm fine with no error reporting; This is a good example of an infallible API. The fatal error by API misuse is fine.

All participants in this thread, including Steve, has acknowledged there is a vanishing probability that this API will ever need to change. Why weigh it heavier than the rest of the API design factors?

To quote the infamous zen, just for fun: practicality beats purity, simple is better than complex.

vstinner commented 3 weeks ago

Before the C API Working Group was created, many APIs were added without any design for many years. Some were more lucky and tried to avoid some issues and were reviewed by a few people. At the end, the C API exists, has a few flaws but it's not the end of the world.

Now I see the C API Working Group as a gatekeeper preventing other people from changing the C API. It's good to have more eyes on changes, to prevent known issues. But there it seems like the discussion goes nowhere and Sam is just blocked by the Working Group. I don't think that the discussion is still constructive but I don't know how to move on.

We are not going to solve all issues suddenly at once. We can try to avoid the most issues pitfals. But here I feel like we are exploring an hypothetical future of an API which doesn't already exist.

Let's allow us to make mistakes.

If the API looks fragile, we can use the "provisional" statuscin the doc. The API targets free threading which is already experimental, no? Early adopters would not be surprised if the API changes.

erlend-aasland commented 3 weeks ago

Thanks Victor. I suggest we go for the unstable API tier and Sam's design. That is probably the best way to move forward in order to please everyone.

gvanrossum commented 3 weeks ago

We could add this to the agenda for our upcoming in-person meeting (this Wednesday). This kind of thing that goes around in circles is often easily resolved in person.

On Mon, Jun 3, 2024 at 2:01 PM Erlend E. Aasland @.***> wrote:

Thanks Victor. I suggest we go for the unstable API tier and Sam's design. That is probably the best way to move forward in order to please everyone.

— Reply to this email directly, view it on GitHub https://github.com/capi-workgroup/decisions/issues/22#issuecomment-2146114854 or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWCWMRH5DLBPZEAKL45MELZFTKSPBFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFVE3TENZWHEZDENRQQKSHI6LQMWSWS43TOVS2K5TBNR2WLKRSGIZTMNBSGY4TSOFHORZGSZ3HMVZKMY3SMVQXIZI . You are receiving this email because you commented on the thread.

Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub .

-- --Guido van Rossum (python.org/~guido)

encukou commented 3 weeks ago

API design is not about applying a blanket of rigid rules; API design is weighing a lot of different factors.

Yup. The rules are very useful for everyday API design, but each comes with reasoning and trade-offs we need to consider if we want to break them.

A minimal PyMutex struct with public size means if that if we ever do need more space, we'll need a whole new API.
No error reporting means that we can't deprecate this API with a DeprecationWarning (which -Werror would turns into an exception)

Both are a problem for the stable ABI, which means that if we add PyMutex as proposed here, we might need to come up with something different for the stable ABI later. Ideally, we'd like to only add API that we can promote to stable ABI as-is. But I can't find a good way do that here.

We're talking about a dead simple static one-byte mutex API; comparing it with something as complex as Py_INCREF, which is tightly coupled to intricate parts of the VM including the GC, is comparing apples with a beef stroganoff.

IMO, PyMutex, with its atomic parking lot hashtable hiding behind a two-bit struct and two infallible functions, is actually very similar to Py_INCREF in this respect. And it deserves similar treatment.

All participants in this thread, including Steve, has acknowledged there is a vanishing probability that this API will ever need to change.

I don't share that optimism; I think we will want to change or remove it in the future. When that happens, it'll be messy for us and mean extra work for users. But, my gut feeling is that we'll run into unknown issues, ones that we can't solve now with the usual future-proofing techniques. So, I still think we should add the API mostly as proposed, with minor tweaks. Out of the future-proofing suggestions in this thread, the ones I think would help are:

adding a Destroy function to be called before freeing a mutex (for mutexes that are dynamically allocated -- for static ones there'd be no clean-up)
allowing the struct to grow -- but IMO that's only a concern for the stable ABI so it can be done later (specifically, non-C languages that use non-limited API have this issue solved)

Sam, sorry that you got caught in the cross-fire. (There's a reason Steering Council meetings are private...)

Steve, thanks for brainstorming; your ideas are valuable even if (if!) they don't make it.

encukou commented 3 weeks ago

The C API Working Group met to discuss this proposal and has decided to accept the API as shown in the original post with the following clarifications:

The size of a PyMutex struct should be documented as unstable, meaning that while it is currently 1 byte, the size may be changed in future major releases without a deprecation period.
The m argument should be documented as requiring a fixed address that points to writable memory. This is intentionally more direct wording than in the current proposal.

There is no need to artificially increase the size of PyMutex, to provide initialization or destruction functions or macros, or error return values. These should only be minor additions to the documentation as it stands in PR #117731.

We would like to see the inline functions also exported as regular functions, in line with our desire to see the API be usable without relying on macros. This would be required for Limited API inclusion, but is nice to have regardless. In our view, exported functions do not constrain future API changes any more than inline functions.

A fresh review by the WG will be required before these may be added to the Limited API.

— Petr, on befalf of the WG

gvanrossum commented 3 weeks ago

(Clarification: by "major releases" we mean "feature releases", IOW the size might change in 3.15.)

colesbury commented 1 week ago

I will work on the PR for PyMutex this week (hopefully today).

vstinner commented 1 week ago

The C API Working Group met to discuss this proposal and has decided to accept the API as shown in the original post with the following clarifications: (...)

@colesbury is now working on updating his PR. IMO we can now close the issue.

capi-workgroup / decisions

Make `PyMutex` public in the non-limited API #22

Links

Overview