capi-workgroup / decisions

Discussion and voting on specific issues
4 stars 1 forks source link

gh-110850: Change PyTime C API error handling #15

Closed vstinner closed 2 months ago

vstinner commented 3 months ago

Python 3.13 alpha 4 added a new PyTime API was added for Cython:

Problem: added API requires to hold the GIL, whereas Cython needs an API which doesn't hold the GIL: read @scoder's comment https://github.com/capi-workgroup/decisions/issues/8#issuecomment-1959054377.

I propose different options:

In terms of API:

Right now, my PR https://github.com/python/cpython/pull/115973 implements option (B): check 3 clocks at startup.

I prefer option (A): it logs errors, and it provides a convenient API which doesn't require the caller to hold the GIL and doesn't require the caller to check for errors.

Please vote for your preferred option, or add a comment if you want a 4th option.

vstinner commented 3 months ago

or add a comment if you want a 4th option

@scoder proposed a different API: int PyTime_Time(PyTime_t *result)

My concern is that it's uneasy to translate arbitrary error to an errno code, especially on Windows. That's why I propose to log an unraisable exception.

See his full comment: https://github.com/capi-workgroup/decisions/issues/8#issuecomment-1959054377

zooba commented 3 months ago

Why not int PyTime_Time(PyTime_t *result) but it returns the errno?

I don't like acquiring the GIL inside non-GIL public API, almost as much as I don't like causing a fatal error.

I'd honestly prefer to have another API int PyTime_IsSupported() and if that returns 0 then all the other time functions just return randomly generated numbers. But since that's a bit too hostile, returning a somewhat meaningful error code directly seems like an okay approach.

(I'm assuming there are suitable errno values for this. If not, make up some a new return enum.)

encukou commented 3 months ago

AFAIK, Cython currently uses only the _PyTime_GetSystemClock and _PyTime_AsSecondsDouble. it doesn't expose Python's monotonic or perf clocks. And arguments against having to hold the GIL tend to center on performance -- use cases where one would presumably want the monotonic/perf clocks. I'm still not convinced that we have an actual use case for this: it looks more like adding API to enable a new use case.

FWIW, if it's just for Cython, we could also bring back the underscore-prefixed _PyTime_GetSystemClock/_PyTime_AsSecondsDouble -- at least until CPython stops needing them.


If we add this, my preferred alternative would be: PyTime_t PyTime_TimeUnchecked(void), with -1 on error and clamping on overflow. (A successful -1 result, if it can be detected, would be returned as 0, losing a nanosecond of accuracy.)

Why not errno? This is explicitly a cross-platform wrapper; let's not use a platform-specific error scheme. We already have exceptions for error proper handling. AFAIK there are two common kinds of error to distinguish: overflow, and “unspecified error”. Anything more granular calls for full-blown exceptions.

Why no startup checks? WASI and tkinter show that exposing a subset of Python's functionality is useful, on platforms that don't have e.g. sockets or GUI screens. I'd like to keep clocks optional as possible, too, with runtime errors if they happen to be unavailable.

IMO, functions that don't require GIL should also be usable when the runtime isn't initialized. Especially wen the GIL might go away, questions like “is it safe to use Python objects? memory allocator?” are unclear; having a single list of exceptional functions that don't require GIL/runtime would make things clearer.

vstinner commented 3 months ago

FWIW, if it's just for Cython, we could also bring back the underscore-prefixed

Does the Working Group now promote the usage of private APIs? I don't get it.

scoder commented 3 months ago

arguments against having to hold the GIL tend to center on performance -- use cases where one would presumably want the monotonic/perf clocks.

It's not entirely about performance, more of a "works in parallel or not" kind of argument. Having to acquire the GIL just to get the time in a cross-platform way seems … excessive.

It's mostly a practical argument – if it's worth using and doesn't need the GIL, why require the GIL?

And yes, it would be cool to also have equivalents of time.monotonic_ns() and time.perf_counter_ns() callable without the GIL. I'm all up for that.

encukou commented 3 months ago

I started a summary of the proposals and their pros/cons here: https://hackmd.io/lpuVQBpiTXy2_xfcvd9ADQ Please add to it -- and remove yourself from any proposal that you think is worse than another one.

vstinner commented 3 months ago

I started a summary of the proposals and their pros/cons here: https://hackmd.io/lpuVQBpiTXy2_xfcvd9ADQ

Ok, so there are 3 options.

Option A: don't add new functions, require the GIL

Don't add any new function!

Only expose an C API which maps directly to the Python API: only expose the existing int PyTime_Time(PyTime_t *result) API which requires the caller to hold the GIL and sets an exception on error.

Pros

Cons

Option B: just return -1 on error, don't require the GIL, no exception nor errno

API: int PyTime_TimeUnchecked(PyTime_t *result)

Add also int PyTime_MonotonicUnchecked(PyTime_t *result) and int PyTime_PerfCounterUnchecked(PyTime_t *result).

If the caller wants an exception, PyTime_Time() can be called. Example logging the exception as an unraisable exception (call sys.unraisablehook), and return time=0 on error:

// Function called without holding the GIL
PyTime_t get_time(void)
{
    PyTime_t t;
    if (PyTime_TimeUnchecked(&t) < 0) {
        // Get the GIL to call PyTime_Time()
        PyGILState_STATE state = PyGILState_Ensure();
        if (PyTime_Time(&t) < 0) {
            // Handle the exception:
            // log an unraisable exception.
            PyErr_FormatUnraisable("failed to get time");
            t = 0;
        }
        PyGILState_Release(gstate);
    }
    return t;
}

Note: existing int PyTime_Time(PyTime_t *result) API API stays, the int PyTime_TimeUnchecked(PyTime_t *result) variant is added.

Pros

Cons

Option C: same as option B, but set errno

API: int PyTime_TimeUnchecked(PyTime_t *result)

Examples of errno:

Pros

Cons

Naming

If someone has a better name than PyTime_TimeUnchecked(), please propose a better name.

vstinner commented 3 months ago

I prefer the Option B ("just return -1 on error, don't require the GIL, no exception nor errno") for code which cannot use the GIL or doesn't care about errors (yeah, it does exist, see below!).

If I know that at some point I want an exception with a useful type and a good error message, I prefer calling PyTime_Time() (and hold the GIL). I dislike errno which is a lower quality of the exception: the generated error message (os.strerror(errno)) is generic and provides less information than PyTime_Time() exception.

Example with a benchmark:

// The caller doesn't have to hold the GIL
PyTime_t benchmark(void (*func)(void))
{
    PyTime_t t1, t2;
    (void)PyTime_PerfCounterUnchecked(&t1);
    func()
    (void)PyTime_PerfCounterUnchecked(&t2);
    return t2 - t1;
}

It's ok if the benchmark returns 0, it doesn't crash, and the user will likely want to investigation to see what's going on.

Sometimes, the purpose of the function is not a benchmark, measuring the time spent is just to write debug logs (see below). Having to bother with error on reading time is not convenient in this case.

It's more convenient than having to check for errors (and hold the GIL):

// The caller must hold the GIL
PyTime_t benchmark(void (*func)(void))
{
    PyTime_t t1, t2;
    if (PyTime_PerfCounter(&t1) < 0) {
        PyErr_Clear();
        t1 = 0;
    }
    func()
    if (PyTime_PerfCounter(&t2) < 0) {
        PyErr_Clear();
        t2 = 0;
    }
    return t2 - t1;
}

Example of existing Python functions calling the existing internal "Unchecked" functions:

zooba commented 3 months ago

I'm still not convinced that we ought to have the unchecked versions in our public API at all, so I prefer A.

We do have a growing number of "platform adaptation layer" functions (e.g. much of fileutils.c, and the atomics and threading APIs) that would certainly be useful. So there might be some value in splitting those out to their own library which we then consume, but also other authors can use the same library if they want platform-independence without coding it themselves. No need to make them stick to a version of Python just to get generic native helper functions.

If everyone else thinks there's a strong enough need, then I prefer B over C.

vstinner commented 3 months ago

@encukou @gvanrossum @iritkatriel: What's your call on this decision?

encukou commented 3 months ago

For clarity, by ”B” and “C” I mean the later ones, not the “(B)” and “(C)” from the original post.

I think ”C” is out. The API to add is B (int PyTime_*(PyTime_t *result) returning -1 on error), and the question is whether to add it at all.

I added No clear existing use case as a “con”, but maybe it's an issue to resolve. It's blocking me from saying yes here.

So let me elaborate: I haven't seen an existing use case solved by this API. The closest are:

If the only reason to add this is not breaking Cython, we can bring back the old _Py function that was removed. Then Cython doesn't need to change, and we don't need to start supporting new API that doesn't have a known existing use case. (And I would not call that “promoting the usage of private APIs” -- rather it's keeping stuff that wasn't meant to be exposed, in order to not break existing software.)

gvanrossum commented 3 months ago

I am for B.

scoder commented 2 months ago

I think all three timers should have simple error handling variants (B).

Steve's idea of a platform abstraction library also sounds worth thinking about.

encukou commented 2 months ago

OK, some people prefer adding the API, none. And “B” is the API to add. To recap:

int PyTime_<clock><suffix>(PyTime_t *result), for <clock> in Time, Monotonic, PerfCounter. The suffix is still up for some bikeshedding.

Let's vote: are you OK with adding this? (To avoid pings, I checked boxes for people who voted by comment):

As for the suffix, are you OK with Raw (e.g. PyTime_TimeRaw())?

Other options that someone prefers on the HackMD are NoGIL and NoException.

vstinner commented 2 months ago

An alternative is to change the "need GIL" API: rename PyTime_Time() to:

And use PyTime_Time() name for the API which doesn't need the GIL.

scoder commented 2 months ago

… And use PyTime_Time() name for the API which doesn't need the GIL.

That seems unexpected, given that C-API calls normally require the GIL. It should be explicitly clear if they don't.

I think both NoGil and NoException suggest that it's only one specialty that applies but not both. Raw sounds more like it's close to the system without much Python things on top, which it is.

encukou commented 2 months ago

I don't see the new API as “closer to the OS” or “Python stuff chopped off”, but as “we'll do what we can without a GIL”. On a new platform where getting the time is complicated and needs a GIL for some reason, I think it'd be fine if this API returns -1 but the regular versions succeed.

But it's a small enough detail; Raw is fine.

zooba commented 2 months ago

Let's vote ...

If nobody is going to bother to even hint at why it's more important for this to be a CPython API than in its own library, I have no reason to vote in support. So far, this just feels like applying political/social pressure rather than arguing an API design (and I'm stubborn enough to resist that).

As for the suffix, are you OK with Raw

Assuming we can't possibly convince Petr that this is closer to the OS because it has the Python stuff chopped off, I'll settle for Raw. But it seems pretty obviously closer to the OS, so I'd love to know what Petr thinks an actual platform-abstraction function would look like, if not a lightweight wrapper around the OS functionality.

scoder commented 2 months ago

There are also the PyOS_* series of functions. Would the new ones fit in there?

vstinner commented 2 months ago

On Linux, clock_gettime() has a CLOCK_MONOTONIC_RAW clock:

   CLOCK_MONOTONIC_RAW (since Linux 2.6.28; Linux-specific)
          Similar to CLOCK_MONOTONIC, but provides access to a  raw  hard‐
          ware-based  time  that  is not subject to NTP adjustments or the
          incremental adjustments performed  by  adjtime(3).   This  clock
          does not count time that the system is suspended.

The Raw suffix gives the function name PyTime_MonotonicRaw() which might be misleading.


FYI the current implementation has an int raise parameter. For example, the private _PyTime_TimeUnchecked() uses raise=0, whereas the public PyTime_Time() uses raise=1. Here "raise" means "raise an exception". But I prefer to say that a C function sets an exception, rather than "raising" an exception.

That's my rationale to prefer NoException suffix. Or NoExc if you prefer a shorter name (ex: PyTime_TimeNoExc()).

I would also be fine with NoGIL.

encukou commented 2 months ago

a CPython API than in its own library

I'm not sure about the scope of such a library. Functions to get the time, but with crude error handling since we can't use exceptions? Platform wrappers, but only ones that CPython needs (or needed historically)?

I'd love to know what Petr thinks an actual platform-abstraction function would look like, if not a lightweight wrapper around the OS functionality.

To me, that would be the API we have in main now, which includes Python's take on error handling. If we need to provide GIL-less, exception-less API, why not. But it won't be a lower-level library that CPython can build on, unless it provides a rich enough, extensible interface for error-handling. Designing that is unnecessary in this case. And I don't think we'd manage to avoid tightly coupling it to one provider (Unix errno) and one consumer (Python exceptions).

zooba commented 2 months ago

I'm not sure about the scope of such a library.

It's up to whoever builds it. Clearly it wouldn't necessarily define itself in terms of CPython, because CPython doesn't need the APIs that would go in it (or we'd add them to CPython). So "platform wrappers helpful for CPython extension authors" perhaps?

Alternatively, refactor this proposal so that we do actually use the functions ourselves and I've got far less concern about adding them, and even making them public. It's the lack of need and thus the lack of use that is my concern. But if they're a sensible refactoring on our existing functions, then sure.

To me, that would be the API we have in main now, which includes Python's take on error handling.

Which is essentially looking at errno, even on Windows. The error mechanism isn't ours to extend here. We either pass through what the system provides, or we narrow it down to something specific/predictable. Either way, it's no real loss of generality to get the function from a vendored source library (which other libraries can also vendor if they want the GIL-less versions of these functions).

encukou commented 2 months ago

refactor this proposal so that we do actually use the functions ourselves

We do use GIL-less time functions for thread/socket/select timeouts. (Interestingly, all are unavailable on WASI -- the supported platform with rather unreliable clocks.)

The functions we currently use return 0 on failure. (This makes time appear to stop -- not great for timeouts.) If we use the proposed variants, we can report proper errors on exotic platforms and configurations.

vstinner commented 2 months ago

Since the majority is for Raw suffix, let's use this one.

encukou commented 2 months ago

@zooba, could you list which of your concerns aren't explored? I believe I've considered everything you've said so far, but I don't know what you find important and unsolved.

There are big differences in how we view the GIL-less time API, but practically they only affect details like what the best name would be.

Whenever you think it's time to move on, could you put “abstain” or “-0” in the voting post? (If you do that now we'd have 3 “for” and 2 “abstain”, so the proposal should be reluctantly approved.)


FWIW, I'm concerned about not being able to support this API on a new, exotic platform. (Is WASI exotic? WASI Clocks are in the pre-standardization “Phase 3”, but looks like they'll be a granular, optional feature. Python might require clocks, but it'd be best to require as little as possible.)

Anyway, I consider this concern solved: the new GIL-less functions can simply return -1 on the new platform even if the normal variants would succeed. (Sure, some user libraries probably won't be prepared for a -1, but that's par for the course for new platforms.) This option is not explicit in the proposal, but it is a rather obvious decision to make when porting to a new platform. Another possible decision would be to leave this API out on that platform.

vstinner commented 2 months ago

@zooba:

If nobody is going to bother to even hint at why it's more important for this to be a CPython API than in its own library, I have no reason to vote in support.

PyTime_Time() was added for Cython. Problem: Cython doesn't want to use it, since it requires the GIL, and asked to add a new function which doesn't require the GIL.

Also, the idea is to make sure that Cython reduces its consumption of the Python private C API and internal C API since Python doesn't provide any backward compatibility warranties on them. Cython is broken by each Python 3.x.0 release and it takes between weeks and months (since Python 3.x alpha1 release) to fully update Cython to the new Python version. The goal here is not fully avoid these compatibility issues, but only reduce them one by one. See issue gh-89410 "Add explicit support for Cython to the C API".

Can you please elaborate why do you disagree with the motivation? Why do you prefer that Cython continues using private/internal C API? Do you consider that it's not Python responsibility to reduce Cython friction with the C API?

zooba commented 2 months ago

PyTime_Time() was added for Cython. Problem: Cython doesn't want to use it, since it requires the GIL, and asked to add a new function which doesn't require the GIL.

Thank you.

Can you please elaborate why do you disagree with the motivation? Why do you prefer that Cython continues using private/internal C API?

Because there's a huge difference in our responsibilities regarding the reliability of a public API (particularly a limited API) and a private or internal one. I take that seriously, and prefer to not increase the size of that responsibility lightly.

I disagree with taking responsibility for a function that is not required for normal CPython operation, and is not used in normal CPython operation (and so is not validated by normal CPython operation), only for the sake of a single other project. A new public API isn't a gift - it's a puppy.

I would be okay with this if we make it a PyUnstable_ API, and we explicitly document that it may return an error for "not yet supported on this platform", so that we aren't obliged to withhold CPython from a new platform if we can't easily make it work (as Petr suggests). If Cython are happy with those conditions, then I am too.


As an aside, while I find "some 3rd party project uses it" to be an incredibly weak justification, "some 3rd party project uses it for something that is otherwise impossible for them to do" is stronger. Simply saying "Cython uses it" is basically never going to convince me of anything - I suggest developing that kind of rationale with some more details about what they're actually using it for.

vstinner commented 2 months ago

I disagree with taking responsibility for a function that is not required for normal CPython operation

The proposed API already exists in the internal C API and is being used in many places, including important function such as threading.Lock.acquire(timeout), gc.collect(), import, etc. These functions are currently called Unchecked since the current implementation return 0 on error and don't report the failure to the caller.

call_timer():
    Modules/_lsprof.c:        return _PyTime_PerfCounterUnchecked();

gc_collect_main():
    Python/gc_free_threading.c:        t1 = _PyTime_PerfCounterUnchecked();
    Python/gc_free_threading.c:        double d = PyTime_AsSecondsDouble(_PyTime_PerfCounterUnchecked() - t1);

import_find_and_load():
    Python/import.c:        t1 = _PyTime_PerfCounterUnchecked();
    Python/import.c:        PyTime_t cum = _PyTime_PerfCounterUnchecked() - t1;

_PyMutex_LockTimed()
    Python/lock.c:    PyTime_t now = _PyTime_MonotonicUnchecked();
    Python/lock.c:        PyTime_t now = _PyTime_MonotonicUnchecked();

_PySemaphore_PlatformWait():
    Python/parking_lot.c:        PyTime_t deadline = _PyTime_Add(_PyTime_MonotonicUnchecked(), timeout);
    Python/parking_lot.c:        PyTime_t deadline = _PyTime_Add(_PyTime_TimeUnchecked(), timeout);
    Python/parking_lot.c:            PyTime_t deadline = _PyTime_Add(_PyTime_TimeUnchecked(), timeout);

EnterNonRecursiveMutex():
    Python/thread_nt.h:        PyTime_t deadline = _PyTime_Add(_PyTime_PerfCounterUnchecked(), nanoseconds);
    Python/thread_nt.h:            nanoseconds = deadline - _PyTime_PerfCounterUnchecked();

_PyThread_cond_after():
    Python/thread_pthread.h:        t = _PyTime_MonotonicUnchecked();
    Python/thread_pthread.h:        t = _PyTime_TimeUnchecked();

PyThread_acquire_lock_timed():
    Python/thread_pthread.h:        PyTime_t deadline = _PyTime_Add(_PyTime_MonotonicUnchecked(), timeout);
    Python/thread_pthread.h:            PyTime_t abs_time = _PyTime_Add(_PyTime_TimeUnchecked(),

EDIT: Oh, I already gave this list in a previous comment.

zooba commented 2 months ago

Sure, but we're allowed to add functions for ourselves. You don't need to justify the function existing, you need to justify it being available and stable for external users. For example, if that private function was already public and stable, we wouldn't be able to change the error result from "0" to "-1" without a deprecation period (or ever, if it was stable ABI).

Does Cython need to get the time in order to interact with any of our "unchecked" uses of it? Or do they just want it for their own purposes?

encukou commented 2 months ago

If Cython needs a GIL-less function, then I assume it needs to match the Python implementation: specifically, Monotonic/PerfCounter need to use the same epoch in order to be comparable.

Using the same implementation is probably the only reasonable way to ensure that :)

zooba commented 2 months ago

If Cython needs a GIL-less function, then I assume it needs to match the Python implementation

That's a favourable assumption. You could equally assume that they don't want to manage platform-specific code in their own project, based on the information I've heard so far.

If they do, in fact, need to match the Python implementation, it should be easy to explain why. We don't have to assume something like that. (And if it can be explained, the whole feature can probably be justified in terms of enabling whatever scenario that is, which bypasses my concern about us having to own public API solely for a single user.)

encukou commented 2 months ago

So, we're back to “No clear existing use case” as the reason to not add these.

scoder commented 2 months ago

If Cython needs a GIL-less function, then I assume it needs to match the Python implementation: specifically, Monotonic/PerfCounter need to use the same epoch in order to be comparable.

Yes, that's the idea. It's to be used in a time-module-like API, as mentioned here: https://github.com/capi-workgroup/decisions/issues/8#issuecomment-1916529840

While I'd accept copying CPython code into that module in case of disapproval, I'd rather have CPython expose this functionality.

zooba commented 2 months ago

I think (as a fairly heavy Cython user) copying the code for this kind of use is just fine. Never in my wildest dreams would I expect to call something from a cpython module without the GIL (is it documented somewhere as nogil?), and I don't even know why I'd want to do it with a CPython-equivalent value rather than something native to the platform.

I know there's no loss to calling a nogil function with the GIL held, but if it's not being done deliberately, there's no gain either.

This really does seem like an application where it's best for Cython to manage their own compatibility, and probably to expose the native API via a cython.time module. We are concerned about surprising Cython devs/users with compatibility changes in the future, and I'm sure you guys don't want that either, but that's what would happen if users start trying to work with a platform that Cython "supports" but our APIs return nonsense. Better for Cython to be able to say "our nogil API isn't supported on this platform yet, use the yesgil API if you need to support it" than to have to say "hopefully upstream will sort it out, it's out of our hands".

encukou commented 2 months ago

Traditionally, CPython exposed the helpers it uses. Here, we have a way to expose them cleanly without significant maintenance overhead. For me, the question is whether Cython or its users really need this. If they do, IMO CPython is the best place where this API should live. (Maintaining, and including, a dedicated C library is too much overhead.)


Well, I guess we're blocked on a general disagreement about what the API should be, or rather how much we're willing to compromise for the benefit of existing users.

What's worse, we can't simply keep the status quo, because that would mean deciding whether status quo to keep is 3.12 (_PyTime_GetSystemClock) or current main (GIL functions only).

zooba commented 2 months ago

Traditionally, CPython exposed the helpers it uses

But why? What was the intent? Was it just to use them from our own extension modules (and hence they get a leading underscore)? If so, that's not a tradition that implies we need a more maintainable way to do it.

What's worse, we can't simply keep the status quo, because that would mean deciding whether status quo to keep is 3.12 (_PyTime_GetSystemClock) or current main (GIL functions only).

Sure we can. Private functions can come and go as we please, unless they're in the stable ABI. Presumably this one isn't if it's already been removed, so status quo is to continue to leave it removed.

encukou commented 2 months ago

But why? What was the intent?

Good question, made me think.

I can only guess at the intent, but I'm pretty sure that in general, exposing the internals (with a “use at your own risk” warning as appropriate) made it much easier for users to come up with a better array library, a cached_property, a better async library, greenlets, JITs, or tight integration with various libraries -- some of which have been absorbed into CPython at its own pace.

Private functions can come and go as we please, as the internals change, but only recently has Python began purposefully hiding the internals for no reason other than that they're internal. That is a worrying trend.

Well, you've convinced me to change my vote to +1. This is supportable API that Python needs (and will continue to need) for itself. I hope that someone uses this to make a better mutex, flow control mechanism or profiler, and a ready-to-use cross-platform timer saves them a few hours in the prototyping phase.

zooba commented 2 months ago

only recently has Python began purposefully hiding the internals for no reason other than that they're internal. That is a worrying trend.

In response to a worrying trend of third-parties being surprised when we change internals (culminating in the tp_print incident, as I recall). Which trend is more concerning: third-parties not getting to poke around our insides, or third-parties not being broken every update?

Well, you've convinced me to change my vote to +1.

Which obviously wasn't my intent 😆 But since you came up with a different rationale for everything being public by default than I had, it's not a surprise. I guess we'll find out in the future which was the more sensible position.

(FTR, I already considered myself outvoted on this. If you need an official "I abstain" to be comfortable going ahead with the majority, then I abstain.)

vstinner commented 2 months ago

The working group approved the API described in https://github.com/capi-workgroup/decisions/issues/15#issuecomment-2031344536. I close the API and I wrote a PR to implement it: https://github.com/python/cpython/pull/118394