Closed vstinner closed 8 months ago
@gvanrossum:
I don't have time for this, but I'm surprised you haven't asked @ericsnowcurrently yet.
I wrote:
Note: Subinterpreters are sharing singletons thanks to immortal objects (PEP 683, Python 3.12).
In the past, I proposed adding Py_GetNone()
and have per-interpreter singletons: https://github.com/python/cpython/issues/83692 Apparently, @ericsnowcurrently didn't like this idea and it was decided to go with immortal objects instead.
I prefer that stable ABI is useful for alternative implementations: they nicely show the design space that CPython might want to explore in the future. If we want to hide implementation details, looking at PyPy is a pretty good way to do it. Hence, I voted for option B.
I would prefer to only consider option (B) once we will deeply design the stable ABI.
I'd prefer that we do that now -- for new additions. If we don't want to design new API now, let's stick to what works.
- (B) Implement "Py_None" API as a function call. Example:
#define Py_None Py_GetNone()
withPyObject* Py_GetNone(void)
, wherePy_GetNone()
cannot fail.
A similar option, which I believe @zooba suggested:
#define Py_None Py_GetBuiltinObject(Py_NONE)
.
- (A) Expose "Py_None" API as
PyObject *Py_None
symbol at the ABI level.
Objects exposed directly by the C-API were one of the few hard problems I had to solve for a per-interpreter GIL. If it hadn't been for immortal objects, I would have had to either do some messy macro trickery or introduce costly fixup code across most of the C-API implementation (using the existing symbols as . In addition to the extra complexity, the macro hacks would have probably been a further pain point for other compilers/implementations. The fixup code solution would have been fragile. It also doesn't help that we expose values for some of the objects (e.g. the non-exception builtin types) rather than just pointers. Note that even with object immortality I still had to add a trick for static builtin types, to accommodate tp_dict
and tp_subclasses
. Overall, the whole situation was a real pain.
Everything would have been drastically simpler if we had never exposed the objects directly in the first place. With that in mind, I firmly favor option (B). Let's move away from exposing objects directly in the C-API, rather than doing it more.
Option (B) can have a (significant?) impact on performance, especially on hot code. So far, nobody ran microbenchmarks, no number is available.
Hum, I will try to work on a benchmark to have numbers. It may help to take a decision.
I wrote https://github.com/python/cpython/pull/116572 to measure the performance overhead. I modified the constant everywhere for this benchmark, including inside Python internals. So the measurement is the worst case scenario.
Results with PGO, LTO and CPU isolation on Linux: https://github.com/python/cpython/pull/116572#issuecomment-1987822227
1.01x faster
: so it's not 2x slower, and "a little bit faster" can be just noise in the measurement.6 benchmarks are at least 1.05x slower:
| sqlite_synth | 3.73 us | 3.90 us: 1.05x slower |
+-------------------------------+----------+------------------------+
| typing_runtime_protocols | 172 us | 180 us: 1.05x slower |
+-------------------------------+----------+------------------------+
| json_dumps | 15.5 ms | 16.5 ms: 1.06x slower |
+-------------------------------+----------+------------------------+
| pickle | 17.6 us | 19.7 us: 1.12x slower |
+-------------------------------+----------+------------------------+
| pickle_dict | 43.1 us | 50.2 us: 1.17x slower |
+-------------------------------+----------+------------------------+
| pickle_list | 6.64 us | 7.87 us: 1.19x slower |
3 benchmarks are at least 1.05x faster:
| coverage | 499 ms | 152 ms: 3.29x faster |
+-------------------------------+----------+------------------------+
| dask | 885 ms | 628 ms: 1.41x faster |
+-------------------------------+----------+------------------------+
| logging_silent | 166 ns | 155 ns: 1.07x faster |
3x faster for coverage looks really strange. Maybe I didn't run benchmarks properly, I'm not sure.
Since the majority seems to prefer function calls, I prepared a full PR which documents the change: https://github.com/python/cpython/pull/116572 It's ready to be merged, waiting for a decision :-)
@iritkatriel: Do you have an opinion on this decision? Tell me if you need more details.
There doesn't seem much of a perf impact, so yeah I think option B makes more sense.
Maybe ask Mike Droettboom to help with benchmarking this?
On Mon, Mar 11, 2024 at 3:25 AM Irit Katriel @.***> wrote:
There doesn't seem much of a perf impact, so yeah I think option B makes more sense.
— Reply to this email directly, view it on GitHub https://github.com/capi-workgroup/decisions/issues/16#issuecomment-1988080251 or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWCWMUB42CE7FA74LDBFU3YXWBB5BFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFVE3TENZWHEZDENRQQKSHI6LQMWSWS43TOVS2K5TBNR2WLKRSGE3DSNJRGU2TCMFHORZGSZ3HMVZKMY3SMVQXIZI . You are receiving this email because you are subscribed to this thread.
Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub .
-- --Guido van Rossum (python.org/~guido)
Maybe ask Mike Droettboom to help with benchmarking this?
I already ran a microbenchmark and pyperformance. Do you need more benchmarks? I'm not sure that I get your comment.
Maybe ask Mike Droettboom to help with benchmarking this?
I already ran a microbenchmark and pyperformance. Do you need more benchmarks? I'm not sure that I get your comment.
I am used to having the benchmark output in the specific presentation that we use for our team's benchmarking: https://github.com/faster-cpython/benchmarking-public/blob/main/README.md
But I now see that you did run pyperformance and found that it doesn't have a significant effect so I'll accept that.
My vote would currently be (A), since I abhor the idea of a macro that looks like a variable reference but actually expands to a call. Didn't we introduce immortal objects specifically so that we could have a single None
object shared between interpreters?
I wrote 3 implementations so they can be compared:
So the majority prefers option (B): Implement "Py_None" API as a function call.
Between adding 5 functions for 5 constants and adding 2 functions for 10 constants, I prefer the "Py_GetConstantRef()" approach: only add 2 functions. Moreover, it gives the choice between borrowed and strong references, even if in practice, both are the same.
I close the issue, the decision is to use function calls.
The discussion on the actual implementation can happen directly on the pull request.
Currently, the C API implements Py_None, Py_True, Py_False, Py_Ellipsis and Py_NotImplemented constants by referring to private symbols at the ABI level:
In the stable ABI, I would like to avoid referring to private symbols, but only use public symbols.
The issue https://github.com/python/cpython/issues/115754 discussed two options:
PyObject *Py_None
symbol at the ABI level.#define Py_None Py_GetNone()
withPyObject* Py_GetNone(void)
, wherePy_GetNone()
cannot fail.Please read the issue https://github.com/python/cpython/issues/115754 discussion, especially @zooba's comment https://github.com/python/cpython/issues/115754#issuecomment-1959798283 which lists possible constraints of such API. Extract:
Please vote for your preferred option, or add a comment if you want a 3rd option.
I wrote https://github.com/python/cpython/pull/115755 which is a simple change implementing option (A).
I prefer option (A). I agree that option (B) avoids any potential/hypothetical issue in the future, but I don't want to write a perfect API/ABI, only make pragmatic changes.
Option (B) can have a (significant?) impact on performance, especially on hot code. So far, nobody ran microbenchmarks, no number is available.
I would prefer to only consider option (B) once we will deeply design the stable ABI. Right now, we are far from being able to support tagged pointers for example. I would prefer to not pay a performance overhead for a theoretical issue :-(
Note: Subinterpreters are sharing singletons thanks to immortal objects (PEP 683, Python 3.12).
By the way, I already added
Py_Is(x, y)
,Py_IsNone(x)
,Py_IsTrue(x)
andPy_IsNone(x)
for PyPy. In PyPy, you cannot simply compare an object to a constant memory address (if (x == Py_None) ...
), since PyPy objects can be moved in memory.This issue is specific to CPython. Other Python implementations are free to use a different implementation and a different ABI.