Open vstinner opened 4 days ago
There's nothing particularly controversial about this API, right? Other than the question of whether it should be "real" or macros?
I'm quite happy with it being a real API.
Since `PyLong_To*` functions have a signature compatible with the converter signature used with the `O&` format unit in `PyArg_Parse()`, I think that it would be better to make the return value compatible with the converter convention: return 1 for a successful conversion and 0 if the conversion has failed. These functions could then be used with `PyArg_Parse()` (and other functions in this family).
Otherwise we can end up with two sets of converters with opposite return conventions. That would be very error-prone.
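To make the two conventions concrete, here is a minimal pure-C sketch (toy functions invented for illustration, not the real API): the `O&` converter convention returns 1 on success and 0 on failure, while the general C API convention returns 0 on success and -1 on failure.

```c
#include <assert.h>

/* Toy converter following the O& convention used by PyArg_Parse():
 * return 1 on success, 0 on failure. A real converter would take
 * (PyObject *obj, void *result) instead of plain ints. */
static int
convert_positive(int input, int *result)
{
    if (input <= 0) {
        return 0;   /* failure, converter style */
    }
    *result = input;
    return 1;       /* success, converter style */
}

/* The same operation following the general C API convention:
 * return 0 on success, -1 on failure. */
static int
get_positive(int input, int *result)
{
    if (input <= 0) {
        return -1;  /* failure, general style */
    }
    *result = input;
    return 0;       /* success, general style */
}
```

Mixing the two conventions in one code base leads to exactly the `if (f(...))` versus `if (f(...) < 0)` confusion being warned about here.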
> I think that it would be better to make the return value compatible with the converter convention: return 1 for a successful conversion and 0 if the conversion has failed.
I don't think this use case is common enough to justify departing from the C API Guidelines: https://devguide.python.org/developer-workflow/c-api/index.html#guidelines-for-expanding-changing-the-public-api
For converters, I would prefer to add support for int32_t and uint64_t to Argument Clinic and reuse these functions.
> I think that it would be better to make the return value compatible with the converter convention
It's tempting, for sure, but I think we're better off keeping converters as the exception. We really want developers to have a single mental model for how to handle return values, and that unfortunately means converters need to be clearly labelled, and not just convenient functions.
Adding some form of sized version of `i` and `u` to `PyArg_ParseTuple` wouldn't be a bad step. That's our public API. Argument Clinic is still internal only, so we don't fix that until we need it, and it doesn't help public users at all when we do.
> Other than the question of whether it should be "real" or macros?
Thank you for humoring me on the macros thing; I think it's clear we want real functions :)
> keep converters as the exception
Yes. We settled on using -1 (or NULL) for errors. Unfortunately some existing functions -- or even families of functions -- don't do that; we have to live with that.
Eventually, if we get to the point where `O&` sticks out too much, we might want to add a new `PyArg_Parse` format code. I don't think we're there yet.
@serhiy-storchaka has some concerns about calling `obj.__index__()`, which can release the GIL indirectly:

> Calling `__index__` can release the GIL and make values saved in C variables (borrowed references, the size of a list, etc.) invalid. Existing code can rely on the atomicity of `PyLong_AsUnsignedLong()`, so calling `__index__` in it is an unsafe change.
IMO it's good that signed and unsigned functions have the same behavior: both call `__index__()`. Existing functions are inconsistent: signed functions (ex: `PyLong_AsInt()`) call `__index__()`, whereas `PyLong_AsUnsignedLong()` doesn't call the method.
The GIL issue can be documented.
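The hazard class Serhiy describes can be illustrated without any Python at all: a plain C callback stands in for `obj.__index__()` (which can run arbitrary Python code), and any value cached before the callback may be stale afterwards. This is a sketch of the pattern, not CPython code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A growable buffer; think "Python list" for the analogy. */
struct buf { char *data; size_t len; };

/* Stands in for user code reached via __index__: it may mutate the
 * buffer, reallocating data and changing len behind the caller's back. */
static void evil_callback(struct buf *b)
{
    b->len = 1;
    char *p = realloc(b->data, 1);
    if (p != NULL) {
        b->data = p;
    }
}

/* Returns 1 if the cached length went stale after the callback ran. */
static int demo(void)
{
    struct buf b;
    b.len = 8;
    b.data = malloc(b.len);
    memset(b.data, 'x', b.len);

    size_t cached_len = b.len;   /* like caching PyList_GET_SIZE() */
    char *cached_ptr = b.data;   /* like keeping a borrowed reference */
    (void)cached_ptr;            /* using it after the callback could
                                    read freed memory */

    evil_callback(&b);           /* like an implicit __index__ call */

    int stale = (cached_len != b.len);
    free(b.data);
    return stale;
}
```

The same staleness applies to borrowed references and cached sizes in C extension code when a conversion function unexpectedly calls back into Python.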
@serhiy-storchaka @mdboom: Do you have any additional comment about these APIs, or can I just open a vote?
I have nothing to add that hasn't already been said.
My comment was about the existing C API like `PyLong_AsUnsignedLong()` and why it is not safe to change it.
We need a C API which respects special methods like `__index__`, and it should be the default. On the other hand, we need a C API to access the content of the object. This is a contradiction. Creating two sets of functions will cause confusion, especially if there are more than two sets, with different signatures and different calling conventions. I have an idea how to mitigate this problem, but it is for a different discussion. It does not stand in the way of this set of functions, because they are all just wrappers around `PyLong_FromNativeBytes`/`PyLong_AsNativeBytes`.
`PyLong_FromInt32` and `PyLong_FromUInt32` are redundant, because the integer argument can always be promoted to 64 bits for `PyLong_FromInt64`/`PyLong_FromUInt64`. We have both `PyLong_FromLong` and `PyLong_FromLongLong` only because the latter was optional: not all compilers supported `long long`.
On the other hand, the lack of `PyLong_ToInt16` and `PyLong_ToInt8` requires adding an additional check after calling `PyLong_ToInt32`, producing OverflowError with inconsistent error messages (depending on the input value), or writing more verbose code with `PyLong_AsNativeBytes`.
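For example, narrowing a 32-bit conversion result down to `int16_t` needs this kind of extra hand-written range check (pure C sketch; `narrow_to_int16()` is a hypothetical helper, and in real C API code the failure branch would set OverflowError):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the range check a caller must add by hand when
 * only a 32-bit conversion function exists. Returns 0 on success and
 * -1 on overflow (where C API code would raise OverflowError). */
static int
narrow_to_int16(int32_t value, int16_t *result)
{
    if (value < INT16_MIN || value > INT16_MAX) {
        return -1;
    }
    *result = (int16_t)value;
    return 0;
}
```

Every caller writes this boilerplate slightly differently, which is where the inconsistent error messages come from.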
Right, this is essentially the difference between the abstract and concrete APIs, a difference which has been eroding over the last few years.
If we think we can recover it, then we could argue the caller should `PyLong_Check` before using these functions, and use a more abstract function if they don't actually have an int already.
Alternatively, if we don't expect to shut the door to concrete APIs operating abstractly, then we could document that if `PyLong_Check` is true, these functions will not call back into Python code in order to convert to an integer.
I'm +0 on the latter, just out of pragmatism. But it saddens me to lose the clear abstract/concrete separation. (It will make those who want our API to be strongly typed sadder than me, though. I like it being largely abstract, I just think we ought to know which APIs are which.)
@serhiy-storchaka: It's not easy for me to understand what you propose.
> Creating two sets of functions will cause confusion, especially if there will be more than two sets, with different signatures and different calling conventions.
So are you ok with PyLong_ToInt64() calling `__index__()` or not? What do you suggest?
> PyLong_FromInt32 and PyLong_FromUInt32 are redundant
Do you suggest not adding them and using PyLong_FromInt64() and PyLong_FromUInt64()? That sounds reasonable.
> On the other hand, the lack of PyLong_ToInt16 and PyLong_ToInt8 requires adding an additional check after calling PyLong_ToInt32 (...)
I didn't want to add too many functions at the beginning, so I started with the two most common sizes: 32 and 64 bits. We can add more functions later, once we agree on the design.
> If we think we can recover it, then we could argue the caller should `PyLong_Check` before using these functions, and use a more abstract function if they don't actually have an int already.
I agree that PyLong_Check(), or even PyLong_CheckExact(), is a good way to check if an argument is an integer.
In general, users don't care much if it's exactly an integer or not (if the object type is Python int). For example, I'm used to passing a boolean when an integer is expected (bool inherits from int), even if a boolean is a different concept than an integer (to me at least).
> In general, users don't care much if it's exactly an integer or not (if the object type is Python int).
Right, but the case here is when the user cares a lot that the GIL is not released during their `PyLong_ToInt##` call. They're more concerned about that than the exact type of the value.
I suspect we could probably get away with `PyArg_ParseTuple` and friends doing the `__index__` conversion and the specific converter functions just raising a TypeError.[^1] Surely the vast majority of not-ints appearing in an int-like context will be in function arguments or in our code (e.g. any `GetItem` implementations).
The three main questions for me are: … (`__index__` use)

[^1]: I'd also be okay with removing the extra work from `AsNativeBytes`, since we haven't properly released that yet.
Why `To` and not `As`?
> Alternatively, if we don't expect to shut the door to concrete APIs operating abstractly, then we could document that if `PyLong_Check` is true, these functions will not call back into Python code in order to convert to an integer.
Yes, this was my idea. It was discussed earlier, but I lost previous rounds (perhaps I was not persistent enough). It affects more than the C API, so it should be discussed separately. Where is the best place for this? We need more than just the C API Workgroup.
> So are you ok with PyLong_ToInt64() calling `__index__()` or not? What do you suggest?
I am OK with calling `__index__()` in a new C API. The details are a matter for a different discussion.
> Do you suggest not adding them and using PyLong_FromInt64() and PyLong_FromUInt64()? That sounds reasonable.
Do you have a use case for `PyLong_FromInt32()`? Wait, do you have a use case for `PyLong_FromInt64()`? `long long` has at least 64 bits, so `PyLong_FromLongLong()` can be used for converting from int64_t. It seems that we do not need new functions for this direction of conversion, except maybe `PyLong_FromMaxInt()`.
> I didn't want to add too many functions at the beginning, so I started with the two most common sizes: 32 and 64 bits. We can add more functions later, once we agree on the design.
Just keep in mind that there is a need for such features.
> @serhiy-storchaka has some concerns about calling `obj.__index__()`, which can release the GIL indirectly:
>
> > Calling `__index__` can release the GIL and make values saved in C variables (borrowed references, the size of a list, etc.) invalid. Existing code can rely on the atomicity of `PyLong_AsUnsignedLong()`, so calling `__index__` in it is an unsafe change.
>
> IMO it's good that signed and unsigned functions have the same behavior: both call `__index__()`. Existing functions are inconsistent: signed functions (ex: `PyLong_AsInt()`) call `__index__()`, whereas `PyLong_AsUnsignedLong()` doesn't call the method. The GIL issue can be documented.
Is it possible to have it documented anyway? I have no clue why calling `__index__()` should/could release the GIL and what kind of atomicity is lost. From a user perspective, I've never doubted that PyLong_AsUnsignedLong() and any of its friends may release the GIL, although, hopefully, just temporarily.
> Where is the best place for this? We need more than just the C API Workgroup.
Either the api-evolution repository would be good, if we want to develop a "rule", though you'd have to announce it on a relevant issue to get people's attention. Or I'm sure we can find a bug related to implicit `__index__` calls releasing the GIL and causing a crash (we had one recently where `PySys_Audit` would do it, and there's one about `bytearray` somewhere, too). Probably, since there are multiple bugs, we'd want to discuss a proposal in api-evolution on how to approach it (i.e. 1. don't do it, 2. document it and how to avoid it, 3. it's okay and users should use magic to be thread safe 🙃).
> Do you have a use case for `PyLong_FromInt32()`? Wait, do you have a use case for `PyLong_FromInt64()`?
Convenience, discovery, and pairing with the other API is a totally fine reason. (It may not be justified every time, but it's an okay reason.) We're looking at API development on a long scale (10+ years), and the goal is to have the right people (those who will benefit from it) using the right API in the right way. Making it easy to discover which APIs you should be using (and making it hard to discover the ones you shouldn't be using) is a big part of this, and the final decision/guess on how to do that is the WG's responsibility.
> Is it possible to have it documented anyway?
Yes, documentation is cheap, at least for the APIs that already do this and won't be changing. It's worth discussing for any new APIs though.
@erlend-aasland:
> Why `To` and not `As`?
`As` functions such as `PyLong_AsLong()` return the value directly, but can also return -1 on error, so `PyErr_Occurred()` should be checked:

```c
long value = PyLong_AsLong(obj);
if (value == -1 && PyErr_Occurred()) { /* error */ }
```

vs

```c
int64_t value;
if (PyLong_ToInt64(obj, &value) < 0) { /* error */ }
```
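A pure-C toy (invented functions, not the real API) makes the difference concrete: in the `As` style, -1 is both a legal result and the error marker, so an out-of-band check is needed; in the `To` style, the status is returned separately and the result goes through an output parameter.

```c
#include <assert.h>
#include <stdint.h>

static int as_error;  /* stands in for the thread's exception state */

/* "As" style: the result is returned in band, so -1 is ambiguous and
 * the caller must also consult as_error (i.e. PyErr_Occurred()). */
static long
toy_as_long(long input, int fail)
{
    as_error = fail;
    return fail ? -1 : input;
}

/* "To" style: status out of band (0 or -1), result via output
 * parameter, so a legal -1 value needs no extra check. */
static int
toy_to_int64(long input, int fail, int64_t *value)
{
    if (fail) {
        return -1;
    }
    *value = input;
    return 0;
}
```

With the second shape, `if (toy_to_int64(...) < 0)` is the whole error check, even when the converted value happens to be -1.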
@serhiy-storchaka:
> Wait, do you have a use case for PyLong_FromInt64()?
My initial motivation to propose these new `<stdint.h>` functions was the https://github.com/capi-workgroup/api-evolution/issues/10 discussion: replace int/unsigned long with int32_t/uint64_t to have a better ABI. I would like `<stdint.h>` types to be as convenient to use with the Python C API as the classic int/long types.
> `As` functions such as `PyLong_AsLong()` return the value directly, but can also return -1 on error, so `PyErr_Occurred()` should be checked.
Not all `As` APIs are designed like this; it's not a defining trait for `As` APIs.
The current `To` APIs often suggest a transformation (`ToUppercase`, `ToLowercase`, etc.); it's a different kind of API.
Currently it looks like the legend about Newton and a cat: he made a hole in the door to allow the cat to enter or leave without taking him away from his studies, and when she brought in 5 kittens, he made 5 smaller holes for the kittens.
All cases with `int64_t`, `int32_t`, `int16_t`, `int8_t`, `long`, `short`, and `signed char` can be handled by `PyLong_FromLongLong()`. No need to add `PyLong_FromInt64()`, `PyLong_FromInt32()`, `PyLong_FromInt16()`, `PyLong_FromInt8()`, `PyLong_FromInt()`, `PyLong_FromShort()`, and `PyLong_FromSignedChar()`. If you want to cover absolutely all cases, add a function that takes an `intmax_t` argument; that would add new value.
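This point rests on C's implicit integer conversions: every signed type up to 64 bits converts to `long long` (at least 64 bits per C99) without loss, so one `long long`-taking function covers them all. A small self-contained check, with `accept()` as a stand-in for `PyLong_FromLongLong()`:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for PyLong_FromLongLong(): one function taking long long
 * accepts every smaller signed type via implicit conversion. */
static long long
accept(long long value)
{
    return value;
}
```

The arguments below promote implicitly at the call site; no cast is needed and no value is changed.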
I'm not sure we'll ever again see another programming language that won't allow implicitly upcasting integers (like Ada), so yeah, there's probably not a lot gained. If `intmax_t` is also in stdtypes, then can we have `FromInt()`?
> it's not a defining trait for `As` APIs.
We should probably figure out what is a defining trait. I have this vague idea that `To` APIs convert PyObject->PyObject and `As` APIs convert PyObject->native, but I can't back that up fully.
> If `intmax_t` is also in stdtypes, then can we have `FromInt()`?
The size of `intmax_t` depends on the compiler (which perhaps delegates the decision to one of various platform ABI specs).
AFAIK, the reason we're adding `FromInt64` even though we already have `FromLongLong` is making it possible to avoid types like that.
> The size of `intmax_t` depends on the compiler
True, and that matters most for cross-language callers, who probably need to rely on the exact size.
So `FromInt64` is justified. You could argue that `FromInt32` is justified on 32-bit platforms, where conversion to 64-bit might be more expensive (though I'd be incredibly surprised if it was prohibitive), and perhaps that's all we need to justify both?
Add 8 new functions to the limited C API to convert C numbers from/to Python int:

UPDATE: I removed the PyLong_FromInt32() and PyLong_FromUInt32() functions.

- `To` functions call `obj.__index__()` if obj is not a Python int.
- `To` functions converting to an unsigned integer raise ValueError if the value is negative.
- `To` functions raise OverflowError if the value is too big (doesn't fit into the C integer range).
- Compared to `PyLong_AsLong()`, a `To` function's return value is a status (success or error, 0 or -1), and the result is set in the value argument (on success). This avoids having to call PyErr_Occurred() to distinguish -1 (valid result) from -1 (error). See Problem #1 for details.

Links: