Closed vstinner closed 6 months ago
IIRC @markshannon has opinions on the API for long ints. Please ask him first.
What's the use case for this?
Why pass in a PyObject *
, rather than a PyLongObject *
?
int PyLong_Sign(PyLongObject *obj)
wouldn't need to worry about errors.
What's the use case for this?
This is something we would like to have to support int<->mpz conversion of "big enough" ints with mpz_import/export. See the current code in gmpy2 (Sage takes a similar approach): for reading and for writing.
Now we have the *From/AsNativeBytes
functions to do conversion of "big enough" ints, but this is much slower than the above approach.
Together with PyLong_Sign()
, we need functions to access the absolute value of a "big enough" integer as an array of "digits" (for reading or writing). GMP supports this kind of API with its mpz_limbs_read() and mpz_limbs_write() functions. Perhaps this is a simpler alternative (on the CPython side) to PyInt_Import/Export functions.
Edit, a complete interface:
/* reading */
int PyLong_Sign(PyLongObject *obj); // mpz_sgn()
Py_ssize_t PyLong_DigitCount(PyLongObject *obj); // mpz_size()
const digit * PyLong_AsDigits(PyLongObject *obj); // mpz_limbs_read()
/* writing (former _PyLong_FromDigits, used in _decimal.c) */
PyLongObject* PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits);
int PyLong_Sign(PyLongObject *obj) wouldn't need to worry about errors.
This is what I was thinking of at first. Clearly, it's enough for the above use case. If this kind of API is acceptable (so far, I see only PyUnstable_* functions of this type), I would like to adopt it.
@markshannon:
What's the use case for this?
A code search on the PyPI top 5,000 projects (as of 2023-11-15) finds usage in 10 projects:
Code:
One usage is to ... test the sign of a number :-) Examples:
bool neg = _PyLong_Sign(obj) < 0;
and
if (_PyLong_Sign(value) >= 0) ...
@markshannon:
Why pass in a PyObject *, rather than a PyLongObject *?
For consistency with other PyLong APIs. Example: long PyLong_AsLong(PyObject *)
.
Generic PyObject* vs specific type (such as PyLongObject*) was also discussed in the Add PyDict_GetItemRef() function issue. It was decided to stick to PyObject*.
Generic PyObject vs specific type (such as PyLongObject) was also discussed in the https://github.com/python/cpython/issues/106004 issue. It was decided to stick to PyObject*.
JFR, it was discussed rather in the pr thread: https://github.com/python/cpython/pull/106005
Perhaps, this is the relevant issue: https://github.com/capi-workgroup/api-evolution/issues/29
By the way, I'm surprised that the C API has no function to compare a Python int object to a C integer, something like:
int PyLong_CompareWithLong(PyObject*, long, int *result)
*result is set to -1 if less than, 0 if equal, +1 if greater than. I'm mentioning such a hypothetical function since _PyLong_Sign(obj) is currently used by a few projects to compare a Python int to zero: is it equal to zero? Smaller than zero? Greater than zero?
It could be done with:
int cmp;
if (PyLong_CompareWithLong(number, 0, &cmp) < 0) ... handle error ...
if (cmp == 0) ... equal to zero
else if (cmp < 0) ... smaller than zero
else ... greater than zero
Another data point: I just added Py_GetConstantBorrowed(Py_CONSTANT_ZERO) and Py_GetConstantBorrowed(Py_CONSTANT_ONE) to the C API, and these objects can be used with PyObject_RichCompareBool().
int cmp;
PyObject *zero = Py_GetConstantBorrowed(Py_CONSTANT_ZERO);
cmp = PyObject_RichCompareBool(number, zero, Py_EQ);
if (cmp < 0) ... handle error ...
if (cmp == 1) ... equal to zero ...
else {
    cmp = PyObject_RichCompareBool(number, zero, Py_LT);
    if (cmp < 0) ... handle error ...
    if (cmp == 1) ... smaller than zero ...
    else ... greater than zero ...
}
But I dislike this code, since it uses a borrowed reference and it needs up to two comparison calls :-(
@encukou @gvanrossum @iritkatriel @zooba: What's your opinion on adding an int PyLong_Sign(PyObject *obj, int *sign) API to Python 3.13?
+1 from me.
There was no formal vote, but most members of the WG decided to use PyObject *
rather than concrete types:
All our deliberately public APIs should be PyObject * to minimise abstraction leakage, and do their type checks (raising TypeErrors, not assertions). Fast APIs that skip checks should be the special case, not the default.
For “evolution”, let's stick to that.
Hopefully PyObject_IsTrue
is a relatively quick way to compare to zero (assuming you already know you have the right type).
I'm good with adding the function, and I'm about 50/50 between Victor's proposed API and one that returns a "switch"-able result:
switch(PyLong_Sign(o)) {
case PY_LONG_POSITIVE:
case PY_LONG_NEGATIVE:
case PY_LONG_ZERO:
break;
default:
// error raised
}
Though honestly, none of the times when I've needed this have I needed to optimise for a sign check before conversion, especially since for compact values it'll cost about the same to assign *sign as *value. But possibly there's a use for checking without converting at all.
In general I find APIs with output variables awkward, so I want to use them only when necessary. Since the sign function has only four possible outcomes (error, negative, zero, positive) I feel it doesn't really warrant the output variable, so I'm supportive of Steve's solution.
In general I find APIs with output variables awkward, so I want to use them only when necessary. Since the sign function has only four possible outcomes (error, negative, zero, positive) I feel it doesn't really warrant the output variable, so I'm supportive of Steve's solution.
I agree, with -1 for error and non-negative values for the three valid outcomes.
I'm fine with any API as soon as it doesn't require to call PyErr_Occurred().
If we go with the enum-like approach, I suggest using more specific names, adding "IS" to the name, and using the Py_ prefix rather than PY_:
#define Py_LONG_IS_ZERO 0
#define Py_LONG_IS_NEGATIVE 1
#define Py_LONG_IS_POSITIVE 2
Usage would be:
int sign = PyLong_Sign(obj);
if (sign < 0) { ... error ...; }
if (sign == Py_LONG_IS_ZERO || sign == Py_LONG_IS_POSITIVE) {
... obj >= 0 ...
}
if (sign == Py_LONG_IS_NEGATIVE) {
... obj < 0 ...
}
The advantage of the int PyLong_Sign(PyObject *obj, int *sign) API is that no constants need to be defined and the sign can be used more naturally:
int sign;
if (PyLong_Sign(obj, &sign) < 0) { ... error ...; }
if (sign >= 0) {
... obj >= 0 ...
}
if (sign < 0) {
... obj < 0 ...
}
The gmpy project needs to check if a number is negative: https://github.com/aleaxit/gmpy/blob/eb8dfcbd84abcfcb36b4adcb0d5c6d050731dd75/src/gmpy2_convert_gmp.c#L41-L75
int negative = _PyLong_IsNegative(templong);
...
if (negative) {
mpz_neg(result->z, result->z);
}
So any API would be fine.
How about we define the result as follows?
-2: error
-1: negative
0: zero
1: positive

I think it will be confusing in light of the C API convention of <0 indicating error. That's probably stronger in this case than the convention of comparison results being -1/0/+1.
The PyObject_RichCompareBool() API doesn't have this issue, since it takes two objects plus a third argument, the comparison operator:
/* Perform a rich comparison with integer result. This wraps
PyObject_RichCompare(), returning -1 for error, 0 for false, 1 for true. */
int PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
How about we define the result as follows?
I very nearly suggested this myself. I'm okay with it, apart from not really being convinced this function is special enough to justify the -2. A near equivalent could be:
-1: error
0: negative
1: zero
2: positive

Such that PyLong_Sign(x) - 1 behaves the same. But that kinda feels like the sort of cleverness that we don't really do anymore, and any such use is one code review away from being "fixed" to use named constants.
Okay, so how about:

-1: error
0: negative
1: zero
2: positive

Then you can calculate the "classic" sign() function by first checking for error and then subtracting 1. If you've already ensured it's a PyLong, you can skip the error check. After errors are ruled out, all the various tests can be expressed using comparisons to 1. The advantage of not using enums is that no switch is required.
This might be too much of a detour, but:
Another option is to treat this as “compare to zero”, and start adding comparison API that uses the “slightly strange” enum based on powers of two that @markshannon introduced for the internal COMPARISON_*
macros. That would be:
-1: error
1: (reserved for "unordered", like NaN)
2: negative
4: positive
8: zero

AFAICS, this can be shuffled a bit so that you can get a classic cmp() result by subtracting 2. (Not a classic sign() result though: it'd be -1/0/2.)

-1: error
1: negative
2: zero
4: positive
8: (reserved for "unordered", like NaN)

(Assuming the ergonomics are worth a few extra CPU instructions -- e.g. with #define Py_COMPARISON_BIT(x, y) 0xf & (0x2418 >> (8*((x) >= (y))) + 4*((x) <= (y))).)
I'd avoid the "slightly strange" return values. There is a good reason for them being the way they are, but it probably only makes sense to use them internally. Plus, the overhead of a function call will dominate any bit shuffling, so we might as well keep the return values simple.
I still don't see why we need an error value at all. All integers have a sign, and this function will never need to allocate any memory.
I still think int PyLong_Sign(PyLongObject *obj)
returning the "obvious" values of -1 (for negative), 0, or +1, is a lot easier to use than any of the other alternatives.
There was no formal vote, but most members of the WG decided to use PyObject * (https://github.com/capi-workgroup/api-evolution/issues/29#issuecomment-1768441231) rather than concrete types
Using PyObject * consistently for the high-level API makes sense, but it adds unnecessary overhead for a lower-level API.
Getting the sign of an integer as a C int seems quite low-level to me.
I still don't see why we need an error value at all.
Because we don't have a strongly typed public API, we have no way to know that the incoming value is an int. We need to check the type and return something if the API is misused - reading arbitrary memory isn't okay.
Unfortunately, the intended "abstract API" vs "concrete API" distinction has long given up on assuming input types. The only "lower level API" we have is anything marked private, which is what we're changing here. We can keep the private API that assumes the type, but the public one has to check and handle it safely.
Because we don't have a strongly typed public API, we have no way to know that the incoming value is an int.
int PyLong_Sign(PyLongObject *obj) is strongly typed, and we do know that the incoming value is a PyLongObject, because it says so.
You could argue that someone could cast a PyUnicodeObject *
to PyLongObject *
, but casts are a fact of life in C.
Py_DECREF((PyObject *)7)
is legal C, but we don't do elaborate checks to ensure that the argument of Py_DECREF
is a valid pointer to a PyObject
. We don't even check that it isn't NULL
.
We have to assume a basic level of competence in the users of the C API, so why not here?
By "we don't have" I meant in general, our public API is not strongly typed like that. We don't ever expect users of the API to handle types other than PyObject *
or their own - PyLongObject
is basically internal.
Yes we could add it, but we can't change what's already there, and so we decided that it's not going to be a useful evolution of the language. If we decide that a strongly typed C interface is valuable in a new API design (not a given, since we're being far more considerate of non-C users), then in a revamp we may well expose more C types as part of the API.
But for now, we're not going in that direction. So that's "why not here".
Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error? I looked for this in the API docs and found a smattering, e.g. PyCode_GetNumFree
, PyFrame_GetBuiltins
.
It would be a novelty for the PyLong
API though, and I'm not sure that it's worth deviating from the pattern by having this one function that's strongly typed. Plus, most likely, what the user code has in its hands is typed as PyObject *
, because that's what you get when you construct a long (PyLong_FromLong
etc.) or when you get something from another object, e.g. through PyObject_GetAttr
or PyDict_GetItem
. So we'd just be encouraging users to add a cast, which they'll do blindly. That, too, is a fact of life.
So in the end I agree with Steve.
Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error?
If we count the unstable C API, then there are PyUnstable_Long_IsCompact() and PyUnstable_Long_CompactValue(), working with integer objects.
I'm not sure that it's worth deviating from the pattern by having this one function that's strongly typed.
I doubt this function makes much sense on its own. So, please consider other possible extensions of the public API, mentioned above (e.g. PyLong_DigitCount(), PyLong_AsDigits()).
Yep. Alas, using PyLongObject* for new functions only would make things too inconsistent.
Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error? [...]
Incidentally, I'm (slowly) collecting this kind of data for my Language Summit topic, which will be about adding the kind of API Mark wants in a consistent way. However, the info I have doesn't know about macros or functions only documented or implicitly understood to never fail, so my list is pretty small:
PyType_GetFlags
, PyUnstable_Long_CompactValue
Plus there are ones that return void
(PyType_Modified
, PyFunction_SetVectorcall
; perhaps Py_SET_SIZE
counts) and ones that take a specific object in addition to PyObject*
(Py_IS_TYPE
, Py_SET_TYPE
)
But we can also look at all functions that take a specific object. There aren't that many, and the vast majority of them deal with PyTypeObject
, PyCodeObject
, and PyFrameObject
.
I think we can reasonably hypothesise (and perhaps Guido can confirm) that the PyTypeObject *
functions were initially likely to deal with static instantiations of PyTypeObject
, either exported from CPython or defined in users' own code. So those would require a (valid) cast to PyObject *
for cases where they were to be treated as first-class objects, but for most type object operations are easier to reference directly.
Anything allocated or instantiated by CPython would be object
a.k.a. PyObject *
, and so the "real" object type would never be referenced. Like having to cast malloc
's void *
result to the type you want, callers would have to cast PyObject *
to PyLongObject *
. But the principle here is EAFP, and so just as an instance of int
can be passed to any Python function, an instance of PyLongObject
can be passed to any C API function, which will then take responsibility for deciding whether the caller deserves forgiveness or not.
I suspect the other functions taking more specific object types directly were either added without considering this principle or were being consistent with an earlier API that returned non-object
structs. So it could have gone either way - for me, I'd have said if they are opaque to make them PyObject *
in calls, and if they are meant to be used as structs then to remove the Object
from the name and define their own lifetime semantics.
I doubt this function makes much sense on its own. So, please consider other possible extensions of the public API, mentioned above (e.g. PyLong_DigitCount(), PyLong_AsDigits()).
If we do go this way, we should also define PyLong_DigitSize
to return the number of bits in each digit. The point of using a standard interchange format (bytes) is to hide the internal implementation details - if we decide to expose those details, then they should be runtime parameterised (so compact values can return a DigitCount()=1
and DigitSize()=8*sizeof(internal repr)
to avoid the recalculation work it would take to put it into "normal" digits).
I think we can reasonably hypothesise (and perhaps Guido can confirm) that the PyTypeObject * functions were initially likely to deal with static instantiations of PyTypeObject, either exported from CPython or defined in users' own code. [...]
Yes, exactly.
I suspect the other functions taking more specific object types directly were either added without considering this principle or were being consistent with an earlier API that returned non-object structs. So it could have gone either way - for me, I'd have said if they are opaque to make them PyObject * in calls, and if they are meant to be used as structs then to remove the Object from the name and define their own lifetime semantics.
More likely, the Code
and Frame
were originally created primarily for internal use. The core of the interpreter uses these a lot and often accesses the members directly, so it's more convenient to have the API functions take the actual object types. We're not very consistent in this, alas, and the interpreter code is still full of casts, but since we're feeling the need for speed, repeated dynamic type checks are just wasted time on the fast path. Eventually, we have evolved a parallel set of internal APIs, so we care less about the performance of the public APIs, but it's too late to change those.
Maybe for 3.14 we can sit down with Mark and someone from (e.g.) GMP and design a brand new API for PyLong
that is statically typed and provide all the types of access that libraries like GMP need while allowing the appropriate amount of API evolution/stability from the perspective of likely future improvements to the PyLong
implementation. (Though I suspect that the step function there will be tagged pointers, not changes in object representation.)
FWIW, I still prefer the original proposal to the 0/1/2. I find consistency in the error indicator (-1) very important; I also don't think avoiding the output parameter is worth messing with the traditional sign value (-1/0/+1).
Looking at the guidelines proposal I've been writing, I'd generally like to go for a certain strict, even mechanical consistency, plus sometimes adding an extra variant that trades that consistency for convenience or speed.
So, why not both?

- int PyLong_Sign(PyObject *obj, int *result)
  - sets *result to -1/0/+1
  - on error, sets *result to -2 to make it switchable
- int PyLong_SignUnchecked(PyLongObject *obj)
  - type-checks only in debug builds (assert); cannot signal an error

Why not both? Because this use case isn't important enough to have two separate APIs with different signatures and behavior.
I can live with having just the two-argument version if we rename it to PyLong_GetSign
.
If we do provide both, PyLong_SignUnchecked
is the one that will get used.
I'd be very surprised if users chose the more complex variant over the simpler one.
If we were to do both, the latter ought to be PyUnstable_Long_Sign
(or PyUnstableLong_Sign
? Or PyLongUnstable_Sign
?) and probably be defined as an inline function (like the existing private function).
Otherwise, 100% agree with Guido (and 90% with Mark: I think the pointer cast will sometimes be less convenient than chaining conditions (if (...Sign(x, &s) && s > 0 && ...NativeBytes(...)
), but for the most part the simpler function will win).
I can live with having just the two-argument version if we rename it to
PyLong_GetSign
.
Sounds great!
PyLong_SignUnchecked
is the one that will get used. I'd be very surprised if users chose the more complex variant over the simpler one.
That's OK -- with a name that announces that this function is unusual. (And if we add it, I'd consider it inconsistent to not have the “regular” variant.)
PyUnstable_Long_Sign
I don't think we need to worry about API stability here. IMO, the convenient API is inconsistent with how we want new API to look. While consistency and stability are correlated, I'd rather keep them separate (Unchecked
vs. Unstable
).
int PyLong_GetSign(PyObject *obj, int *sign)
On success, set *sign to the sign of the integer object obj (0, -1 or +1 for zero, negative or positive integer, respectively) and return 0.
On failure, return -1 with an exception set.
This function always succeeds if obj is a
PyLongObject
or its subtype.
Have any of the likely users asked for this two argument form, as opposed to int PyLong_GetSign(PyLongObject *obj)
?
It hardly matters, they aren't getting one that accepts PyLongObject
as an argument. The choice will be between putting the error code or the result in an output parameter, and previously we've agreed (in a very rare case of agreement 😄 ) that the error code should be the returned value.
@iritkatriel, what are your thoughts?
@erlend-aasland: Are you ok with API proposed in https://github.com/capi-workgroup/decisions/issues/19#issuecomment-2088230212 ? Tell me if you need more context/information. And it's ok if you need more time to decide.
I'm fine with the proposed API; I voted.
Thanks all. The SC adopted int PyLong_GetSign(PyObject *obj, int *sign)
API: https://github.com/capi-workgroup/decisions/issues/19#issuecomment-2088230212. I close the issue.
I concur with @markshannon. For such a low-level function, the overhead of checking the type is too high, and it makes the API less convenient.
Also, I think that PyLong_IsPositive(), PyLong_IsNegative() and PyLong_IsZero() are more useful in practice and add less overhead. So I propose to implement these functions first, and then see whether enough use cases are left for PyLong_Sign().
These functions are used when performance is very important. When it is not important, you can use PyObject_RichCompareBool()
.
These functions are used when performance is very important.
In which kind of code would the performance of PyLong_GetSign() matter? I don't expect this function to be called often: not more than once per integer, and not in "hot code".
PyLong_GetSign() cannot fail if the argument is a Python int. If you know that the parameter is a Python int, you can simply ignore the error handling (maybe using an assertion, just in case).
I do not remember the details, it was many years ago, but I used Py_SIZE(x) < 0
instead of _PyLong_Sign(x) < 0
more than once in the past because the difference was noticeable. You can search all occurrences of "_PyLong_Is" and try to replace them with _PyLong_Sign
.
BTW, the _PyLong_Is*()
functions are used much more (104 occurrences total: 26 _PyLong_IsZero
, 7 _PyLong_IsPositive
and 71 _PyLong_IsNegative
) than _PyLong_Sign()
(8). And several of these _PyLong_Sign()
calls could be replaced with _PyLong_Is*()
.
The API proposed here is not only slower, it is less convenient. You need to introduce a variable, and even if you ignore errors, you cannot use it in an expression.
int sign;
(void)PyLong_Sign(obj, &sign);
if (sign < 0) {
instead of
if (PyLong_IsNegative(obj)) {
Multiply this by 112 use cases.
The API proposed here is not only slower, it is less convenient.
These aspects were taken into account in the decision.
This API takes PyObject*
. If you consider that it's way too slow and the API is not convenient, you can propose adding a PyUnstable
API which takes PyLongObject*
. I'm not convinced that it's needed.
Also, this issue is now closed, so I suggest opening a new one.
API:
int PyLong_Sign(PyObject *obj, int *sign)
Retrieve the sign of the integer object obj (0, -1 or +1 for zero, negative or positive integer, respectively) in the variable sign. Return 0 on success, else -1 with an exception set. This function always succeeds if obj is a PyLongObject or its subtype.

PR: https://github.com/python/cpython/pull/116561
I would like to propose adding the API directly to the limited API, what do you think?