Add PyLong_Sign() public function

vstinner commented 3 months ago

API: int PyLong_Sign(PyObject *obj, int *sign)

Retrieve the sign of integer object obj (0, -1 or +1 for zero, negative or positive integer, respectively) in a variable sign.

Return 0 on success, else -1 with an exception set. This function always succeeds if obj is a :c:type:PyLongObject or its subtype.

PR: https://github.com/python/cpython/pull/116561

I would like to propose adding the API directly to the limited API, what do you think?

gvanrossum commented 3 months ago

IIRC @markshannon has opinions on the API for long ints. Please ask him first.

markshannon commented 3 months ago

What's the use case for this?

Why pass in a PyObject *, rather than a PyLongObject *? int PyLong_Sign(PyLongObject *obj) wouldn't need to worry about errors.

skirpichev commented 3 months ago

What's the use case for this?

This is something we like to have to support int<->mpz conversion of "big enough" int's with mpz_import/export. See current code in the gmpy2 (similar approach has Sage): for reading and for writing.

Now we have *From/AsNativeBytes functions to do conversion of "big enough" int's, but this is much more slow than above approach.

Together with PyLong_Sign(), we need functions to access absolute value of "big enough" integer as an array of "digits" (for reading or writing). The GMP supports this kind of API with mpz_limbs_read() and mpz_limbs_write() functions. Perhaps, this is a more simple alternative (on CPython side) to PyInt_Import/Export functions.

Edit, a complete interface:

/* reading */
int PyLong_Sign(PyLongObject *obj);  // mpz_sgn()
Py_ssize_t PyLong_DigitCount(PyLongObject *obj);  // mpz_size()
const digit * PyLong_AsDigits(PyLongObject *obj);  // mpz_limbs_read()

/* writing (former _PyLong_FromDigits, used in _decimal.c) */
PyLongObject* PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits);

int PyLong_Sign(PyLongObject *obj) wouldn't need to worry about errors.

This is something I was thinking first. Clearly, it's enough for above use case. If this kind of API is acceptable (so far, I see only PyUnstable_* functions of this type) - I would like to adopt one.

vstinner commented 3 months ago

@markshannon:

What's the use case for this?

A code search on PyPI top 5,000 projects (at 2023-11-15) finds usage in 10 projects:

Cython (3.0.5)
cffi (1.16.0)
datatable (1.0.0)
line_profiler (4.1.2)
mariadb (1.1.8)
msgspec (0.18.4)
orjson (3.9.10)
pickle5 (0.0.12)
pyodbc (5.0.1)
zodbpickle (3.1)

Code:

``` cffi-1.16.0.tar.gz: cffi-1.16.0/src/c/_cffi_backend.c: if (_PyLong_Sign(ob) < 0) cffi-1.16.0.tar.gz: cffi-1.16.0/src/c/_cffi_backend.c: return _PyLong_Sign(ob) != 0; Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_Sign(x) (((PyLongObject*)x)->long_value.lv_tag & _PyLong_SIGN_MASK) Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_IsNeg(x) ((__Pyx_PyLong_Sign(x) & 2) != 0) Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_IsZero(x) (__Pyx_PyLong_Sign(x) & 1) Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_IsPos(x) (__Pyx_PyLong_Sign(x) == 0) Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: ((1 - (Py_ssize_t) __Pyx_PyLong_Sign(x)) * __Pyx_PyLong_DigitCount(x)) Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_CompactValue(x) ((1 - (Py_ssize_t) __Pyx_PyLong_Sign(x)) * (Py_ssize_t) __Pyx_PyLong_Digits(x)[0]) datatable-1.0.0.tar.gz: datatable-1.0.0/src/core/python/int.cc: int sign = _PyLong_Sign(v); datatable-1.0.0.tar.gz: datatable-1.0.0/src/core/python/int.cc: int sign = _PyLong_Sign(v); datatable-1.0.0.tar.gz: datatable-1.0.0/src/core/python/obj.cc: int sign = _PyLong_Sign(v); line_profiler-4.1.2.tar.gz: line_profiler-4.1.2/line_profiler/python25.pxd: int _PyLong_Sign (object) # No error. mariadb-1.1.8.tar.gz: mariadb-1.1.8/mariadb/mariadb_codecs.c: if (_PyLong_Sign(value->value) < 0) msgspec-0.18.4.tar.gz: msgspec-0.18.4/msgspec/_core.c: bool neg = _PyLong_Sign(obj) < 0; pickle5-0.0.12.tar.gz: pickle5-0.0.12/pickle5/_pickle.c: int sign = _PyLong_Sign(obj); pyodbc-5.0.1.tar.gz: pyodbc-5.0.1/src/connection.cpp: if (_PyLong_Sign(value) >= 0) pyodbc-5.0.1.tar.gz: pyodbc-5.0.1/src/params.cpp: pNum->sign = _PyLong_Sign(cell) >= 0; zodbpickle-3.1.tar.gz: zodbpickle-3.1/src/zodbpickle/_pickle_33.c: int sign = _PyLong_Sign(obj); ```

One usage is to ... test the sign of a number :-) Examples:

bool neg = _PyLong_Sign(obj) < 0;

and

if (_PyLong_Sign(value) >= 0) ...

@markshannon:

Why pass in a PyObject , rather than a PyLongObject ?

For consistency with other PyLong APIs. Example: long PyLong_AsLong(PyObject *).

Generic PyObject* vs specific type (such as PyLongObject*) was also discussed in the Add PyDict_GetItemRef() function issue. It was decided to stick to PyObject*.

skirpichev commented 3 months ago

Generic PyObject vs specific type (such as PyLongObject) was also discussed in the https://github.com/python/cpython/issues/106004 issue. It was decided to stick to PyObject*.

JFR, it was discussed rather in the pr thread: https://github.com/python/cpython/pull/106005

Perhaps, this is the relevant issue: https://github.com/capi-workgroup/api-evolution/issues/29

vstinner commented 3 months ago

By the way, I'm surprised that the C API has no function to compare a Python int object to a C integer, something like:

int PyLong_CompareWithLong(PyObject*, long, int *result)

Returns 0 on success. In that case, *result is set to -1 if less than, 0 if equal, +1 if greater than.
Set an exception and return -1 on error.

I'm mentiong such hypothetical function since _PyLong_Sign(obj) is currently used by a few projects to compare a Python int to zero: is it equal to zero? Smaller than zero? Greater than zero?

It could be done with:

int cmp;
if (PyLong_CompareWithLong(number, 0, &cmp) < 0) ... handle error ...
if (cmp == 0) ... equal to zero
else if (cmp < 0) ... smaller than zero
else ... greater than zero

Another data: I just added Py_GetConstantBorrowed(Py_CONSTANT_ZERO) and Py_GetConstantBorrowed(Py_CONSTANT_ONE) to the C API, and these objects can be used with PyObject_RichCompareBool().

int cmp;
PyObject *zero = Py_GetConstantBorrowed(Py_CONSTANT_ONE);
cmp = PyObject_RichCompareBool(number, zero);
if (cmp < 0 && PyErr_Occurred()) ... handle error ...
if (cmp == 0) ... equal to zero
else if (cmp < 0) ... smaller than zero ...
else ... greater than zero ...

But I dislike this code, since it uses a borrowed reference and it requires to call PyErr_Occurred() :-(

vstinner commented 2 months ago

@encukou @gvanrossum @iritkatriel @zooba: What's your opinion on adding int PyLong_Sign(PyObject *obj, int *sign) API to Python 3.13?

encukou commented 2 months ago

+1 from me.

There was no formal vote, but most members of the WG decided to use PyObject * rather than concrete types:

All our deliberately public APIs should be PyObject * to minimise abstraction leakage, and do their type checks (raising TypeErrors, not assertions). Fast APIs that skip checks should be the special case, not the default.

For “evolution”, let's stick to that.

zooba commented 2 months ago

Hopefully PyObject_IsTrue is a relatively quick way to compare to zero (assuming you already know you have the right type).

I'm good with adding the function, and I'm about 50/50 between Victor's proposed API and one that returns a "switch"-able result:

switch(PyLong_Sign(o)) {
case PY_LONG_POSITIVE:
case PY_LONG_NEGATIVE:
case PY_LONG_ZERO:
    break;
default:
    // error raised
}

Though honestly, none of the times when I've needed this have I needed to optimise for a sign check before conversion, especially since for compact values it'll cost about the same to assign *sign as *value. But possibly there's a use for checking without converting at all.

gvanrossum commented 2 months ago

In general I find APIs with output variables awkward, so I want to use them only when necessary. Since the sign function has only four possible outcomes (error, negative, zero, positive) I feel it doesn't really warrant the output variable, so I'm supportive of Steve's solution.

iritkatriel commented 2 months ago

In general I find APIs with output variables awkward, so I want to use them only when necessary. Since the sign function has only four possible outcomes (error, negative, zero, positive) I feel it doesn't really warrant the output variable, so I'm supportive of Steve's solution.

I agree, with -1 for error and non-negative values for the three valid outcomes.

vstinner commented 2 months ago

I'm fine with any API as soon as it doesn't require to call PyErr_Occurred().

If we go with the enum-like approach, I suggest to use more specific names, add "IS" in the name, and use Py prefix rather than PY_:

#define Py_LONG_IS_ZERO 0
#define Py_LONG_IS_NEGATIVE 1
#define Py_LONG_IS_POSITIVE 2

Usage would be:

int sign = PyLong_Sign(obj);
if (sign < 0) { ... error ...; }
if (sign == Py_LONG_IS_ZERO || sign == Py_LONG_IS_POSITIVE) {
    ... obj >= 0 ...
}
if (sign == Py_LONG_IS_NEGATIVE) {
    ... obj < 0 ...
}

The advantage of int PyLong_Sign(PyObject *obj, int *sign) API is that no constant needs to be defined and sign can be used more naturally:

int sign;
if (PyLong_Sign(obj, &sign) < 0) { ... error ...; }
if (sign >= 0) {
    ... obj >= 0 ...
}
if (sign < 0) {
    ... obj < 0 ...
}

The gmpy project needs to check if a number is negative: https://github.com/aleaxit/gmpy/blob/eb8dfcbd84abcfcb36b4adcb0d5c6d050731dd75/src/gmpy2_convert_gmp.c#L41-L75

int negative = _PyLong_IsNegative(templong);
...
if (negative) {
    mpz_neg(result->z, result->z);
}

So any API would be fine.

gvanrossum commented 2 months ago

How about we define the result as follows?

-2 error
-1 negative
0 zero
1 positive

iritkatriel commented 2 months ago

I think it will be confusing in light of the c api convention of <0 indicating error. That's probably stronger in this case than the convention on comparison result being -1,0,1.

vstinner commented 2 months ago

PyObject_RichCompareBool() API doesn't have this issue since it takes two arguments and a 3rd is the comparison operator:

/* Perform a rich comparison with integer result.  This wraps
   PyObject_RichCompare(), returning -1 for error, 0 for false, 1 for true. */
int PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)

zooba commented 2 months ago

How about we define the result as follows?

I very nearly suggested this myself. I'm okay with it, apart from not really being convinced this function is special enough to justify the -2. A near equivalent could be:

-1 error
0 negative
1 zero
2 positive

Such that PyLong_Sign(x) - 1 behaves the same. But that kinda feels like the sort of cleverness that we don't really do anymore, and any such use is one code review away from being "fixed" to use named constants.

gvanrossum commented 2 months ago

Okay, so how about

-1 error
0 negative
1 zero
2 positive?

Then you can calculate the "classic" sign() function by first checking for error and then subtracting 1. If you've already ensured it's a PyLong you can skip the error check. After errors are ruled out, all the various tests can be expressed using comparisons to 1. The advantage of not using enums is that no switch is required.

encukou commented 2 months ago

This might be too much of a detour, but: Another option is to treat this as “compare to zero”, and start adding comparison API that uses the “slightly strange” enum based on powers of two that @markshannon introduced for the internal COMPARISON_* macros. That would be:

-1 error
(1 reserved for “unordered”, like NaN)
2 negative
4 positive
8 zero

AFAICS, this can be shuffled a bit so that you can get a classic cmp() result by subtracting 2. (Not a classic sign() result though: it'd be -1/0/2.)

-1 error
1 negative
2 zero
4 positive
(8 reserved for “unordered”, like NaN)

(Assuming the ergonomics are worth a few extra CPU instructions -- e.g. with #define Py_COMPARISON_BIT(x, y) 0xf & (0x2418 >> (8*((x) >= (y))) + 4*((x) <= (y))))

markshannon commented 2 months ago

I'd avoid the "slightly strange" return values. There is a good reason for them being the way they are, but it probably only makes sense to use them internally. Plus, the overhead of a function call will dominate any bit shuffling, so we might as well keep the return values simple.

I still don't why we need an error value at all. All integers have a sign and this function will never need to allocate any memory.

I still think int PyLong_Sign(PyLongObject *obj) returning the "obvious" values of -1 (for negative), 0, or +1, is a lot easier to use than any of the other alternatives.

markshannon commented 2 months ago

There was no formal vote, but most members of the WG https://github.com/capi-workgroup/api-evolution/issues/29#issuecomment-1768441231 rather than concrete types

Using PyObject * consistently for the high level API makes sense, but adds unnecessary overhead for lower level API. Getting the sign of an integer as a C int seems quite low level to me.

zooba commented 2 months ago

I still don't why we need an error value at all.

Because we don't have a strongly typed public API, we have no way to know that the incoming value is an int. We need to check the type and return something if the API is misused - reading arbitrary memory isn't okay.

Unfortunately, the intended "abstract API" vs "concrete API" distinction has long given up on assuming input types. The only "lower level API" we have is anything marked private, which is what we're changing here. We can keep the private API that assumes the type, but the public one has to check and handle it safely.

markshannon commented 2 months ago

Because we don't have a strongly typed public API, we have no way to know that the incoming value is an int.

int PyLong_Sign(PyLongObject *obj) is strongly typed and we do know that the incoming value is an PyLongObject, because it says so.

You could argue that someone could cast a PyUnicodeObject * to PyLongObject *, but casts are a fact of life in C. Py_DECREF((PyObject *)7) is legal C, but we don't do elaborate checks to ensure that the argument of Py_DECREF is a valid pointer to a PyObject. We don't even check that it isn't NULL.

We have to assume a basic level of competence in the users of the C API, so why not here?

zooba commented 2 months ago

By "we don't have" I meant in general, our public API is not strongly typed like that. We don't ever expect users of the API to handle types other than PyObject * or their own - PyLongObject is basically internal.

Yes we could add it, but we can't change what's already there, and so we decided that it's not going to be a useful evolution of the language. If we decide that a strongly typed C interface is valuable in a new API design (not a given, since we're being far more considerate of non-C users), then in a revamp we may well expose more C types as part of the API.

But for now, we're not going in that direction. So that's "why not here".

gvanrossum commented 2 months ago

Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error? I looked for this in the API docs and found a smattering, e.g. PyCode_GetNumFree, PyFrame_GetBuiltins.

It would be a novelty for the PyLong API though, and I'm not sure that it's worth deviating from the pattern by having this one function that's strongly typed. Plus, most likely, what the user code has in its hands is typed as PyObject *, because that's what you get when you construct a long (PyLong_FromLong etc.) or when you get something from another object, e.g. through PyObject_GetAttr or PyDict_GetItem. So we'd just be encouraging users to add a cast, which they'll do blindly. That, too, is a fact of life.

So in the end I agree with Steve.

skirpichev commented 2 months ago

Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error?

If we count Unstable C API, then there are PyUnstable_Long_IsCompact() and PyUnstable_Long_CompactValue(), working with integer objects.

I'm not sure that it's worth deviating from the pattern by having this one function that's strongly typed.

I doubt this function has much sense alone. So, please consider other possible extensions of the public API, mentioned above (e.g. PyLong_DigitCount(), PyLong_AsDigits()).

encukou commented 2 months ago

Yep. Alas, using PyLongObject* only for new functions only would make things too inconsistent.

Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error? [...]

Incidentally, I'm (slowly) collecting this kind of data for my Language Summit topic, which will be about adding the kind of API Mark wants in a consistent way. However, the info I have doesn't know about macros or functions only documented or implicitly understood to never fail, so my list is pretty small:

PyType_GetFlags, PyUnstable_Long_CompactValue
Plus there are ones that return void (PyType_Modified, PyFunction_SetVectorcall; perhaps Py_SET_SIZE counts) and ones that take a specific object in addition to PyObject* (Py_IS_TYPE, Py_SET_TYPE)

But we can also look at all functions that take a specific object. There aren't that many, and the vast majority of them deal with PyTypeObject, PyCodeObject, and PyFrameObject.

Click for a list

(From a Clang AST dump, ad-hoc filtering. Macros aren't included at all; no info about how wrongly cast arguments are handled.) ``` Py_IS_TYPE PyTypeObject * type Py_SET_TYPE PyTypeObject * type Py_SET_SIZE PyVarObject * ob PyType_GetSlot PyTypeObject * (param 1) PyType_GetModule PyTypeObject * (param 1) PyType_GetModuleState PyTypeObject * (param 1) PyType_GetName PyTypeObject * (param 1) PyType_GetQualName PyTypeObject * (param 1) PyType_GetFullyQualifiedName PyTypeObject * type PyType_GetModuleName PyTypeObject * type PyType_FromMetaclass PyTypeObject * (param 1) PyObject_GetTypeData PyTypeObject * cls PyType_GetTypeDataSize PyTypeObject * cls PyType_IsSubtype PyTypeObject * (param 1) PyTypeObject * (param 2) PyObject_TypeCheck PyTypeObject * type PyType_GetFlags PyTypeObject * (param 1) PyType_Ready PyTypeObject * (param 1) PyType_GenericAlloc PyTypeObject * (param 1) PyType_GenericNew PyTypeObject * (param 1) PyType_Modified PyTypeObject * (param 1) PyType_GetDict PyTypeObject * (param 1) PyUnstable_Type_AssignVersionTag PyTypeObject * type PyType_HasFeature PyTypeObject * type PyType_GetModuleByDef PyTypeObject * (param 1) PyObject_Init PyTypeObject * (param 2) PyObject_InitVar PyVarObject * (param 1) PyTypeObject * (param 2) PyType_SUPPORTS_WEAKREFS PyTypeObject * type PyUnstable_Object_GC_NewWithExtraData PyTypeObject * (param 1) PyUnstable_Long_IsCompact const PyLongObject * op PyUnstable_Long_CompactValue const PyLongObject * op PyCMethod_New PyTypeObject * (param 4) PyFunction_SetVectorcall PyFunctionObject * (param 1) PyCode_GetNumFree PyCodeObject * op PyUnstable_Code_GetFirstFree PyCodeObject * op PyCode_GetFirstFree PyCodeObject * op PyCode_Addr2Line PyCodeObject * (param 1) PyCode_Addr2Location PyCodeObject * (param 1) PyCode_GetCode PyCodeObject * code PyCode_GetVarnames PyCodeObject * code PyCode_GetCellvars PyCodeObject * code PyCode_GetFreevars PyCodeObject * code PyFrame_GetLineNumber PyFrameObject * (param 1) PyFrame_GetCode PyFrameObject * frame PyFrame_GetBack PyFrameObject * frame PyFrame_GetLocals PyFrameObject * frame PyFrame_GetGlobals PyFrameObject * frame PyFrame_GetBuiltins PyFrameObject * frame PyFrame_GetGenerator PyFrameObject * frame PyFrame_GetLasti PyFrameObject * frame PyFrame_GetVar PyFrameObject * frame PyFrame_GetVarString PyFrameObject * frame PyTraceBack_Here PyFrameObject * (param 1) PyGen_New PyFrameObject * (param 1) PyGen_NewWithQualName PyFrameObject * (param 1) PyGen_GetCode PyGenObject * gen PyCoro_New PyFrameObject * (param 1) PyAsyncGen_New PyFrameObject * (param 1) PyDescr_NewMethod PyTypeObject * (param 1) PyDescr_NewClassMethod PyTypeObject * (param 1) PyDescr_NewMember PyTypeObject * (param 1) PyDescr_NewGetSet PyTypeObject * (param 1) PyDescr_NewWrapper PyTypeObject * (param 1) PyStructSequence_InitType PyTypeObject * type PyStructSequence_InitType2 PyTypeObject * type PyStructSequence_New PyTypeObject * type PyModule_AddType PyTypeObject * type PyEval_EvalFrame PyFrameObject * (param 1) PyEval_EvalFrameEx PyFrameObject * f PyUnstable_PerfTrampoline_CompileCode PyCodeObject * (param 1) PyUnstable_Replace_Executor PyCodeObject * code PyUnstable_GetExecutor PyCodeObject * code ``` Counts of tokens in the above argument types: ``` Counter({'*': 73, 'PyTypeObject': 38, 'PyFrameObject': 17, 'PyCodeObject': 12, 'PyVarObject': 2, 'const': 2, 'PyLongObject': 2, 'PyFunctionObject': 1, 'PyGenObject': 1}) ``` These are functions starting with `Py`, where argument types include `Py` but not `PyObject` nor something from an ad-hoc list of non-PyObject types: ``` ['PyCompilerFlags', 'PyConfig', 'PyGetSetDef', 'PyInterpreterConfig', 'PyInterpreterState', 'PyMemAllocatorEx', 'PyMemberDef', 'PyMethodDef', 'PyModuleDef', 'PyObject', 'PyObjectArenaAllocator', 'PyPreConfig', 'PyStructSequence_Desc', 'PyThreadState', 'PyTime_t', 'PyType_Spec', 'PyWideStringList', 'Py_UCS4', 'Py_buffer', 'Py_ssize_t', 'Py_tss_t', '_PyExecutorObject', '_PyInterpreterFrame', '_PyOptimizerObject', '_Py_CODEUNIT'] ```

zooba commented 2 months ago

I think we can reasonably hypothesise (and perhaps Guido can confirm) that the PyTypeObject * functions were initially likely to deal with static instantiations of PyTypeObject, either exported from CPython or defined in users' own code. So those would require a (valid) cast to PyObject * for cases where they were to be treated as first-class objects, but for most type object operations are easier to reference directly.

Anything allocated or instantiated by CPython would be object a.k.a. PyObject *, and so the "real" object type would never be referenced. Like having to cast malloc's void * result to the type you want, callers would have to cast PyObject * to PyLongObject *. But the principle here is EAFP, and so just as an instance of int can be passed to any Python function, an instance of PyLongObject can be passed to any C API function, which will then take responsibility for deciding whether the caller deserves forgiveness or not.

I suspect the other functions taking more specific object types directly were either added without considering this principle or were being consistent with an earlier API that returned non-object structs. So it could have gone either way - for me, I'd have said if they are opaque to make them PyObject * in calls, and if they are meant to be used as structs then to remove the Object from the name and define their own lifetime semantics.

I doubt this function has much sense alone. So, please consider other possible extensions of the public API, mentioned above (e.g. PyLong_DigitCount(), PyLong_AsDigits()).

If we do go this way, we should also define PyLong_DigitSize to return the number of bits in each digit. The point of using a standard interchange format (bytes) is to hide the internal implementation details - if we decide to expose those details, then they should be runtime parameterised (so compact values can return a DigitCount()=1 and DigitSize()=8*sizeof(internal repr) to avoid the recalculation work it would take to put it into "normal" digits).

gvanrossum commented 2 months ago

I think we can reasonably hypothesise (and perhaps Guido can confirm) that the PyTypeObject * functions were initially likely to deal with static instantiations of PyTypeObject, either exported from CPython or defined in users' own code. [...]

Yes, exactly.

I suspect the other functions taking more specific object types directly were either added without considering this principle or were being consistent with an earlier API that returned non-object structs. So it could have gone either way - for me, I'd have said if they are opaque to make them PyObject * in calls, and if they are meant to be used as structs then to remove the Object from the name and define their own lifetime semantics.

More likely, the Code and Frame were originally created primarily for internal use. The core of the interpreter uses these a lot and often accesses the members directly, so it's more convenient to have the API functions take the actual object types. We're not very consistent in this, alas, and the interpreter code is still full of casts, but since we're feeling the need for speed, repeated dynamic type checks are just wasted time on the fast path. Eventually, we have evolved a parallel set of internal APIs, so we care less about the performance of the public APIs, but it's too late to change those.

Maybe for 3.14 we can sit down with Mark and someone from (e.g.) GMP and design a brand new API for PyLong that is statically typed and provide all the types of access that libraries like GMP need while allowing the appropriate amount of API evolution/stability from the perspective of likely future improvements to the PyLong implementation. (Though I suspect that the step function there will be tagged pointers, not changes in object representation.)

encukou commented 2 months ago

FWIW, I still prefer the original proposal to the 0/1/2. I find consistency in the error indicator (-1) very important; I also don't think avoiding the output parameter is worth messing with the traditional sign value (-1/0/+1).

Looking at the guidelines proposal I've been writing, I'd generally like to go for a certain strict, even mechanical consistency, plus sometimes adding an extra variant that trades that consistency for convenience or speed.

So, why not both?

int PyLong_Sign(PyObject *obj, int *result)

on success, returns 0 and set *result to -1/0/+1
on error, returns -1 with exception set. Also, as a convenience, sets *result to -2 to make it switchable

int PyLong_SignUnchecked(PyLongObject *obj)

returns -1/0/+1
does no type checking (just a debug assert); cannot signal an error

gvanrossum commented 2 months ago

Why not both? Because this use case isn't important enough to have two separate APIs with different signature and behavior.

I can live with having just the two-argument version if we rename it to PyLong_GetSign.

markshannon commented 2 months ago

If we do provide both, PyLong_SignUnchecked is the one that will get used. I'd be very surprised if users chose the more complex variant over the simpler one.

zooba commented 2 months ago

If we were to do both, the latter ought to be PyUnstable_Long_Sign (or PyUnstableLong_Sign? Or PyLongUnstable_Sign?) and probably be defined as an inline function (like the existing private function).

Otherwise, 100% agree with Guido (and 90% with Mark: I think the pointer cast will sometimes be less convenient than chaining conditions (if (...Sign(x, &s) && s > 0 && ...NativeBytes(...)), but for the most part the simpler function will win).

encukou commented 2 months ago

I can live with having just the two-argument version if we rename it to PyLong_GetSign.

Sounds great!

PyLong_SignUnchecked is the one that will get used. I'd be very surprised if users chose the more complex variant over the simpler one.

That's OK -- with a name that announces that this function is unusual. (And if we add it, I'd consider it inconsistent to not have the “regular” variant.)

PyUnstable_Long_Sign

I don't think we need to worry about API stability here. IMO, the convenient API is inconsistent with how we want new API to look. While consistency and stability are correlated, I'd rather keep them separate (Unchecked vs. Unstable).

encukou commented 1 month ago

int PyLong_GetSign(PyObject *obj, int *sign)

On success, set *sign to sign of integer object obj (0, -1 or +1 for zero, negative or positive integer, respectively) and return 0.

On failure, return -1 with an exception set. This function always succeeds if obj is a PyLongObject or its subtype.

[x] @gvanrossum
[x] @encukou (+1)
[x] @vstinner
[x] @zooba
[x] @erlend-aasland

markshannon commented 1 month ago

Have any of the likely users asked for this two argument form, as opposed to int PyLong_GetSign(PyLongObject *obj)?

zooba commented 1 month ago

It hardly matters, they aren't getting one that accepts PyLongObject as an argument. The choice will be between putting the error code or the result in an output parameter, and previously we've agreed (in a very rare case of agreement 😄 ) that the error code should be the returned value.

encukou commented 1 month ago

@iritkatriel, what are your thoughts?

vstinner commented 4 weeks ago

@erlend-aasland: Are you ok with API proposed in https://github.com/capi-workgroup/decisions/issues/19#issuecomment-2088230212 ? Tell me if you need more context/information. And it's ok if you need more time to decide.

erlend-aasland commented 3 weeks ago

I'm fine with the proposed API; I voted.

vstinner commented 3 weeks ago

Thanks all. The SC adopted int PyLong_GetSign(PyObject *obj, int *sign) API: https://github.com/capi-workgroup/decisions/issues/19#issuecomment-2088230212. I close the issue.

serhiy-storchaka commented 3 weeks ago

I concur with @markshannon. For such low-level function the overhead of checking the type is too high, and it makes the API less convenient.

Also, I think that PyLong_IsPositive(), PyLong_IsNegative() and PyLong_IsZero() are more useful in practice and adds less overhead. So I propose to implement these functions first, and then look if there are enough use cases are left for PyLong_Sign().

These functions are used when performance is very important. When it is not important, you can use PyObject_RichCompareBool().

vstinner commented 3 weeks ago

These functions are used when performance is very important.

In which kind of code the performance of PyLong_GetSign() would matter? I don't expect this function to be called often, like not more than once per integer, and not in "hot code".

PyLong_GetSign() cannot fail if the argument is a Python int. If you know that the parameter is a Python int, you can simply ignore the error handling (maybe using an assertion, just in case).

serhiy-storchaka commented 3 weeks ago

I do not remember details, it was many years ago, but I used Py_SIZE(x) < 0 instead of _PyLong_Sign(x) < 0 more than once in the past because the difference was noticeable. You can search all occurrences of "_PyLong_Is" and try to replace them with _PyLong_Sign.

BTW, the _PyLong_Is*() functions are used much more (104 occurrences total: 26 _PyLong_IsZero, 7 _PyLong_IsPositive and 71 _PyLong_IsNegative) than _PyLong_Sign() (8). And several of these _PyLong_Sign() calls could be replaced with _PyLong_Is*().

The API proposed here is not only slower, it is less convenient. You need to introduce a variable, and even if you ignore error, you cannot use it in expression.

int sign;
(void)PyLong_Sign(obj, &sign);
if (sign < 0) {

instead of

if (PyLong_IsNegative(obj)) {

Multiply this by 112 use cases.

vstinner commented 3 weeks ago

The API proposed here is not only slower, it is less convenient.

These aspects were taken in account in the decision.

This API takes PyObject*. If you consider that it's way too slow and the API is not convenient, you can propose adding a PyUnstable API which takes PyLongObject*. I'm not convinced that it's needed.

Also, this issue is not closed, so I suggest to open a new one.

capi-workgroup / decisions

Add PyLong_Sign() public function #19