Closed by vstinner 1 month ago
To play with:
Later I'll show some simple benchmarks, but first please note that the current gmpy2 code special-cases "small" integers to make performance acceptable. I think it's a common pattern and should be used in the second case as well (the current code in the CPython main branch has a special case for 1-digit ints).
That's why I would prefer a simpler and less generic API, like the one proposed in #31. E.g. for reading:
if (PyLong_IsCompact(obj)) { /* should be a quick check */
mpz_set_si(z, PyLong_CompactValue(obj)); /* compact value fits in some integer type */
}
else {
/* fallback to "array of digits" view for abs(obj) */
const Py_digit *digits = PyLong_GetDigits(obj); /* can't fail for non-compact ints */
mpz_import(z, ..., digits);
if (PyLong_IsNegative(obj)) { /* looks better than PyLong_GetSign() ;) */
mpz_neg(z, z);
}
}
(In this example, for simplicity, I assume that compact value fits in long.)
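The same two-path flow can be modelled in pure Python (an illustrative sketch only; the names `export_int`, `COMPACT_BOUND` and the 30-bit digit width are assumptions matching a typical 64-bit CPython build, not part of the proposed API):

```python
BITS_PER_DIGIT = 30                      # assumed digit width (64-bit build)
DIGIT_MASK = (1 << BITS_PER_DIGIT) - 1
COMPACT_BOUND = 1 << BITS_PER_DIGIT      # assumed "compact" (one-digit) range

def export_int(obj):
    """Model of the reading path: quick compact check, then a fallback
    to an "array of digits" view of abs(obj), with the sign applied last."""
    if -COMPACT_BOUND < obj < COMPACT_BOUND:   # stands in for PyLong_IsCompact()
        return ("compact", obj)                # stands in for PyLong_CompactValue()
    digits = []                                # what mpz_import would consume
    n = abs(obj)
    while n:
        digits.append(n & DIGIT_MASK)
        n >>= BITS_PER_DIGIT
    return ("digits", digits, obj < 0)         # negative flag, like mpz_neg()
```

The fast path never touches the digit array, which is exactly why the quick check matters for small integers.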
On the master:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 233 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 295 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 491 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.33 us +- 0.01 us
With new API:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 37 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 315 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 508 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.34 us +- 0.01 us
Without special case for 1-digit:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 291 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 316 ns +- 1 ns
Using PyUnstable_Long_IsCompact (like in proposed above code):
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 215 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 317 ns +- 2 ns
The new API's overhead seems noticeable for the 1-2 digit cases (up to 10-15%).
On the master:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<7)' 'int(x)'
Mean +- std dev: 111 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<38)' 'int(x)'
Mean +- std dev: 155 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<300)' 'int(x)'
Mean +- std dev: 473 ns +- 6 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<3000)' 'int(x)'
Mean +- std dev: 2.20 us +- 0.00 us
With new API:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<7)' 'int(x)'
Mean +- std dev: 111 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<38)' 'int(x)'
Mean +- std dev: 155 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<300)' 'int(x)'
Mean +- std dev: 512 ns +- 5 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<3000)' 'int(x)'
Mean +- std dev: 2.25 us +- 0.02 us
If PyLong_GetDigits returned the layout information with its call, then it could return ndigits=sizeof(ssize_t); bits_per_digit=8; digits=&compact_value for a compact value, and you shouldn't need the additional checks. The layout could be a pointer to a static struct easily enough, so it's not costly to return, though I prefer the stability of returning a pointer that requires a "free" call of some sort.
Otherwise, if we were to document this function as pairing with PyUnstable_Long_IsCompact, then it probably should also be unstable. But I think it's better to leave that optimisation unspecified, if it's not needed for correctness.
PyAPI_DATA(const PyLongLayout) PyLong_LAYOUT;
Please, no more exported data. Make it a function that either returns a pointer to the layout or copies the layout values into a struct, or integrate it with GetDigits as I suggested, so that we can support multiple layouts.
// Array endian:
// * 1 for most significant byte first (big endian)
// * -1 for least significant first (little endian)
int8_t array_endian;
Does this mean "most significant digit first"? Or "most significant byte of each digit first"? Or both?
As we'd now use Py_digit in the ABI, is there a need to also provide its size via a function? Or can we rely on callers understanding how their C ABI works? (Or can we drop Py_digit from the ABI and just work with void * instead? In which case we must provide the digit size some other way.)
Does this mean "most significant digit first"? Or "most significant byte of each digit first"? Or both?
No, it's about endianness within a digit. We have word_endian to represent the order of digits.
Probably the naming could be better, e.g. endian instead of array_endian and/or digits_order instead of word_endian. It seems CPython consistently uses the term "digit" instead of "word" (as libmpdec does) or "limb" (as GMP does).
Or can we drop Py_digit from the ABI and just work with void * instead?
That seems to be a good idea. PyLongLayout has a digit_size field.
I agree with @zooba, and I still stand by my earlier comment. I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.
Do you mean adding a layout member to PyLong_DigitArray (export) and a layout parameter to PyLongWriter_Create() (import)?
typedef struct PyLong_DigitArray {
...
const PyLongLayout *layout; // <== NEW
} PyLong_DigitArray;
PyAPI_FUNC(int) PyLong_AsDigitArray(
PyObject *obj,
PyLong_DigitArray *array);
PyAPI_FUNC(PyLongWriter*) PyLongWriter_Create(
...
const PyLongLayout *layout); // <== NEW
I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.
JFR, I don't think this locks CPython in any way: this single layout is an asymptotic one (for "big" ints). The API has enough freedom to emulate this view for small integers.
Suggestion @zooba might work with API like (reading case):
typedef struct PyLongLayout {
uint8_t bits_per_digit;
uint8_t digit_sizeof;
int8_t digits_order;
int8_t digit_endianness;
Py_ssize_t ndigits;
} PyLongLayout;
/* Returns a pointer to the digit array (or NULL, if obj is not a PyLongObject)
   and sets *layout, which might differ wrt bits_per_digit and digit_sizeof for
   "small" and "big" ints. */
PyAPI_FUNC(const void *) PyLong_AsDigits(PyObject *obj, PyLongLayout *layout);
The API has enough freedom to emulate this view for small integers.
Currently, exporting a small integer as an array of digits is an O(1) operation. There is no need to allocate memory or anything; all integers are implemented internally as an array of digits.
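To make the "all integers are an array of digits" point concrete, here is a Python round-trip over that representation (a sketch: the hypothetical `to_digits` rebuilds the digit array the export would hand out as a pointer, and `from_digits` is what an importer like mpz_import effectively computes):

```python
import sys

def to_digits(obj):
    """Decompose abs(obj) into CPython-style digits, least significant first."""
    shift = sys.int_info.bits_per_digit
    mask = (1 << shift) - 1
    n, digits = abs(obj), []
    while n:
        digits.append(n & mask)
        n >>= shift
    return digits or [0]

def from_digits(digits, negative=False):
    """Inverse operation: fold the digits back, negating at the end."""
    shift = sys.int_info.bits_per_digit
    n = 0
    for i, d in enumerate(digits):
        n |= d << (i * shift)
    return -n if negative else n
```

The C export skips the decomposition entirely because the digit array already exists inside the object; that is the O(1) claim.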
OK. Let's design API for what we have. If we need additional layouts, that kind of API can be added in the future, and if/when we add it, the API added here will start copying data. And if a non-CPython implementation uses a more elaborate scheme, it'll need to copy data too. I could live with that. Let's design it that way then.
PyLongLayout should be named PyLong_DigitArrayLayout, since the actual internal layout might change in the future. Same with the writer: PyLong_DigitArrayWriter_Create & PyLong_DigitArrayWriter_Finish.
We also need a PyLong_DigitArrayWriter_Discard. Finish is trivial now, but if it can do a memory copy in the future, users should call Discard in error paths.
We don't need endianness in the struct. I think something like a new medium-int representation (which would need a new API), is much more likely than switching to non-native endianness. (Endian flags make sense if you want to serialize data, and can ask for a particular format, but the in-memory format can just always be native.)
As for avoiding Py_DATA, I filed https://github.com/capi-workgroup/problems/issues/80.
I'm OK with a PyLong_GetDigitArrayLayout function rather than a PyLong_LAYOUT constant (or maybe in addition to a PyLong_DIGIT_ARRAY_LAYOUT constant?), but it's not clear to me why it's needed.
@skirpichev said:
I would prefer more simple and less generic API, like proposed in https://github.com/capi-workgroup/decisions/issues/31. E.g. for reading:
if (PyLong_IsCompact(obj)) { /* should be a quick check */
    mpz_set_si(z, PyLong_CompactValue(obj)); /* compact value fits in some integer type */
}
else {
    /* fallback to "array of digits" view for abs(obj) */
    const Py_digit *digits = PyLong_GetDigits(obj); /* can't fail for non-compact ints */
    mpz_import(z, ..., digits);
    if (PyLong_IsNegative(obj)) { /* looks better than PyLong_GetSign() ;) */
        mpz_neg(z, z);
    }
}
Would it work if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer? That will do extra work for long ints, but the amount of extra work might be small enough (and offset by only calling one function).
If it's not fast enough, maybe we should add a Py_ASNATIVEBYTES_COMPACT_ONLY flag, which would make PyLong_AsNativeBytes leave the buffer untouched and return 0 when you should use PyLong_AsDigitArray.
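A quick Python model of that contract might help (Py_ASNATIVEBYTES_COMPACT_ONLY is only a suggestion at this point; the one-digit compact bound, the 8-byte word size and the little-endian byte order below are assumptions for illustration, not part of any API):

```python
DIGIT_BITS = 30                     # assumed digit width (64-bit build)
WORD_SIZE = 8                       # assumed sizeof(Py_ssize_t)

def as_native_bytes_compact_only(obj, buf):
    """Modelled contract: fill buf with obj's native bytes and return the
    byte count if obj is compact; return 0 with buf untouched otherwise,
    signalling the caller to fall back to the digit-array export."""
    if not -(1 << DIGIT_BITS) < obj < (1 << DIGIT_BITS):
        return 0                    # not compact: caller uses the slow path
    buf[:WORD_SIZE] = obj.to_bytes(WORD_SIZE, "little", signed=True)
    return WORD_SIZE
```

The point of the flag is that the caller gets the compact check and the copy in a single call, instead of pairing IsCompact with CompactValue.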
Looking at the above example: most callers will need the sign bit, so PyLong_AsDigitArray should return 1 for negative numbers. (Or it could get an int *sign output argument like PyLong_GetSign, but distinguishing 0 from positives doesn't sound too useful here.)
We don't need endianness in the struct.
You meant word_endian too?
Would it work if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer?
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)' # huh
Mean +- std dev: 350 ns +- 2 ns
Looking at the above example: most callers will need the sign bit
Not necessarily. This view rather represents the absolute value.
You meant word_endian too?
Yes. I see no reason to single that out as a detail that might change.
Mean +- std dev: 263 ns +- 2 ns
Mean +- std dev: 350 ns +- 2 ns
Hm, I guess that's not good enough?
Does the Py_ASNATIVEBYTES_COMPACT_ONLY idea sound reasonable?
most callers will need the sign bit
I meant that instead of
int result = PyLong_AsDigitArray(obj, &long_export);
if (result < 0) goto error;
mpz_import(...);
if (long_export.negative) {
mpz_neg(z, z);
}
PyLong_FreeDigitArray(&long_export);
you'd do
int sign = PyLong_AsDigitArray(obj, &long_export);
if (sign < 0) goto error;
mpz_import(...);
if (sign) {
mpz_neg(z, z);
}
PyLong_FreeDigitArray(&long_export);
Hm, I guess that's not good enough?
Yes, especially in the second case. You can see the other benchmarks above.
Does the Py_ASNATIVEBYTES_COMPACT_ONLY idea sound reasonable?
Yes, if PyLong_AsNativeBytes can do a quick exit - that should be an alternative to PyLong_IsCompact/CompactValue.
int sign = PyLong_AsDigitArray(obj, &long_export);
This looks like a minor convenience to me. I think that to (optionally) query the sign, people should use dedicated functions (like PyLong_IsNegative()).
Oh, I forgot we already have PyUnstable_Long_IsCompact exposed. That's the check a Py_ASNATIVEBYTES_COMPACT_ONLY flag would do in the current implementation.
If you're chasing nanoseconds, I don't think you want to compile for the stable ABI, and you will want to change your code if the internals change. So, PyUnstable looks OK.
(That said, consider having the optimizations in #ifndef Py_LIMITED_API, and releasing stable-ABI wheels alongside the version-specific fast ones. Testers of CPython alpha releases would thank you, and hopefully, eventually, so would PyPy users.)
@skirpichev:
Probably naming could be better. E.g. endian instead of array_endian and/or digits_order instead of word_endian. It seems, CPython consistently uses term "digit" instead of "word" (as libmpdec) or "limb" (as GMP).
I renamed word_endian to digits_order and array_endian to endian.
@encukou:
I agree with @zooba, and I still stand by my earlier comment (https://github.com/capi-workgroup/decisions/issues/31#issuecomment-2188269464). I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.
Ok, I modified the API to add layout parameters to the import and export APIs.
- Replaced the PyLong_LAYOUT constant with a PyLong_GetNativeLayout() function.
- Added a layout parameter to PyLongWriter_Create().
- Added a layout member to the PyLong_DigitArray structure.

@skirpichev:
This looks as a minor convenience for me. I think to query (optionally) sign people should use dedicated functions (like PyLong_IsNegative()).
The function PyLong_IsNegative() doesn't exist. Or maybe you're thinking of a different function?
The function PyLong_IsNegative() doesn't exist.
Yes, but see https://github.com/capi-workgroup/decisions/issues/29
Mean +- std dev: 263 ns +- 2 ns
Mean +- std dev: 350 ns +- 2 ns
Hm, I guess that's not good enough?
If I'm reading the earlier benchmarks correctly, then we'd have to beat 111ns and 155ns for these, which is clearly only going to be possible by leaking access to long internals. So it's fine when using unstable APIs, and the slower path is probably fine for stable callers (it's still faster than failing ;) ). The difference is largely unavoidable function call overhead, a type check, and the (potentially unaligned) memcpy. I daresay we could add a faster path through AsNativeBytes for an aligned destination of sizeof(Py_ssize_t), but it's still unlikely to approach the speed of the macros.
But that's okay, blazing speed is what the unstable API is for, and those who need it can at least opt into using it at their own cost.
I do think that anything returning somewhat raw digits from a long object should also return the sign. One call gets all the info needed to accurately recreate the value, and you only need additional checks if you're creating your own fast path.
(Or can we drop Py_digit from the ABI and just work with void * instead? In which case we must provide the digit size some other way.)
Ok, done.
We also need a PyLong_DigitArrayWriter_Discard. Finish is trivial now, but if it can do a memory copy in the future, users should call Discard in error paths.
I would prefer to avoid adding such a function for now; we can add it later if needed. What do you think?
@skirpichev @zooba @encukou: I updated the API. What do you think? Are we good yet?
Is there any impact going to come out of the discussions of setting a hard (if theoretical) limit on the length?
Is there any impact going to come out of the discussions of setting a hard (if theoretical) limit on the length?
@serhiy-storchaka may know about that :-) This API is limited by Py_ssize_t ndigits. If the number of digits fits into a signed size_t, we are good.
The PR https://github.com/python/cpython/pull/123498 defines # define MAX_LONG_DIGITS ((UINT64_MAX-1) / (uint64_t)PyLong_SHIFT), which is unsigned. If we want to be compatible with the gh-123498 maximum, the API should be modified to use an unsigned number of digits. But well, INT64_MAX/PyLong_SHIFT (signed) digits is already a very big number :-)
I would prefer to avoid adding such function for now, we can add it later if needed. What do you think?
So, to destroy the writer (e.g. in an error handling path), the user should call PyLongWriter_Finish and XDECREF the result?
That works, but I'd much prefer writers to be consistent and have both Finish and Discard.
If I'm reading the earlier benchmarks correctly, then we'd have to beat 111ns and 155ns for these
Sorry for the late answer @zooba, but no. Those timings are for the writing API (mpz->int). The benchmarks above are for reading (int->mpz). I'll repeat them in one place:
On the master:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 233 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 295 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 491 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.33 us +- 0.01 us
With new API (see https://github.com/aleaxit/gmpy/pull/495):
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 37 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 315 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 508 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.34 us +- 0.01 us
Without special case for 1-digit:
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 291 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 316 ns +- 1 ns
Using PyUnstable_Long_IsCompact (like in proposed above code):
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 215 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 317 ns +- 2 ns
"if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer":
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)' # huh
Mean +- std dev: 350 ns +- 2 ns
So, using a quick check (currently PyUnstable_Long_IsCompact) to dispatch small integers to special functions, and otherwise using PyLong_GetDigits() (only one new function for reading digits), has better performance than the current API.
Maybe you are right and we should introduce a bunch of simple PyUnstable_* functions instead of this more complicated API, which has performance regressions.
clearly only going to be possible by leaking access to Long internals
I doubt that using a single layout for "big enough" integers leaks anything important. All MP libraries share that detail; CPython is unlikely to break this in the future.
I updated the API. What do you think? Are we good yet?
I like the changes.
The PR https://github.com/python/cpython/pull/123498 defines # define MAX_LONG_DIGITS ((UINT64_MAX-1) / (uint64_t)PyLong_SHIFT) which is unsigned. If we want to be compatible with gh-123498 maximum, the API should be modified to use unsigned digits number.
I'm wrong. MAX_LONG_DIGITS fits into a signed size_t (Py_ssize_t), since it's a number of digits, not a number of bits: / (uint64_t)PyLong_SHIFT makes MAX_LONG_DIGITS smaller than PY_SSIZE_T_MAX.
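The arithmetic is easy to check numerically (assuming PyLong_SHIFT == 30, the value on 64-bit builds):

```python
PyLong_SHIFT = 30                 # assumed: 30-bit digits (64-bit build)
UINT64_MAX = 2**64 - 1
PY_SSIZE_T_MAX = 2**63 - 1

# MAX_LONG_DIGITS as defined in the PR referenced above
MAX_LONG_DIGITS = (UINT64_MAX - 1) // PyLong_SHIFT
assert MAX_LONG_DIGITS < PY_SSIZE_T_MAX   # fits in a signed Py_ssize_t
```

Dividing by the digit width shrinks the 64-bit maximum well below the signed range, so the signed ndigits is not a real limitation.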
@encukou:
So, to destroy the writer (e.g. in an error handling path), the user should call PyLongWriter_Finish and XDECREF the result?
You can prepare your data in advance before creating a writer, so that later your code cannot fail.
That works, but I'd much prefer writers to consistently and have both Finish and Discard.
Ok, I added a PyLongWriter_Discard() function.
I open a vote on this API:
I would appreciate @casevh's opinion, but so far this API has a severe performance degradation for reading small integers (int->mpz conversion), up to 25%. That's hardly acceptable. So, projects like gmpy2, sage or flint-python will probably use some other code path for such integers.
I'm wrong. MAX_LONG_DIGITS fits into signed size_t (Py_ssize_t), since it's a number of digits, not a number of bits
Yet maybe we should use size_t here, like GMP? A signed integer type looks odd in this context.
I doubt that using single layout for "big enough" integers leaks something important. All MP libraries share that detail, hardly CPython will break this in a future.
We want to preserve code that has already been compiled, which means that if we change the size of the digits but have already leaked it into other compiled code (i.e. hardcoded from a macro), then we've broken it. The whole point of the Unstable APIs is to give us more opportunities to make these kinds of changes - otherwise we'd be looking at 3-5 years/releases if we decide that a bigger/smaller digit size is better for everyone.
The stable digit size is 8 bits, and personally I'm quite okay with the performance hit for that stability. But if we are also to guarantee a second digit size with the same stability, then we either risk adding a performance hit later (if we have to start converting digits) or get stuck never gaining anything because we aren't able to change the internal representation. So the best you're going to get here is "fast+unstable (recompile each release)" or "fast+dynamic digit size".
I open a vote on this API:
The only thing I want to change here is the const PyLongLayout *layout argument to PyLongWriter_Create. I'm pretty sure this is a YAGNI situation, and we should just say "call PyLong_GetNativeLayout to get the layout info you need for PyLongWriter_Create" and not allow passing the info in (I certainly don't want to maintain the conversion code - that can live in gmp and friends with the more complex export functions). I'm happy enough with the rest, though I will defer the "is it satisfactory" question to @skirpichev.
My apologies for not following this thread as much as I would have liked. I'll also defer to @skirpichev for the "is it satisfactory" question.
I started optimizing gmpy around version 1.10. The single biggest win was using mpz_import/mpz_export directly into the PyLong internals. The only change to the code was support for 30-bit digits. The same code worked from Python 2.5 through Python 3.11. But this did require version specific builds.
The second win was using the PyLong_AsLongAndOverflow (and its variants) functions for conversion of Python ints into temporary C ints for direct use in many of the GMP function calls. Optimizing the performance of (mpz * 3) + 1 was more important than optimizing mpz(3) and mpz(1) because most of my code did mixed arithmetic. Note: the performance win was primarily from avoiding the creation of an exception.
As long as access to an unstable interface remains, I accept that the stable interface may be slightly slower. If the difference is significant, we will need to release stable and unstable versions.
Thanks for all the effort.
if we change the size of the digits but we already leaked that into other compiled code (i.e. hardcoded from a macro), then we've broken it.
Ah, the PyLong_GetDigits() declaration has Py_digit*... Well, this can be void*. To properly interpret the output you need something like PyLong_GetNativeLayout(). I think this preserves the freedom to change the digit size just as well as the API proposed by Victor.
Only thing I want to change here is the const PyLongLayout *layout argument to PyLongWriter_Create.
+1
I'm also not sure if we should keep the layout field in the PyLong_DigitArray struct. What else could it be, if not the native layout?
BTW, all required information about the native layout is already available via PyLong_GetInfo(). PyLongLayout additionally has endian and digits_order fields, but I doubt those will change in the future. So, PyLong_GetInfo() could essentially replace PyLong_GetNativeLayout().
I accept that the stable interface may be slightly slower. If the difference is significant, we will need to release stable and unstable versions.
@casevh, here (https://github.com/capi-workgroup/decisions/issues/35#issuecomment-2328216925) are some benchmarks. Does this look acceptable to you?
Have you tried PyLong_AsLongAndOverflow instead of PyUnstable_Long_IsCompact? It might not be as fast but it is stable (I think).
Have you tried PyLong_AsLongAndOverflow instead of PyUnstable_Long_IsCompact?
Yes, I did, I just didn't include that case in the benchmarks above. It's good for 1-2 digits and for many digits, but around ~10 digits there is a negative performance impact.
Benchmark results for reading (now my PR uses PyLong_AsLongAndOverflow):

| Benchmark | master | new-api |
|---|---|---|
| 1<<7 | 263 ns | 268 ns: 1.02x slower |
| 1<<38 | 317 ns | 276 ns: 1.15x faster |
| 1<<300 | 514 ns | 588 ns: 1.14x slower |
| 1<<3000 | 2.35 us | 2.42 us: 1.03x slower |
| Geometric mean | (ref) | 1.01x slower |
Hmm, maybe it's a good tradeoff.
JFR, benchmark results for writing:

| Benchmark | master | new-api |
|---|---|---|
| 1<<38 | 213 ns | 210 ns: 1.02x faster |
| 1<<300 | 535 ns | 575 ns: 1.07x slower |
| 1<<3000 | 2.27 us | 2.32 us: 1.02x slower |
| Geometric mean | (ref) | 1.02x slower |

Benchmark hidden because not significant (1): 1<<7
Only thing I want to change here is the const PyLongLayout *layout argument to PyLongWriter_Create. I'm pretty sure this is a YAGNI situation
Yes, later we can add PyLongWriter_CreateWithLayout with an extra argument.
Ok, I updated the API:

- Removed the layout parameter of PyLongWriter_Create().
- Removed the layout member of the PyLong_DigitArray structure.
- Use an unsigned type (size_t) for ndigits.

@skirpichev: IMO it's fine to rely on PyUnstable_Long_IsCompact() for best performance. Even if it has "unstable" in its name, I expect the API to not change in the next releases, even if the implementation might change in the future. Well, or you can use PyLong_AsLongAndOverflow(), as you want.
Use an unsigned type (size_t) for ndigits.
AFAIK, Python uses Py_ssize_t for counting bytes in nearly all existing APIs. I think we should continue doing that, even if GMP uses size_t.
@encukou:
AFAIK, Python uses Py_ssize_t for counting bytes in nearly all existing APIs. I think we should continue doing that, even if GMP uses size_t.
@skirpichev:
Yet maybe we should use size_t here, like GMP? Signed integer type looks odd in this context.
@skirpichev: are you ok to use signed Py_ssize_t?
are you ok to use signed Py_ssize_t?
This still looks odd to me, but I agree with Petr: Py_ssize_t seems more consistent with the CPython API; size_t is used in the *alloc() functions.
Ok, I updated the API again to move back to signed Py_ssize_t.
@skirpichev: Are you ok with the latest API?
@vstinner, yes. Some minor nitpicks were mentioned above, feel free to ignore:
(PyLong_GetInfo() is slow, but it's not a problem, as it might be called just once at module initialization.)
@vstinner:
@skirpichev: Are you ok with the latest API?
@skirpichev:
@vstinner, yes
Ah, good :-)
@skirpichev:
BTW, all required information about native layout is already available via PyLong_GetInfo(). PyLongLayout has additionally endian and digits_order fields. But I doubt those will be changed in future. So, PyLong_GetInfo() could essentially replace PyLong_GetNativeLayout().
The purpose of this API is to stop relying on Python internals and instead offer an API which exposes all the information needed by these features (import/export). Maybe CPython internals will not change, but maybe PyPy uses a different layout and can decide to change its layout between two PyPy versions. MicroPython, GraalPython, RustPython and others can also use a different layout. IMO it's worth exposing "digits_order" and "endian". Also, a structure is more convenient than a namedtuple in C.
Well, an alternative would be to extend sys.int_info to add digits_order and endian, but I don't know if it makes sense to expose this info in Python.
For reference, sys.int_info doc: https://docs.python.org/dev/library/sys.html#sys.int_info
maybe PyPy uses a different layout
Certainly not. And I would be surprised if some bigint library (including Python implementations) used something different here. Can anyone point to examples?
Well, an alternative would be to complete sys.int_info to add digits_order and endian,
Or extend only the PyLong_GetInfo() output.
I would be surprised if some bigint library (including Python implementations) using something different here. Can anyone point on examples?
You'd be surprised if there's a library out there that doesn't use 15-bit (or 30-bit) digits? I'd honestly be surprised if there was a bigint library that used those sizes, at least by default. Do they really all do it that way?
You'd be surprised if there's a library out there that doesn't use 15-bit (or 30-bit) digits?
No, it's fine: GMP is an example (that's why digit_size & bits_per_digit are really useful information). But GMP has the same behaviour wrt the digits_order and endian parameters. Is there a bigint library where the least significant "digit" ("limb", whatever) is stored last?
Ah, you just meant the digit order.
On one hand, they may not exist and we may not imagine a reason why they would, but on the other hand, that's why we didn't have this function in the first place. Better to include all the options in there, just in case, since we've already failed to predict the future once 😉
It became difficult for me to follow this discussion and navigate between messages since [the PR] has 119 comments and this issue has 47 comments, and there are a few more related issues. So I decided to write down PEP 757 to summarize the discussion: https://github.com/python/peps/pull/3961. It might be easier to make a decision on a PEP.
I wrote PEP 757 – C API to import-export Python integers, it's now online! I announced the PEP on discuss.python.org as well.
There is an open question (by @skirpichev): should we add digits_order and endian to sys.int_info and remove PyLong_GetNativeLayout()? The PyLong_GetNativeLayout() function returns a C structure, which is more convenient to use in C than sys.int_info, which uses Python objects.
I like PyLong_GetNativeLayout() and I prefer to keep it. I don't think it's redundant, since it uses C types which are more convenient to use in C. And I'm not sure about adding digits_order and endian members to sys.int_info: in Python, they are not useful information.
@vstinner, now description is outdated wrt the current proposal. Maybe you can sync?
@vstinner, now description is outdated wrt the current proposal. Maybe you can sync?
PEP 757 now contains all details. So I just replaced the first message with a reference to PEP 757.
See PEP 757 – C API to import-export Python integers.