capi-workgroup / decisions

Discussion and voting on specific issues

Add public function PyLong_GetDigits() #31

Open skirpichev opened 1 week ago

skirpichev commented 1 week ago

That would allow fast import of CPython integers if the external multiple-precision library supports an mpz_import()-like interface (e.g. LibTomMath has mp_unpack()). Currently gmpy2 (see this) and Sage (see this) use private API to do this. Using the new PyLong_FromNativeBytes() / PyLong_AsNativeBytes() would be a performance regression for such projects.

Proposed interface:

/* Access the integer value as an array of digits.
   On success, return the array of digits and set *ndigits to the number
   of digits.  On failure, return NULL with an exception set.
   This function always succeeds if obj is a PyLongObject or its subtype.
 */
const digit* PyLong_GetDigits(PyObject* obj, Py_ssize_t *ndigits);

API usage:

static void
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
    Py_ssize_t len;
    const digit *digits = PyLong_GetDigits(obj, &len);

    switch (len) {
    case 1:
        mpz_set_si(z, (sdigit)digits[0]);
        break;
    case 0:
        mpz_set_si(z, 0);
        break;
    default:
        mpz_import(z, len, -1, sizeof(digit), 0,
                   sizeof(digit)*8 - PYLONG_BITS_IN_DIGIT, digits);
    }

    int sign = 1;
    PyLong_GetSign(obj, &sign);
    if (sign < 0) {
        mpz_neg(z, z);
    }
    return;
}

The interface above resembles a combination of GMP's mpz_size() and mpz_limbs_read() functions. It might also be worth considering export to external multiple-precision libraries that offer an mpz_export()-like API. gmpy2 does this using private API to access the digits/digit count, and also _PyLong_New() to allocate an integer of the appropriate size.

So, alternative interface to support both reading and writing might look like this:

/* Return the number of digits, or -1 on failure.  Shouldn't fail on ints. */
Py_ssize_t PyLong_DigitCount(PyObject *obj);
/* Return the array of digits, or NULL on failure.  Shouldn't fail on ints. */
digit* PyLong_GetDigits(PyObject *obj);
/* Return a new integer object of the given size (in digits) with the given
   sign and an unspecified absolute value.  On failure, return NULL. */
PyObject* PyLong_New(const Py_ssize_t ndigits, const int sign);

PyLong_New() shouldn't violate #56: the freshly created integer will be a valid one, just with an unspecified absolute value.

vstinner commented 1 week ago

I suggest a different API: const digit* PyUnstable_PyLong_GetDigits(PyLongObject* obj, Py_ssize_t *ndigits).

encukou commented 1 week ago

This assumes some implementation details, which we could not change in the future:

skirpichev commented 1 week ago

I suggest a different API: const digit* PyUnstable_PyLong_GetDigits(PyLongObject* obj, Py_ssize_t *ndigits).

With that on the table, please consider also second variant (de-facto two functions already exist as private):

/* Return the number of digits. */
Py_ssize_t PyUnstable_PyLong_DigitCount(PyLongObject *obj);
/* Return the array of digits. */
digit* PyUnstable_PyLong_GetDigits(PyLongObject *obj);
/* Return a new integer object of the given size (in digits) with the given
   sign and an unspecified absolute value.  On failure, return NULL. */
PyLongObject* PyUnstable_PyLong_New(const Py_ssize_t ndigits, const int sign);

But that looks like a half-solution to me.

This assumes some implementation details, which we could not change in the future

That's true. On the other hand, I'm not sure how essential the freedom to change these details is. A "big integer" is just an array of "digits" ("limbs" in GMP terminology). That seems to be the common denominator of all implementations of arbitrary-precision integer arithmetic.

Also, I would expect that most future enhancements for CPython ints will be around "small" integers that fit into some machine-size integer. (I.e. not implementations of new arithmetic algorithms with better asymptotic properties (beyond Karatsuba); for the latter, I would prefer a switch to some production-quality library, like GMP.) We could mention that for "small" integers it's better to use the dedicated PyLong_From*() / PyLong_As*() functions.

The digit buffer type (size of a digit, number of significant bits, digit order, byte order within a digit) is fixed -- it would become a part of the ABI.

Yes, this is assumed in this proposal.

But we can avoid this by offering some function, that will provide such data. E.g. (like https://github.com/python/cpython/issues/102471#issuecomment-1620352561):

typedef struct _PyIntExportLayout {
     uint8_t bits_per_digit;
     int8_t word_endian;
     int8_t array_endian;
     uint8_t digit_size;
} PyIntExportLayout;

void Py_GetIntLayout(PyIntExportLayout *layout);

mpz_import() (or mpz_export(), if we consider the second variant of the proposal) will do the rest.

encukou commented 1 week ago

The "big integer" - is just an array of "digits" ("limbs" in the GMP terminology).

Yes. But not all integers are big. For compact ints larger than a digit (e.g. 32 bits set in a native int, with 30-bit digits), the proposed API can't work.

[fixed structure] is assumed in this proposal. [...] But we can avoid this by offering some function, that will provide such data. E.g. Py_GetIntLayout

This still assumes that all ints share the same layout.

Also, I would expect that most future enhancements for CPython ints will be around "small" integers that fit into some machine-size integer. (I.e. not implementations of new arithmetic algorithms with better asymptotic properties (beyond Karatsuba); for the latter, I would prefer a switch to some production-quality library, like GMP.)

This assumes that CPython (or another implementation that provides the C API) won't ever switch to a production-quality bigint library.


Consider something similar to what I proposed for str export:

If the user guesses the underlying data format correctly, they get a zero-copy shared buffer, and "release" is a flag check + decref. If the guess is wrong, they get a newly allocated buffer -- slower but still correct. An actively maintained library would no doubt follow the CPython internals, but things would keep working (with only a slowdown).

The downside is, of course, that it's harder to implement on CPython's side.

zooba commented 1 week ago

I'd prefer an API shaped like PyLong_GetDigits(PyObject *v, void *buffer, size_t *count, size_t *bits_per_digit) (or possibly use the buffer interface, which I think can cover the same thing?), with a detectable "not supported" result that means the caller should use another conversion (i.e. for small ints).

Our digits don't currently fill a multiple of 8 bits, which means we would need to repack them to provide a contiguous buffer. At that point, AsNativeBytes is fine, so if there's a need to not do that, we need to return enough information to read the digits out of a non-contiguous buffer.

But the critical things for us are that we shouldn't do any conversions from the internal representation, and we are able to change the representation in the future (potentially by rejecting calls to the API and letting the caller use the main functions).


The PyIntExportLayout proposed above is fine, but I'd also rather this be a single function. I'm totally unconvinced of the need for an entire API set here, and I have no issue with this being a complicated function. It will discourage the people who shouldn't be using it.

vstinner commented 1 week ago

It seems like different things should be decided / designed:


PyLong_GetDigits(PyObject *v, void *buffer, size_t *count, size_t *bits_per_digit)

I suggest to rename this function to PyLong_Export() and add a "release" / "free" function to release/free memory if the export function had to allocate memory.

For example, for a small integer, a small array for the single digit can be allocated.

As another example, the export function can store a strong reference to the Python int object, to make sure that it remains valid while the export is being used. The release function would then just Py_DECREF it.

For the "layout", maybe it should be constant, as Mark proposed, rather than set at each call.

Something like:

typedef struct PyLongLayout {
     uint8_t bits_per_digit;
     int8_t word_endian;
     int8_t array_endian;
     uint8_t digit_size;
} PyLongLayout;

const PyLongLayout PyLong_NATIVE_LAYOUT;

// Export to PyLong_NATIVE_LAYOUT layout
// Must call PyLong_ReleaseExport(v, buffer) once done
int PyLong_Export(PyObject *v, void **data, size_t *count);

// Release the export created by PyLong_Export()
void PyLong_ReleaseExport(PyObject *v, void *data);

I think that we should think about the opposite "import" function right away, to have a consistent API. It can be something like:

PyLongObject *PyLong_Import(PyLongLayout layout, const void *data, size_t count);

skirpichev commented 1 week ago

Yes. But not all integers are big.

But this is a safe assumption for non-compact values, right?

I think it's fine if PyLong_GetDigits() just returns NULL and sets ValueError for "small" integers. We have several functions to support import/export in this case.

For compact ints larger than a digit (e.g. 32 bits set in a native int, with 30-bit digits), the proposed API can't work.

Was such an extension of the _longobject struct discussed somewhere? IIUC, this would require special arithmetic rules for such small ints.

This still assumes that all ints share the same layout.

Again, it will probably be true for "big enough" ints.

The proposed API (in both variants) is actually dedicated to this case.

Consider something similar to what I proposed for str export: https://github.com/python/cpython/issues/119609#issuecomment-2136866285

That seems a much more complex approach, just to support small ints. Do we really need to pay this price?

For the "layout", maybe it should be constant, as Mark proposed, rather than set at each call.

But that means breaking the ABI if we change PyLong_NATIVE_LAYOUT someday, doesn't it?

I think that we should think about the opposite "import" function directly to have a consistent API. It can be something like:

Well, the referenced issue by Mark has a proposal for mpz_import/export-like functions. That might be an alternative, but it requires much more effort on the CPython side.

This proposal instead leaves most of the work to the numeric libraries. From the CPython side we just offer access to the absolute value of "big enough" integers as an array of "digits" with a given layout (all that is required to fill the order, endian, and nails parameters of mpz_import()).

vstinner commented 1 week ago

Seems much more complex approach, just to support small ints. Do we really need to pay this price?

The C API is implemented by other Python implementations which can use a different layout, have different constraints, and so need to do different things on Export + ReleaseExport.

But that means breaking ABI if we change PyLong_NATIVE_LAYOUT someday, isn't?

ABI change would mean that the ABI of the PyLongLayout structure would change. But its values are free to change between two Python versions, no?

zooba commented 1 week ago

I suggest to rename this function to PyLong_Export() and add a "release" / "free" function to release/free memory if the export function had to allocate memory.

I agree, but I also want to point out that I proposed an API scheme that covers all of these cases and would mean we can add them much more cheaply. If we're going to start adding many specialised APIs of this style, I'd like us to consider just adding a general API for it, so each new "export struct" can just be an extension of it rather than an entirely new API each time.

skirpichev commented 1 week ago

or possibly use the buffer interface, which I think can cover the same thing?

Hmm, I think so :) That certainly fits the scope of the current proposal. But can we provide a writable buffer for integers (which are supposed to be immutable)? I would also like to streamline conversion from GMP-like libraries.

PyLongObject *PyLong_Import(PyLongLayout layout, const void *data, size_t count);

The layout argument here is redundant too, unless we want to do mpz_export()'s job. Both *_Export() and *_Import() functions should use whatever the current PyLong_NATIVE_LAYOUT is.

The C API is implemented by other Python implementations which can use a different layout, have different constraints, and so need to do different things on Export + ReleaseExport.

That's true, but differences in the layout are easy to handle (if we have an API to query it, like Py_GetIntLayout()).

Other differences might be related to the case of "small" integers, where maybe someone (or CPython someday) will use a different structure for the longobject (i.e. not a homogeneous array of "digits"). I'm not sure it's worth supporting that case with the new API. Why not fail in this case as fast as possible, so users can switch to the API for small integers? Consider:

static void
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
    Py_ssize_t len;
    const digit *digits = PyLong_GetDigits(obj, &len);

    if (!digits) {
        /* Assuming obj is an integer, the above call *might* fail for small
           integers.  Clear the error and use the dedicated functions. */
        PyErr_Clear();
        mpz_set_si(z, PyLong_AsLong(obj));
    }
    else {
        /* Else, we have "array of digits" view with some layout, which
           may vary with implementation. */
        PyIntExportLayout layout;

        Py_GetIntLayout(&layout);
        mpz_import(z, len, layout.array_endian, layout.digit_size, layout.word_endian,
                   layout.digit_size*8 - layout.bits_per_digit, digits);

        int sign = 1;
        PyLong_GetSign(obj, &sign);
        if (sign < 0) {
            mpz_neg(z, z);
        }
    }
}

encukou commented 4 days ago

Speaking as one individual member of the WG:

can we provide a writable buffer for integers (which are supposed to be immutable)?

No, not directly. We could provide a “zero-copy builder” -- something that's not a PyLong (and has no chance of escaping to arbitrary Python code), but has enough space reserved for the header so that it can be converted to a PyLong in-place, after the buffer is filled.

Perhaps something along the lines of:

PyLongBuilder *PyLongBuilder_New(PyLongLayout layout, size_t count, void **data);

void *data_to_fill;
PyLongBuilder *b = PyLongBuilder_New(layout, count, &data_to_fill);
memcpy(data_to_fill, your_data, count);
PyObject *obj = PyLongBuilder_Finish(b);
// If "layout" happens to match CPython's internal layout for this kind of int,
// `PyLongBuilder_Finish` fills in a PyObject/PyLong header and casts to PyObject*.
// Otherwise, it allocates a new object and copies the data to it.

Both *_Export() and *_Import() functions should use whatever the current PyLong_NATIVE_LAYOUT is.

This assumes PyLong_NATIVE_LAYOUT is a compile-time constant, used by all ints. Rather than bake the one current exception (small ints) into the API, IMO we need an API that acknowledges that there are several possible layouts.

Perhaps something like:

Py_ssize_t len;
PyIntExportLayout layout;
void *data = PyLong_Export(obj, &len, &layout);

For importing, maybe a function to get the layout to pass to the above PyLongBuilder_New, if that is a good idea:

int PyLong_GetLayout(Py_ssize_t bit_length, int sign, PyIntExportLayout *result);

If we're going to start adding many specialised APIs of this style, I'd like us to consider just adding a general API for it, so each new "export struct" can just be an extension of it rather than an entirely new API each time.

Each type of “exportable” object needs a different set of “layout” arguments. Strings need an encoding, ints need the 4 variables mentioned in this thread, sequences/iterables will perhaps need a chunk size... It seems to me that separate functions are a good abstraction for callers, even if we add a system that allows arbitrary classes to implement the various kinds of export/import.


Meta-discussion: we're brainstorming rather than making decisions; I think Discourse would be a better place for this kind of conversation.