capi-workgroup / decisions

Discussion and voting on specific issues

Add PyLong_GetNumBits() function #28

Status: Closed (vstinner closed this issue 1 week ago)

vstinner commented 3 weeks ago

In Python 3.13 alpha 1, I removed the private _PyLong_NumBits() function, but it's used by 3 big/popular projects: pywin32, MariaDB and panda3d.

I propose adding a public function to replace this private API:

Py_ssize_t PyLong_GetNumBits(PyObject *obj);

The C function is similar to the Python int.bit_length() method. On overflow, it's recommended to call the int.bit_length() method which is not limited to Py_ssize_t.
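As an illustration (not part of the original comment), the Python-level behaviour referred to here can be sketched as follows. The sketch assumes that `sys.maxsize` equals the largest `Py_ssize_t` value on the running build; the helper name `fits_in_py_ssize_t` is hypothetical and not part of any API.

```python
import sys


def fits_in_py_ssize_t(n: int) -> bool:
    """Return True if n.bit_length() could be returned as a Py_ssize_t.

    Hypothetical helper: it assumes sys.maxsize is the largest Py_ssize_t.
    """
    return n.bit_length() <= sys.maxsize


# bit_length() ignores the sign and returns 0 for zero.
examples = {0: 0, 1: 1, 255: 8, 256: 9, -256: 9}
for value, expected in examples.items():
    assert value.bit_length() == expected

# The result of bit_length() is an ordinary Python int, so it has no fixed
# upper bound, unlike a C return value of type Py_ssize_t.
print(fits_in_py_ssize_t(2**100))
```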

In the meantime, the private _PyLong_NumBits() function was restored in the 3.13 and main branches.

See also the proposed PyLong_GetSign() function.

gvanrossum commented 3 weeks ago

> On overflow, it's recommended to call the int.bit_length() method which is not limited to Py_ssize_t.

Actually, if the number has more bits than a Py_ssize_t can count, it occupies at least 1/8th of the total available memory space (18 exabytes), so maybe it's time to wave the white flag at that point. :-)

vstinner commented 3 weeks ago

> Actually, if the number has more bits than a Py_ssize_t can count, it occupies at least 1/8th of the total available memory space (18 exabytes), so maybe it's time to wave the white flag at that point. :-)

That's right for 64-bit platforms (the most common ones). But on 32-bit platforms, triggering the OverflowError "only" requires a single number of 273 MiB, which is quick to create. Well, IMO Python is not designed to manage such "big integers", so we should not bother much about it.
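The 273 MiB figure can be checked with a little arithmetic. The sketch below is not from the thread; it assumes a 32-bit CPython build that stores integers as 15-bit digits in 2-byte units and whose `Py_ssize_t` maximum is 2**31 - 1. It also ignores the small fixed object header. The function name `digit_storage_bytes` is hypothetical.

```python
BITS_PER_DIGIT = 15            # assumption: 15-bit digits on 32-bit builds
BYTES_PER_DIGIT = 2            # assumption: each digit stored in 2 bytes
PY_SSIZE_T_MAX_32 = 2**31 - 1  # largest Py_ssize_t on a 32-bit platform


def digit_storage_bytes(num_bits: int) -> int:
    """Bytes of digit storage needed for an int with num_bits bits."""
    digits = -(-num_bits // BITS_PER_DIGIT)  # ceiling division
    return digits * BYTES_PER_DIGIT


# The smallest bit length that no longer fits in a 32-bit Py_ssize_t.
overflow_bits = PY_SSIZE_T_MAX_32 + 1
size = digit_storage_bytes(overflow_bits)
print(size, "bytes, about", round(size / 2**20, 2), "MiB")
```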

My first proposal was the int PyLong_GetNumBits(PyObject *obj, size_t *numbits) API (unsigned size), but @encukou and @serhiy-storchaka prefer this API (if I understood correctly): https://github.com/python/cpython/issues/119714#issuecomment-2136885490

serhiy-storchaka commented 3 weeks ago

I propose returning int64_t or uint64_t to avoid the problem of integer overflow on 32-bit platforms. This would allow using this C API in int.bit_length() (currently it reimplements the algorithm to support all integer objects).

davidhewitt commented 3 weeks ago

In updating to Python 3.13 in PyO3 I replaced _PyLong_NumBits with PyLong_AsNativeBytes. It returns the number of bytes required to represent the integer and so might satisfy the related use cases, for example it met our needs in PyO3. Just an observation as a possible reason to not need this at all.
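To show why a byte count can stand in for a bit count when sizing a buffer, here is a pure-Python sketch (not from the thread, and not what PyLong_AsNativeBytes itself executes). It derives the minimal signed two's-complement size from `int.bit_length()`; the helper name `signed_bytes_needed` is hypothetical. The size reported by the C function may be larger than this minimum.

```python
def signed_bytes_needed(n: int) -> int:
    """Minimal byte count holding n in two's complement (hypothetical helper)."""
    # For negative n, ~n is non-negative and has the same number of value bits.
    magnitude_bits = (n if n >= 0 else ~n).bit_length()
    # One extra bit is needed for the sign, then round up to whole bytes:
    # (magnitude_bits + 1 + 7) // 8 == magnitude_bits // 8 + 1
    return magnitude_bits // 8 + 1


for value in (0, 127, 128, -128, -129, 2**64):
    size = signed_bytes_needed(value)
    encoded = value.to_bytes(size, "little", signed=True)
    # Round-trip check: the computed size is large enough.
    assert int.from_bytes(encoded, "little", signed=True) == value
    print(value, size)
```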

vstinner commented 3 weeks ago

pywin32 uses _PyLong_Sign() and _PyLong_NumBits() to decide how to encode a Python int to Windows COM ABI: https://github.com/mhammond/pywin32/blob/ad5779b23b42653c9fa5dfbb18dd2a8fe5691d0d/com/win32com/src/oleargs.cpp#L139-L209. IMO _PyLong_NumBits() remains relevant even with PyLong_AsNativeBytes() addition.
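The general idea of choosing an encoding from the sign and the bit count can be sketched in Python as follows. This is a hypothetical illustration, not pywin32's actual logic: the function name `pick_encoding` and the returned labels are invented here, and the real code selects COM variant types in C++.

```python
def pick_encoding(n: int) -> str:
    """Pick the narrowest fixed-width type for n (hypothetical labels)."""
    sign = (n > 0) - (n < 0)  # -1, 0 or 1, comparable to a sign query
    if sign < 0:
        # Value bits needed besides the sign bit in two's complement.
        bits = (~n).bit_length()
        if bits <= 31:
            return "int32"
        if bits <= 63:
            return "int64"
        return "overflow"
    bits = n.bit_length()
    if bits <= 31:
        return "int32"
    if bits <= 32:
        return "uint32"
    if bits <= 63:
        return "int64"
    if bits <= 64:
        return "uint64"
    return "overflow"


for value in (0, 2**31 - 1, 2**31, 2**32, 2**63, 2**64, -2**31, -2**31 - 1):
    print(value, pick_encoding(value))
```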

panda3d uses _PyLong_NumBits() with _PyLong_AsByteArray(). The number of bits is used to allocate or resize a buffer to call _PyLong_AsByteArray().

encukou commented 3 weeks ago

Seems to me like pywin32 could use PyLong_AsNativeBytes too.

vstinner commented 1 week ago

What is the C API Working Group's call on this API? Should we add it, or should we guide 3rd party C extensions towards PyLong_AsNativeBytes()?

See also https://github.com/capi-workgroup/decisions/issues/31: "Add public function PyLong_GetDigits()".

encukou commented 1 week ago

I'd say guide them toward PyLong_AsNativeBytes. If anyone sees a use case where a possibly over-estimated number of bytes rather than bits isn't enough, let's reopen?

vstinner commented 1 week ago

Ok, let's do that. I'm closing the issue.