capi-workgroup / api-evolution

Issues related to evolution of the C API
15 stars 1 forks source link

Bits counting #52

Closed serhiy-storchaka closed 1 month ago

serhiy-storchaka commented 2 months ago

Previous attempt to add a C API for int.bit_length() failed because in most cases the user can use PyLong_AsNativeBytes() (see https://github.com/capi-workgroup/decisions/issues/28). I still think that there are use cases for exact bits counting, so we may add such API in future. There are also other private C API functions that work with bits counts and may be exposed to public in future.

The main problem to me with this API was that it returns the bits count as size_t or Py_ssize_t. On 32-bit platform it is not difficult to create an integer or even several integers whose bits count does not fit in size_t or Py_ssize_t (This is only 0.5 or 0.25 GB), so this is a practical issue. I recently solved this by making these functions always using 64-bit integers, even on 32-bit platforms. And we can solve it on 64-bit platforms by forbidding creation of Python integers whose bits count does not fit in 64 bits (see https://discuss.python.org/t/imposing-a-hard-limit-on-python-integers/62302/10). There is no much practical difference between $2^{63}$, $2^{64}$ and $2^{66}$ bits -- all limits are insanely large and not achievable on practice. This makes _PyLong_NumBits() and _PyLong_Frexp() always successful.

There is still one implementation question: should we use signed or unsigned 64-bit integer type for bits count? It was a very practical question for 32-bit counts, but for 64-bit counts both $2^{63}$ and $2^{64}$ are very large, so no need to squeeze a bit. On other hand, the bits count is always non-negative, there is no use case for negative values.

This will affect private C API functions _PyLong_NumBits(), _PyLong_Frexp(), _PyLong_Rshift() and _PyLong_Lshift().

Implementation for unsigned count: https://github.com/python/cpython/pull/123498.

encukou commented 2 months ago

My vote goes to signed, so the sign bit can serve its usual purpose as an error indicator.

serhiy-storchaka commented 1 month ago

We can reserve (uint64_t)-1 for error indication. Actually, I already reserved it to avoid integer overflow in _PyLong_Frexp().

Implementation for signed count: https://github.com/python/cpython/pull/123724

It is slightly more complex than the unsigned variant, because most of existing code already use uint64_t. If we even made _PyLong_Rshift() public, we will need to replace assert(shiftby >= 0) with a runtime check.

We can also use INT64_MAX instead of UINT64_MAX as the limit with the uint64_t type. Then the code can use uint64_t or int64_t without integer overflow.

vstinner commented 1 month ago

Using signed int64_t would be consistent with signed Py_ssize_t used basically everywhere in the Python C API.

I don't think that INT64_MAX bits is too small: such int number should be almost impossible to create or use in practice on nowadays computers.

serhiy-storchaka commented 1 month ago

Merged the signed version.