capi-workgroup / decisions

Discussion and voting on specific issues

Add PyLong_FromInt64() and PyLong_ToInt64() #32

Open vstinner opened 4 days ago

vstinner commented 4 days ago

Add 6 new functions to the limited C API to convert C numbers from/to Python int:

// Return a Python int on success.
// Set an exception and return NULL on error.
PyAPI_FUNC(PyObject*) PyLong_FromInt64(int64_t value);
PyAPI_FUNC(PyObject*) PyLong_FromUInt64(uint64_t value);

// Set *value and return 0 on success.
// Set an exception and return -1 on error.
PyAPI_FUNC(int) PyLong_ToInt32(PyObject *obj, int32_t *value);
PyAPI_FUNC(int) PyLong_ToUInt32(PyObject *obj, uint32_t *value);
PyAPI_FUNC(int) PyLong_ToInt64(PyObject *obj, int64_t *value);
PyAPI_FUNC(int) PyLong_ToUInt64(PyObject *obj, uint64_t *value);

UPDATE: I removed the PyLong_FromInt32() and PyLong_FromUInt32() functions.

Compared to PyLong_AsLong(), the To functions' return value is a status (0 on success, -1 on error) and the result is stored in the value argument (on success). This avoids having to call PyErr_Occurred() to distinguish a legitimate -1 result from a -1 error. See Problem #1 for details.


zooba commented 4 days ago

There's nothing particularly controversial about this API, right? Other than the question of whether it should be "real" or macros?

I'm quite happy with it being a real API.

serhiy-storchaka commented 4 days ago

Since the PyLong_To* functions have a signature compatible with the converter signature used with the O& format unit in PyArg_Parse(), I think that it would be better to make the return value compatible with the converter convention: return 1 for a successful conversion and 0 if the conversion has failed. So these functions could be used with PyArg_Parse() (and other functions in this family).

Otherwise we could end up with two sets of converters with opposite return conventions. It would be very error-prone.
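For illustration, a minimal sketch of the O& converter convention being referred to. The converter name convert_int64 is hypothetical, and PyLong_ToInt64() is the function proposed in this issue, not an existing API; with the proposed 0/-1 convention, a small wrapper is needed to get the 1/0 convention that O& expects:

// Converter following the O& convention: return 1 on success, 0 on failure
// (the opposite polarity of the proposed PyLong_To* functions).
static int
convert_int64(PyObject *obj, void *addr)
{
    int64_t value;
    if (PyLong_ToInt64(obj, &value) < 0) {  // proposed convention: 0 / -1
        return 0;                           // converter convention: failure
    }
    *(int64_t *)addr = value;
    return 1;                               // converter convention: success
}

// Inside a function that received "args":
int64_t x;
if (!PyArg_ParseTuple(args, "O&", convert_int64, &x)) {
    return NULL;
}

If PyLong_ToInt64() itself returned 1/0, it could be passed to O& directly; that is the compatibility being asked for here.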

vstinner commented 4 days ago

I think that it would be better to make the return value compatible with the converter convention: return 1 for a successful conversion and 0 if the conversion has failed.

I don't think that this use case is common enough to not follow the C API Guidelines: https://devguide.python.org/developer-workflow/c-api/index.html#guidelines-for-expanding-changing-the-public-api

For converters, I would prefer to add support for int32_t and uint64_t to Argument Clinic and reuse these functions.

zooba commented 4 days ago

I think that it would be better to make the return value compatible with the converter convention

It's tempting, for sure, but I think we're better to keep converters as the exception. We really want developers to have a single mental model for how to handle return values, and that unfortunately means converters need to be clearly labelled, and not just convenient functions.

Adding some form of sized version of i and u to PyArg_ParseTuple wouldn't be a bad step. That's our public API - Argument Clinic is still internal only, so we don't fix that until we need it, and it doesn't help public users at all when we do.

encukou commented 4 days ago

Other than the question of whether it should be "real" or macros?

Thank you for humoring me on the macros thing; I think it's clear we want real functions :)

keep converters as the exception

Yes. We settled on using -1 (or NULL) for errors. Unfortunately some existing functions -- or even families of functions -- don't do that; we have to live with that.

Eventually, if we get to the point where O& sticks out too much, we might want to add a new PyArg_Parse format code. I don't think we're there yet.

vstinner commented 4 days ago

@serhiy-storchaka has some concerns about calling obj.__index__() which can release the GIL indirectly:

Calling __index__ can release the GIL and make values saved in C variables (borrowed references, the size of a list, etc) invalid. Existing code can rely on atomicity of PyLong_AsUnsignedLong(), so calling __index__ in it is an unsafe change.

IMO it's good that the signed and unsigned functions have the same behavior: that both call __index__(). Existing functions are inconsistent: signed functions (e.g. PyLong_AsInt()) call __index__(), whereas PyLong_AsUnsignedLong() doesn't call the method.

The GIL issue can be documented.
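For illustration, a minimal sketch of the hazard being described (the variable names are illustrative, and PyLong_ToInt64() is the proposed function):

PyObject *item = PyList_GET_ITEM(list, 0);  // borrowed reference into "list"

int64_t n;
if (PyLong_ToInt64(obj, &n) < 0) {          // may call type(obj).__index__()
    return NULL;
}
// If __index__() executed arbitrary Python code, it may have resized or
// cleared "list", so the borrowed reference "item" can now be dangling.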

vstinner commented 1 day ago

@serhiy-storchaka @mdboom: Do you have any additional comment about these APIs, or can I just open a vote?

mdboom commented 1 day ago

I have nothing to add that hasn't already been said.

serhiy-storchaka commented 1 day ago

My comment was about the existing C API like PyLong_AsUnsignedLong() and why it is not safe to change it.

We need a C API which respects special methods like __index__, and it should be the default. On the other hand, we need a C API to access the content of the object directly. This is a contradiction. Creating two sets of functions will cause confusion, especially if there are more than two sets, with different signatures and different calling conventions. I have an idea how to mitigate this problem, but it is for a different discussion. It does not stand in the way of this set of functions, because they are all just wrappers around PyLong_FromNativeBytes/PyLong_AsNativeBytes.

PyLong_FromInt32 and PyLong_FromUInt32 are redundant, because the integer argument can always be promoted to 64 bits for PyLong_FromInt64/PyLong_FromUInt64. We have both PyLong_FromLong and PyLong_FromLongLong because the latter was optional, not all compilers supported long long.

On the other hand, the lack of PyLong_ToInt16 and PyLong_ToInt8 requires adding an additional check after calling PyLong_ToInt32, and either producing an OverflowError with inconsistent error messages (depending on the input value) or writing more verbose code with PyLong_AsNativeBytes.
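For illustration, the additional check mentioned above could look like this (a sketch using the proposed PyLong_ToInt32(); the error message is made up):

int32_t v32;
if (PyLong_ToInt32(obj, &v32) < 0) {
    return NULL;  // TypeError or OverflowError already set
}
if (v32 < INT16_MIN || v32 > INT16_MAX) {
    // The caller has to raise its own OverflowError here, and its message
    // will not match the one PyLong_ToInt32() produces for larger values.
    PyErr_SetString(PyExc_OverflowError, "value does not fit into int16_t");
    return NULL;
}
int16_t v16 = (int16_t)v32;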

zooba commented 1 day ago

Right, this is essentially the difference between the abstract and concrete APIs, a difference which has been eroding over the last few years.

If we think we can recover it, then we could argue the caller should PyLong_Check before using these functions, and use a more abstract function if they don't actually have an int already.

Alternatively, if we don't expect to shut the door to concrete APIs operating abstractly, then we could document that if PyLong_Check is true, these functions will not call back into Python code in order to convert to an integer.

I'm +0 on the latter, just out of pragmatism. But it saddens me to lose the clear abstract/concrete separation. (It will make those who want our API to be strongly typed sadder than me, though. I like it being largely abstract, I just think we ought to know which APIs are which.)

vstinner commented 1 day ago

@serhiy-storchaka: It's hard for me to understand what you are proposing.

Creating two sets of functions will cause confusion, especially if there are more than two sets, with different signatures and different calling conventions.

So are you ok with PyLong_ToInt64() calling __index__() or not? What do you suggest?

PyLong_FromInt32 and PyLong_FromUInt32 are redundant

Do you suggest not adding them and using PyLong_FromInt64() and PyLong_FromUInt64() instead? That sounds reasonable.

On other hand, the lack of PyLong_ToInt16 and PyLong_ToInt8 requires adding additional check after calling PyLong_ToInt32 (...)

I didn't want to add too many functions at the beginning, so I started with the two most common sizes: 32 and 64 bits. We can add more functions later, once we agree on the design.

vstinner commented 1 day ago

If we think we can recover it, then we could argue the caller should PyLong_Check before using these functions, and use a more abstract function if they don't actually have an int already.

I agree that PyLong_Check(), or even PyLong_CheckExact(), is a good way to check if an argument is an integer.

In general, users don't care much whether the object's type is exactly Python int or a subclass of it. For example, I'm used to passing a boolean where an integer is expected (bool inherits from int), even though a boolean is a different concept from an integer (to me at least).
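For reference, a small illustration of that difference (it relies on the documented fact that bool is a subclass of int):

// bool inherits from int, so the subclass-aware check accepts a bool
// while the exact-type check rejects it.
PyLong_Check(Py_True);       // 1: bool is a subclass of int
PyLong_CheckExact(Py_True);  // 0: the exact type is bool, not int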

zooba commented 1 day ago

In general, users don't care much whether the object's type is exactly Python int or a subclass of it.

Right, but the case here is when the user cares a lot that the GIL is not released during their PyLong_ToInt## call. They're more concerned about that than the exact type of the value.

I suspect we could probably get away with PyArg_ParseTuple and friends doing the __index__ conversion and the specific converter functions just raising a TypeError.[^1] Surely the vast majority of not-ints appearing in an int-like context will be in function arguments or in our code (e.g. any GetItem implementations).

The three main questions for me are:

[^1]: I'd also be okay with removing the extra work from AsNativeBytes, since we haven't properly released that yet.

erlend-aasland commented 1 day ago

Why To and not As?

serhiy-storchaka commented 1 day ago

Alternatively, if we don't expect to shut the door to concrete APIs operating abstractly, then we could document that if PyLong_Check is true, these functions will not call back into Python code in order to convert to an integer.

Yes, this was my idea. It was discussed earlier, but I lost previous rounds (perhaps I was not persistent enough). It affects more than the C API, so it should be discussed separately. Where is the best place for this? We need more than just the C API Workgroup.

So are you ok with PyLong_ToInt64() calling __index__() or not? What do you suggest?

I am OK with calling __index__() in a new C API. The details are a matter for a different discussion.

Do you suggest to not add them and use PyLong_FromInt64() and PyLong_FromUInt64()? That sounds reasonable.

Do you have a use case for PyLong_FromInt32()? Wait, do you have a use case for PyLong_FromInt64()? long long has at least 64 bits, so PyLong_FromLongLong() can be used for converting from int64_t. It seems that we do not need new functions for this direction of conversion, except maybe PyLong_FromMaxInt().

I didn't want to add too many functions at the beginning, so I started with the two most common sizes: 32 and 64 bits. We can add more functions later, once we agree on the design.

Just keep in mind that there is a need for such features.

cavokz commented 23 hours ago

@serhiy-storchaka has some concerns about calling obj.__index__() which can release the GIL indirectly:

Calling __index__ can release the GIL and make values saved in C variables (borrowed references, the size of a list, etc) invalid. Existing code can rely on atomicity of PyLong_AsUnsignedLong(), so calling __index__ in it is an unsafe change.

IMO it's good that the signed and unsigned functions have the same behavior: that both call __index__(). Existing functions are inconsistent: signed functions (e.g. PyLong_AsInt()) call __index__(), whereas PyLong_AsUnsignedLong() doesn't call the method.

The GIL issue can be documented.

Is it possible to have it documented anyway? I have no clue why calling __index__() should/could release the GIL and what kind of atomicity is lost. From a user perspective, I've never doubted that PyLong_AsUnsignedLong() and any of its friends may release the GIL although, hopefully, just temporarily.

zooba commented 23 hours ago

Where is the best place for this? We need more than just the C API Workgroup.

Either the api-evolution repository would be good, if we want to develop a "rule", though you'd have to announce it on a relevant issue to get people's attention. Or I'm sure we can find a bug related to implicit __index__ calls releasing the GIL and causing a crash (we had one recently where PySys_Audit would do it, and there's one about bytearray somewhere, too). Probably since there's multiple bugs, we'd want to discuss a proposal in api-evolution on how to approach it (i.e. 1. don't do it, 2. document it and how to avoid it, 3. it's okay and users should use magic to be thread safe 🙃 )

Do you have a use case for PyLong_FromInt32()? Wait, do you have a use case for PyLong_FromInt64()?

Convenience, discovery, and pairing with the other API is a totally fine reason. (It may not be justified every time, but it's an okay reason.) We're looking at API development on a long scale (10+ years), and the goal is to have the right people (those who will benefit from it) using the right API in the right way. Making it easy to discover which APIs you should be using (and making it hard to discover the ones you shouldn't be using) is a big part of this, and the final decision/guess on how to do that is the WG's responsibility.

Is it possible to have it documented anyway?

Yes, documentation is cheap, at least for the APIs that already do this and won't be changing. It's worth discussing for any new APIs though.

vstinner commented 23 hours ago

@erlend-aasland:

Why To and not As?

As functions such as PyLong_AsLong() return the value directly, but they can also return -1 on error, so PyErr_Occurred() has to be checked.

long value = PyLong_AsLong(obj);
if (value == -1 && PyErr_Occurred()) { /* error */ }

vs

int64_t value;
if (PyLong_ToInt64(obj, &value) < 0) { /* error */ }

vstinner commented 23 hours ago

@serhiy-storchaka:

Wait, do you have a use case for PyLong_FromInt64()?

My initial motivation to propose these new <stdint.h> functions was https://github.com/capi-workgroup/api-evolution/issues/10 discussion: replace int/unsigned long with int32_t/uint64_t to have a better ABI. I would like <stdint.h> types to be as convenient to use with the Python C API as the classic int/long types.

erlend-aasland commented 23 hours ago

As functions such as PyLong_AsLong() return the value directly, but they can also return -1 on error, so PyErr_Occurred() has to be checked.

Not all As APIs are designed like this; it's not a defining trait for As APIs.

The current To APIs often suggest a transformation (ToUppercase, ToLowercase, etc.); it's a different kind of API.

serhiy-storchaka commented 23 hours ago

Currently it looks like the legend about Newton and a cat. He made a hole in the door to allow the cat to enter or leave without taking him away from his studies. When she brought in 5 kittens, he made 5 smaller holes for the kittens.

All cases with int64_t, int32_t, int16_t, int8_t, long, short, and signed char can be handled by PyLong_FromLongLong(). No need to add PyLong_FromInt64(), PyLong_FromInt32(), PyLong_FromInt16(), PyLong_FromInt8(), PyLong_FromInt(), PyLong_FromShort(), and PyLong_FromSignedChar(). If you want to cover absolutely all cases, add a function that takes an intmax_t argument; that would add new value.
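For illustration, a minimal sketch of the promotion argument (nothing here is a proposed API; PyLong_FromLongLong() already exists):

// Smaller signed types are implicitly promoted to long long, which the
// C standard guarantees is at least 64 bits wide.
int16_t small = -7;
int64_t big = INT64_MIN;
PyObject *a = PyLong_FromLongLong(small);  // implicit promotion, no new API needed
PyObject *b = PyLong_FromLongLong(big);    // int64_t always fits in long long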

zooba commented 23 hours ago

I'm not sure we'll ever again see another programming language that won't allow implicitly upcasting integers (like Ada), so yeah, there's probably not a lot gained. If intmax_t is also in stdtypes, then can we have FromInt()?

it's not a defining trait for As APIs.

We should probably figure out what is a defining trait. I have this vague idea that To APIs convert PyObject->PyObject and As APIs convert PyObject->native, but I can't back that up fully.

encukou commented 18 hours ago

If intmax_t is also in stdtypes, then can we have FromInt()?

The size of intmax_t depends on the compiler (which perhaps delegates the decision to one of various platform ABI specs). AFAIK, the reason we're adding FromInt64 even though we already have FromLongLong is to make it possible to avoid types like that.

zooba commented 18 hours ago

The size of intmax_t depends on the compiler

True, and that matters most for cross-language callers, who probably need to rely on the exact size.

So FromInt64 is justified. You could argue that FromInt32 is justified on 32-bit platforms, where conversion to 64-bit might be more expensive (though I'd be incredibly surprised if it was prohibitive), and perhaps that's all we need to justify both?