capi-workgroup / api-evolution

Issues related to evolution of the C API

Scope of the limited API #41

Open encukou opened 10 months ago

encukou commented 10 months ago

Victor requested adding the new PyList_Extend and PyList_Clear to the limited API: https://github.com/python/cpython/pull/111862

This API can be trivially replaced by PyObject_CallMethod/PyObject_VectorcallMethod -- or in this case, by PyList_SetSlice.

I would prefer not to add an API that has such a trivial Python equivalent, unless there's a clear need for it (e.g. for performance) in third-party projects.

vstinner commented 10 months ago

Oh, I just saw this issue 10 min after creating https://github.com/capi-workgroup/api-evolution/issues/42 which is similar.

This API can be trivially replaced by PyObject_CallMethod/PyObject_VectorcallMethod -- or in this case, by PyList_SetSlice.

As a user who wants to write a C extension using the limited C API, I expect a convenient API, and I expect to be able to distribute a single binary wheel (per platform-architecture combination). "Convenient" is obviously hard to define here.

For example, for me, PyList_Extend(list, arg) is more convenient than having to call PyList_SetSlice(list, PY_SSIZE_T_MAX, PY_SSIZE_T_MAX, arg), where PY_SSIZE_T_MAX looks like a magic constant to me. In Python, I prefer calling list.extend(arg) to writing list[len(list):] = arg.

If your goal is to write a bare-minimum API such as the Native Interface API, then obviously all "convenient" APIs should go. For me, the Native Interface API is more for machines, and the limited C API is more for humans who write code manually.


Another criterion is performance: PyObject_CallMethod(list, "extend", "O", arg) is slower than PyList_Extend(), since the byte string "extend" has to be decoded from UTF-8 to create a temporary Python str object, the "O" format string must be parsed to create an array of C arguments, and at the end the temporary Python str object must be destroyed.

For example, a micro-benchmark on _PyLong_AsByteArray() was used to discuss whether this function deserves to become a public C API, rather than calling Python int.to_bytes() from C.

When making _PyLong_GCD() a public function was being considered, Serhiy wrote:

If direct call of _PyLong_GCD() makes the code 7% faster than using math.gcd(), it is perhaps not worth it. Note that many methods of builtin types (like str.upper()) are not exposed in the C API. General Python object or method call API is the way to use them.


IMO we should also take into account how commonly an API is used. Very commonly used APIs deserve a public API, whereas having to go through PyObject_Call...() is acceptable for rarely used APIs. That's related to providing a "convenient" API.

By the way, when the PyFrameObject members were removed from the Python 3.11 C API, I documented how to update code using PyObject_GetAttrString(frame, "<member name>"). But apparently, performance also matters, so a dedicated getter function was added for each removed member to avoid having to call the slow PyObject_GetAttrString() function.


My goal for the long term would be to treat the "C API" basically as the limited C API: that the limited C API becomes the default.

That's why I'm trying to clarify what's "private" or "internal" in the "public C API".

I don't know if it's possible, and I expect that the "C API" will always be larger than the "limited C API", since some APIs are never going to enter the limited C API by design, such as the PyTypeObject structure members.

For me, there are two clear usages of "the C API":

This separation becomes more visible in Cython, which has started to support generating C code targeting the limited C API. Cython users can now choose their profile: stability/portability or performance.

My concern is that Python has many API layers:

IMO it's very confusing for everybody :-( It would be better to only have two main layers:

encukou commented 10 months ago

In the meantime, people can use pythoncapi-compat to get the macro to define PyList_Extend in terms of PyList_SetSlice. Or define the macro themselves. On any version, limited or unlimited. There is no need to rush.

vstinner commented 10 months ago

In the meantime, people can use pythoncapi-compat to get the macro to define PyList_Extend in terms of PyList_SetSlice. Or define the macro themselves. On any version, limited or unlimited.

Maybe a guideline can be defined from that? Is it possible and "not too complicated" (how can it be measured? number of lines?) to reimplement the needed feature using the existing limited C API?

In the case of PyList_Extend() and PyList_Clear(), there is a way:

#define PyList_Extend(list, arg) PyList_SetSlice((list), PY_SSIZE_T_MAX, PY_SSIZE_T_MAX, (arg))
#define PyList_Clear(list) PyList_SetSlice((list), 0, PY_SSIZE_T_MAX, NULL)

pythoncapi-compat doesn't target the limited C API. Many functions in pythoncapi-compat are implemented using functions that are excluded from the limited C API, especially private functions.