Open encukou opened 4 months ago
Given that this hasn't been touched for a little while, I thought it might be useful to share some of my ideas to get some discussion stirring up!
AFAICS, the concern here is that a). the call API isn't consistent and b). we don't want to deal with addressing each individual function over time. I'll try to address both those problems here.
I think it should be, more or less, clear from the function name on what you should pass to it (I mentioned this here, for reference.) As of now, many of the call functions aren't clear, or worse, misleading! For example, at a glance, seeing PyObject_CallFunction
doesn't tell me that I'm calling something with a format string, it looks like I'm calling a function
object! I could see some users that are new to the C API making this mistake.
static PyObject *
my_func(PyObject *self, PyObject *python_function)
{
PyObject *res = PyObject_CallFunction(python_function) // ???
// ...
}
So, we should be more clear in the name. I originally suggested putting what should be passed in the name itself. We're already sort-of doing this, just not consistently. For example, PyObject_Call
could become PyObject_CallTuple
. Though, that seems misleading (it sounds like we're calling a tuple itself), and is also more pollution in the PyObject_
namespace. As a solution, we could group calling functions into their own prefix. Something like PyCall_
should suffice.
Now, the equivalent of PyObject_Call
would be PyCall_Tuple
(or maybe PyCall_WithTuple
, if the name still seems misleading.) Likewise, things PyObject_Vectorcall
could become PyCall_Vector
. An example to help visualize:
static PyObject *
my_func(PyObject *self, PyObject *whatever)
{
// No dict parameter -- more on that later
PyObject *res = PyCall_Vector(whatever, (PyObject *[]) { Py_None }, 1);
}
Though, PyObject_Call
and PyObject_Vectorcall
aren't really the problem here, so let's address those terribly named functions.
PyObject_*ObjArgs
doesn't seem clear to me that it's a variadic argument, what about something like PyCall_*Variadic
?PyObject_CallFunction
, as mentioned, seems like we're calling a Python function
object. Instead, let's make it more clear that it's a format string. Perhaps PyCall_Format
?PyObject_*Method
functions are somewhat OK, but it should be clear about the whole name lookup shenanigans. How about PyCall_FromName
?Now, isn't that prettier!
static PyObject *
my_func(PyObject *self, PyObject *whatever)
{
PyObject *foo = PyCall_Format(whatever, "i", 42);
if (foo == NULL)
{
return NULL;
}
PyObject *bar = PyCall_MethodOneArg(foo, );
}
To address the consistency issue, we should have an equivalent API for each subcategory of the calling API. For example, a format string variation of PyObject_CallMethod
, as well as one for arbitrary objects.
As of now, I think we need the following subcategories:
PyCall_*FromName
(looking up via a name i.e. methods)PyCall_*Dict
(anything that takes a kwargs dictionary)PyCall_*Tuple
(anything that takes arguments as a tuple)PyCall_*Format
(takes parameters via a format string)PyCall_*Variadic
(takes parameters through vargs)PyCall_*VaList
(equivalent to Variadic
but through a va_list
object.)We would need some of these for each other as well, e.g. PyCall_TupleDict
or PyCall_VectorFromName
(but not all of them e.g. PyCall_VariadicVaList
, because that doesn't make any sense!)
Unfortunately, this does create a lot of functions, but it does get everything people could need in one sweep, without having to take feature requests in the future. Maybe this would have been better for api-revolution
:wink:
I like PyCall_
for if we need a fresh start.
I hope that if we get a low-overhead way of borrowing an array of PyObject*
(and size) from a tuple, we can replace *_Tuple
with *_Vectorcall
-- perhaps make *_Tuple
into convenience macros.
Similarly, for *_Format
, *_Variadic
& *_VaList
: perhaps we only need functions that can convert those to an array.
Maybe we should add a more powerful GetAttr variant that can retreive an unbound method (and a flag that it did so), which could then be passed more directly to the actual call function? (FWIW, similar things a “more powerful GetAttr” could optionally do are “special/type lookup” (skipping instance attributes); retrieving the class that held the found attribute; and returning descriptors rather than calling them.)
I hope that if we get a low-overhead way of borrowing an array of
PyObject*
(and size) from a tuple
I like that approach. Disclaimer: I'm not all that familiar with the tuple implementation -- I'm guessing that ob_items
and ob_size
aren't sufficient?
For converting variadic arguments into an array, we could possibly do this with some macro shenanigans. I think something like this could work?
#define _Py_NUMARGS(...) (sizeof((PyObject*[]) { __VA_ARGS__ }) / sizeof(PyObject*))
#define PyCall_Variadic(obj, ...) PyObject_Vectorcall(obj, (PyObject*[]) { __VA_ARGS__ }, _Py_NUMARGS(__VA_ARGS__), NULL)
ob_items
andob_size
aren't sufficient?
Not for the limited API.
AFAIK, PyPy has an optimization where a tuple of small ints is stored as a C array of small ints, rather than pull object pointers.
The limited API should allow optimizations like this in the future, so it needs explicit PyObject**
+size export API.
For converting variadic arguments into an array, we could possibly do this with some macro shenanigans.
Yup. IMO, it's fine to design variadic functions for C/C++ only, if we also have array+size variants of the functions.
We could add a stable const PyObject **PyTuple_AsArray(Py_ssize_t *sizeout)
for that, then (notice the explicit const
-- if we changed the tuple implementation in the future, we can still reconstruct a PyObject **
from it, but we won't have to worry about someone using the array to reflect changes in the actual object.)
Though, this does raise a second question: maybe using an array and size was a mistake. Should we add a variation of vectorcall that uses a NULL
terminated array instead?
notice the explicit
const
-- if we changed the tuple implementation in the future, we can still reconstruct aPyObject **
from it
Who would hold the references to the objects?
AFAICS, PyTuple_AsArray
needs to incref each element, and there should be a Py_DecrefArray(PyObject **, Py_ssize_t)
to pair with it.
See also https://github.com/capi-workgroup/problems/issues/15
Alternatively, there could be a PyTuple_AsBorrowedArray
, which means that all tuple implementations need to be able to not just create a PyObject*
array on demand, but also to store it as long as the tuple lives.
Should we add a variation of vectorcall that uses a
NULL
terminated array instead?
For my own personal opinion: No. Array+size forever. Zero-terminated strings are a trap C fell into in 1970s to save a byte. They should only be used when necessary, or for hand-written literals (since C makes defining an array+size frustratingly un-ergonomic), and always as an alternative to array+size.
Rant aside,
So accepting NULL-terminated array would mean extra CPU time in most cases. When that's no the case, the caller can do the counting on their side.
Who would hold the references to the objects?
I was under the impression that it would contain borrowed references, but I realize now that could cause a bit of a thread safety issue... I see where the design problem comes in, now!
Incidentally, we could also take #123372 into account when designing this. New functions should probably only use vectorcall if it would speed things up, so likewise, we should have an API for quickly constructing a tuple from an array.
After some pondering, I'm guessing that we still want two main variations for calling: vectorcall, and a tuple. Designing new APIs to borrow a tuple's items so we can macro them into vectorcall just doesn't seem worth it :( -- if the user has a tuple, they'll likely just use PyObject_Call
anyway.
The object calling API has grown organically over the years, and looks like an incomplete Cartesian product of features, with not-always-consistent naming.
For reference, here are the kinds of arguments the
Call
functions take:Callable:
char*
arg[0]
attribute; name given as Python stringPositional arguments:
...
)va_list
Keyword arguments:
We have some requests to change things, like:
Rather than evaluate each of those individually, perhaps we should look at the big picture. How should this look, ideally? How do we get there?