Closed erlend-aasland closed 2 months ago
In this case, isn't it a complicating factor that the underlying tp_
slot, tp_next
, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?
In this case, isn't it a complicating factor that the underlying
tp_
slot,tp_next
, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?
I'll quote @markshannon in https://github.com/capi-workgroup/problems/issues/1#issuecomment-1604074694:
- Why is it bad? It is global state, with all the problems that entails
- Is it slow? Yes. It forces us to save and restore it across calls, and to perform expensive checks after every call into C extension code.
- Does it prevent optimizations? Yes, for both of the above reasons.
The current API (PyIter_Next
) requires you to call PyErr_Occurred
("global state") in order to find out if an error happened.
The current API (
PyIter_Next
) requires you to callPyErr_Occurred
("global state") in order to find out if an error happened.
I get that. But changing PyIter_Next
to have a less ambiguous signature just moves the problem -- it still has to call tp_next
which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.
Well, it saves one call to PyErr_Occurred()
, so it should be a little bit faster. But it's still a call to PyErr_Occurred()
, which still has all the disadvantages Mark mentions.
I get that. But changing
PyIter_Next
to have a less ambiguous signature just moves the problem -- it still has to calltp_next
which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.
Ah, yes that is true. I guess it boils down to a cosmetic issue, for now. I'll let the WG decide if a cleaner API is worth it.
My personal view is that we have bigger fish to fry.
+1 for the third one, it matches where we've been going so far and I don't see a reason to change course. But there are some details that make this fish not worth frying right now, like:
*next
be set to NULL or left as it was?next
itself be NULL (to advance the iterator without getting the value)?
- on -1/0 return, should
*next
be set to NULL or left as it was?
FWIW: yes, that would make for a saner public API.
- can
next
itself be NULL (to advance the iterator without getting the value)?
What's the use-case? I could not find such a use-case in the code base.
What's the use-case?
Consistency in whether NULL can be passed to an output parameter.
(Deciding whether we want that -- and how to treat output parameters in general -- is one of the proverbial bigger fish to fry.)
I feel it's better not to allow setting an output parameter's address to NULL. It's a rare case, and it slows down the API function (it has to check whether the pointer is NULL) and may bulk it up (with a DECREF, assuming the output value is produced anyways).
Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.
I like this API: int PyIter_NextItem(PyObject *iter, PyObject **item)
(I renamed next
to item
):
*item
to NULL, and return -1 on error.*item
to NULL, and return 0 on StopIteration
.*item
to a strong reference to the iterator item, and return 1 on success.iter
and item
cannot be NULL.
Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.
I agree 100% :-) We already did that in other recently added functions such as PyDict_GetItemRef()
.
IMO it's worth it to propose a better API for PyIter_Next() which has an ambiguous return value (require to check for PyErr_Occurred()
).
@erlend-aasland @gvanrossum @zooba: What do you think of the proposed API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351 ? I understand that @encukou likes it.
Changing tp_next
slot API is not doable. I prefer to leave it as it is.
I like it. It makes this pattern work nicely (due to both errors and valid results coercing to true
):
PyObject *iter = PyObject_GetIter(o);
if (!iter) goto error;
PyObject *item = NULL;
while (PyIter_NextItem(iter, &item)) {
if (!item) goto error;
// work on item
Py_DECREF(item);
}
Py_DECREF(iter);
What do you think of the proposed API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351 ?
I like it.
Changing
tp_next
slot API is not doable. I prefer to leave it as it is.
Yeah; it is unfortunate, but IMO it is not a blocker for providing a better PyIter_Next
API.
Ok, let's do a formal vote on the API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351:
@serhiy-storchaka and @mdboom: You can now vote on this API. Just go ahead if you have any question.
@serhiy-storchaka @mdboom: Are you able to check checkboxes to vote?
@erlend-aasland: Congrats, your API was approved :-) You can go ahead and implement it! I close this issue.
Yay! It's not my API, though. I believe it was proposed by Irit or maybe Mark. I only insisted on the return value :) I'll dig up Irit's implementation and create a PR.
There is one issue we did not discuss: the current old API, PyIter_Next
, requires the caller to check if iter is an iterator. I think we want the API to check this, set a TypeError
and return -1
if iter is not an iterator.
Definitely.
Would it be possible to introduce a private function which bypasses the check as well (_PyIter_NextItem
) (just for CPython internal code just to reduce the number of isinstance() calls). I found those quite convenient when you care about performances in general but I'm not sure if it's the trend to do it in general.
It's not a full isinstance
call, but a NULL
check on tp->tp_iternext
-- a function pointer we're just about to call. I doubt it would have a measurable impact.
(PyIter_Check
also checks for _PyObject_NextNotImplemented
, but PyIter_Next
doesn't have to -- calling it will raise the TypeError.)
The super-optimized way to do this is to call tp->tp_iternext
directly.
It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.
Oh, then yes, it's probably not worth it (I thought it was something more intricate).
itertools
is an example of such super-optimized code. It intentionally uses tp_iternext
instead of PyIter_Next
to avoid the overhead of the later.
itertools
is an example of such super-optimized code. It intentionally usestp_iternext
instead ofPyIter_Next
to avoid the overhead of the later.
It's good that we don't see this in the rest of the standard library; if all extension modules were written like that, we'd have a maintenance nightmare. IMO, this is a good reason to introduce an internal _PyIter_NextItem
without the type checking and prune some verbose boiler-plate code in itertools
.
Previous discussion:
I would like to pursue this issue. IMO,
PyIter_Next
having an ambiguous return value is a good enough motivation for adding a replacement API. If you disagree, feel free to just close this issue. AFAICS, three APIs have been discussed; I'm in favour of Mark's proposal (item 3):1:
PyObject *PyIter_NextItem(PyObject* iter, int *err)
Return values: a NULL pointer (loop termination or error) or a PyObject pointer (item returned) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1574044071
Example usage
```c PyObject *item; int err; while (item = PyIter_NextItem(iterator, &err)) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (err < 0) { /* error */ } else { /* no error */ } ```2:
int PyIter_NextItem(PyObject* iter, PyObject **item)
Return values: -1 (error) or 0 (item returned or loop termination) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1574044071
Example usage
```c PyObject *item; int err; while ((err = PyIter_NextItem(iterator, &item)) == 0 && *item != NULL) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (err < 0) { /* error */ } else { /* no error */ } ```3:
int PyIter_NextItem(PyObject *iter, PyObject **next)
Return values: -1 (error), 0 (loop terminated), or 1 (item returned) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1578342058
Example usage
```c PyObject *next; int rc; while ((rc = PyIter_NextItem(iterator, &next)) > 0) { use(next); Py_DECREF(next); } if (rc < 0) { /* cleanup */ return -1; } /* done */ ```