Introduce replacement API for `PyIter_Next` with a non-ambiguous return value

erlend-aasland commented 5 months ago

Previous discussion:

I would like to pursue this issue. IMO, PyIter_Next having an ambiguous return value is a good enough motivation for adding a replacement API. If you disagree, feel free to just close this issue. AFAICS, three APIs have been discussed; I'm in favour of Mark's proposal (item 3):

1: `PyObject PyIter_NextItem(PyObject iter, int *err)`

Return values: a NULL pointer (loop termination or error) or a PyObject pointer (item returned) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1574044071

Example usage

```c PyObject *item; int err; while (item = PyIter_NextItem(iterator, &err)) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (err < 0) { /* error */ } else { /* no error */ } ```

2: `int PyIter_NextItem(PyObject* iter, PyObject **item)`

Return values: -1 (error) or 0 (item returned or loop termination) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1574044071

Example usage

```c PyObject *item; int err; while ((err = PyIter_NextItem(iterator, &item)) == 0 && *item != NULL) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (err < 0) { /* error */ } else { /* no error */ } ```

3: `int PyIter_NextItem(PyObject *iter, PyObject **next)`

Return values: -1 (error), 0 (loop terminated), or 1 (item returned) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1578342058

Example usage

```c PyObject *next; int rc; while ((rc = PyIter_NextItem(iterator, &next)) > 0) { use(next); Py_DECREF(next); } if (rc < 0) { /* cleanup */ return -1; } /* done */ ```

gvanrossum commented 5 months ago

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

erlend-aasland commented 5 months ago

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

I'll quote @markshannon in https://github.com/capi-workgroup/problems/issues/1#issuecomment-1604074694:

Why is it bad? It is global state, with all the problems that entails

Is it slow? Yes. It forces us to save and restore it across calls, and to perform expensive checks after every call into C extension code.

Does it prevent optimizations? Yes, for both of the above reasons.

erlend-aasland commented 5 months ago

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

gvanrossum commented 5 months ago

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Well, it saves one call to PyErr_Occurred(), so it should be a little bit faster. But it's still a call to PyErr_Occurred(), which still has all the disadvantages Mark mentions.

erlend-aasland commented 5 months ago

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Ah, yes that is true. I guess it boils down to a cosmetic issue, for now. I'll let the WG decide if a cleaner API is worth it.

gvanrossum commented 5 months ago

My personal view is that we have bigger fish to fry.

encukou commented 5 months ago

+1 for the third one, it matches where we've been going so far and I don't see a reason to change course. But there are some details that make this fish not worth frying right now, like:

on -1/0 return, should *next be set to NULL or left as it was?
can next itself be NULL (to advance the iterator without getting the value)?

erlend-aasland commented 5 months ago

on -1/0 return, should *next be set to NULL or left as it was?

FWIW: yes, that would make for a saner public API.

can next itself be NULL (to advance the iterator without getting the value)?

What's the use-case? I could not find such a use-case in the code base.

encukou commented 5 months ago

What's the use-case?

Consistency in whether NULL can be passed to an output parameter.

(Deciding whether we want that -- and how to treat output parameters in general -- is one of the proverbial bigger fish to fry.)

gvanrossum commented 5 months ago

I feel it's better not to allow setting an output parameter's address to NULL. It's a rare case, and it slows down the API function (it has to check whether the pointer is NULL) and may bulk it up (with a DECREF, assuming the output value is produced anyways).

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

vstinner commented 4 months ago

I like this API: int PyIter_NextItem(PyObject *iter, PyObject **item) (I renamed next to item):

Set an exception, set *item to NULL, and return -1 on error.
Set *item to NULL, and return 0 on StopIteration.
Set *item to a strong reference to the iterator item, and return 1 on success.

iter and item cannot be NULL.

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

I agree 100% :-) We already did that in other recently added functions such as PyDict_GetItemRef().

vstinner commented 4 months ago

IMO it's worth it to propose a better API for PyIter_Next() which has an ambiguous return value (require to check for PyErr_Occurred()).

@erlend-aasland @gvanrossum @zooba: What do you think of the proposed API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351 ? I understand that @encukou likes it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

zooba commented 4 months ago

I like it. It makes this pattern work nicely (due to both errors and valid results coercing to true):

PyObject *iter = PyObject_GetIter(o);
if (!iter) goto error;

PyObject *item = NULL;
while (PyIter_NextItem(iter, &item)) {
    if (!item) goto error;
    // work on item
    Py_DECREF(item);
}
Py_DECREF(iter);

erlend-aasland commented 4 months ago

What do you think of the proposed API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351 ?

I like it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

Yeah; it is unfortunate, but IMO it is not a blocker for providing a better PyIter_Next API.

vstinner commented 3 months ago

Ok, let's do a formal vote on the API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351:

[x] @encukou
[x] @erlend-aasland
[x] @vstinner
[x] @zooba
[x] @serhiy-storchaka
[x] @mdboom

vstinner commented 3 months ago

@serhiy-storchaka and @mdboom: You can now vote on this API. Just go ahead if you have any question.

vstinner commented 2 months ago

@serhiy-storchaka @mdboom: Are you able to check checkboxes to vote?

vstinner commented 2 months ago

@erlend-aasland: Congrats, your API was approved :-) You can go ahead and implement it! I close this issue.

erlend-aasland commented 2 months ago

Yay! It's not my API, though. I believe it was proposed by Irit or maybe Mark. I only insisted on the return value :) I'll dig up Irit's implementation and create a PR.

erlend-aasland commented 2 months ago

There is one issue we did not discuss: the ~~current~~ old API, PyIter_Next, requires the caller to check if iter is an iterator. I think we want the API to check this, set a TypeError and return -1 if iter is not an iterator.

encukou commented 2 months ago

Definitely.

picnixz commented 2 months ago

Would it be possible to introduce a private function which bypasses the check as well (_PyIter_NextItem) (just for CPython internal code just to reduce the number of isinstance() calls). I found those quite convenient when you care about performances in general but I'm not sure if it's the trend to do it in general.

encukou commented 2 months ago

It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.

(PyIter_Check also checks for _PyObject_NextNotImplemented, but PyIter_Next doesn't have to -- calling it will raise the TypeError.)

The super-optimized way to do this is to call tp->tp_iternext directly.

picnixz commented 2 months ago

It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.

Oh, then yes, it's probably not worth it (I thought it was something more intricate).

serhiy-storchaka commented 2 months ago

itertools is an example of such super-optimized code. It intentionally uses tp_iternext instead of PyIter_Next to avoid the overhead of the later.

erlend-aasland commented 1 month ago

itertools is an example of such super-optimized code. It intentionally uses tp_iternext instead of PyIter_Next to avoid the overhead of the later.

It's good that we don't see this in the rest of the standard library; if all extension modules were written like that, we'd have a maintenance nightmare. IMO, this is a good reason to introduce an internal _PyIter_NextItem without the type checking and prune some verbose boiler-plate code in itertools.

erlend-aasland commented 1 month ago

I created a PR:

https://github.com/python/cpython/pull/122331

I plan on merging it early next week.

capi-workgroup / decisions