capi-workgroup / decisions

Discussion and voting on specific issues
5 stars 1 forks source link

Introduce replacement API for `PyIter_Next` with a non-ambiguous return value #21

Closed erlend-aasland closed 2 months ago

erlend-aasland commented 5 months ago

Previous discussion:

I would like to pursue this issue. IMO, PyIter_Next having an ambiguous return value is a good enough motivation for adding a replacement API. If you disagree, feel free to just close this issue. AFAICS, three APIs have been discussed; I'm in favour of Mark's proposal (item 3):

1: PyObject *PyIter_NextItem(PyObject* iter, int *err)

Return values: a NULL pointer (loop termination or error) or a PyObject pointer (item returned) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1574044071

Example usage ```c PyObject *item; int err; while (item = PyIter_NextItem(iterator, &err)) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (err < 0) { /* error */ } else { /* no error */ } ```

2: int PyIter_NextItem(PyObject* iter, PyObject **item)

Return values: -1 (error) or 0 (item returned or loop termination) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1574044071

Example usage ```c PyObject *item; int err; while ((err = PyIter_NextItem(iterator, &item)) == 0 && *item != NULL) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (err < 0) { /* error */ } else { /* no error */ } ```

3: int PyIter_NextItem(PyObject *iter, PyObject **next)

Return values: -1 (error), 0 (loop terminated), or 1 (item returned) Mentioned in https://github.com/python/cpython/issues/105201#issuecomment-1578342058

Example usage ```c PyObject *next; int rc; while ((rc = PyIter_NextItem(iterator, &next)) > 0) { use(next); Py_DECREF(next); } if (rc < 0) { /* cleanup */ return -1; } /* done */ ```
gvanrossum commented 5 months ago

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

erlend-aasland commented 5 months ago

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

I'll quote @markshannon in https://github.com/capi-workgroup/problems/issues/1#issuecomment-1604074694:

  • Why is it bad? It is global state, with all the problems that entails
  • Is it slow? Yes. It forces us to save and restore it across calls, and to perform expensive checks after every call into C extension code.
  • Does it prevent optimizations? Yes, for both of the above reasons.
erlend-aasland commented 5 months ago

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

gvanrossum commented 5 months ago

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Well, it saves one call to PyErr_Occurred(), so it should be a little bit faster. But it's still a call to PyErr_Occurred(), which still has all the disadvantages Mark mentions.

erlend-aasland commented 5 months ago

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Ah, yes that is true. I guess it boils down to a cosmetic issue, for now. I'll let the WG decide if a cleaner API is worth it.

gvanrossum commented 5 months ago

My personal view is that we have bigger fish to fry.

encukou commented 5 months ago

+1 for the third one, it matches where we've been going so far and I don't see a reason to change course. But there are some details that make this fish not worth frying right now, like:

erlend-aasland commented 5 months ago
  • on -1/0 return, should *next be set to NULL or left as it was?

FWIW: yes, that would make for a saner public API.

  • can next itself be NULL (to advance the iterator without getting the value)?

What's the use-case? I could not find such a use-case in the code base.

encukou commented 5 months ago

What's the use-case?

Consistency in whether NULL can be passed to an output parameter.

(Deciding whether we want that -- and how to treat output parameters in general -- is one of the proverbial bigger fish to fry.)

gvanrossum commented 5 months ago

I feel it's better not to allow setting an output parameter's address to NULL. It's a rare case, and it slows down the API function (it has to check whether the pointer is NULL) and may bulk it up (with a DECREF, assuming the output value is produced anyways).

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

vstinner commented 4 months ago

I like this API: int PyIter_NextItem(PyObject *iter, PyObject **item) (I renamed next to item):

iter and item cannot be NULL.

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

I agree 100% :-) We already did that in other recently added functions such as PyDict_GetItemRef().

vstinner commented 4 months ago

IMO it's worth it to propose a better API for PyIter_Next() which has an ambiguous return value (require to check for PyErr_Occurred()).

@erlend-aasland @gvanrossum @zooba: What do you think of the proposed API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351 ? I understand that @encukou likes it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

zooba commented 4 months ago

I like it. It makes this pattern work nicely (due to both errors and valid results coercing to true):

PyObject *iter = PyObject_GetIter(o);
if (!iter) goto error;

PyObject *item = NULL;
while (PyIter_NextItem(iter, &item)) {
    if (!item) goto error;
    // work on item
    Py_DECREF(item);
}
Py_DECREF(iter);
erlend-aasland commented 4 months ago

What do you think of the proposed API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351 ?

I like it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

Yeah; it is unfortunate, but IMO it is not a blocker for providing a better PyIter_Next API.

vstinner commented 3 months ago

Ok, let's do a formal vote on the API https://github.com/capi-workgroup/decisions/issues/21#issuecomment-2127484351:

vstinner commented 3 months ago

See also: https://github.com/capi-workgroup/problems/issues/29

vstinner commented 3 months ago

@serhiy-storchaka and @mdboom: You can now vote on this API. Just go ahead if you have any question.

vstinner commented 2 months ago

@serhiy-storchaka @mdboom: Are you able to check checkboxes to vote?

vstinner commented 2 months ago

@erlend-aasland: Congrats, your API was approved :-) You can go ahead and implement it! I close this issue.

erlend-aasland commented 2 months ago

Yay! It's not my API, though. I believe it was proposed by Irit or maybe Mark. I only insisted on the return value :) I'll dig up Irit's implementation and create a PR.

erlend-aasland commented 2 months ago

There is one issue we did not discuss: the current old API, PyIter_Next, requires the caller to check if iter is an iterator. I think we want the API to check this, set a TypeError and return -1 if iter is not an iterator.

encukou commented 2 months ago

Definitely.

picnixz commented 2 months ago

Would it be possible to introduce a private function which bypasses the check as well (_PyIter_NextItem) (just for CPython internal code just to reduce the number of isinstance() calls). I found those quite convenient when you care about performances in general but I'm not sure if it's the trend to do it in general.

encukou commented 2 months ago

It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.

(PyIter_Check also checks for _PyObject_NextNotImplemented, but PyIter_Next doesn't have to -- calling it will raise the TypeError.)

The super-optimized way to do this is to call tp->tp_iternext directly.

picnixz commented 2 months ago

It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.

Oh, then yes, it's probably not worth it (I thought it was something more intricate).

serhiy-storchaka commented 2 months ago

itertools is an example of such super-optimized code. It intentionally uses tp_iternext instead of PyIter_Next to avoid the overhead of the later.

erlend-aasland commented 1 month ago

itertools is an example of such super-optimized code. It intentionally uses tp_iternext instead of PyIter_Next to avoid the overhead of the later.

It's good that we don't see this in the rest of the standard library; if all extension modules were written like that, we'd have a maintenance nightmare. IMO, this is a good reason to introduce an internal _PyIter_NextItem without the type checking and prune some verbose boiler-plate code in itertools.

erlend-aasland commented 1 month ago

I created a PR:

I plan on merging it early next week.