bslatkin / effectivepython

Effective Python: Second Edition — Source Code and Errata for the Book
https://effectivepython.com
2.24k stars 718 forks source link

Inappropriate use of "iterator" in Item 7. #62

Closed JirenJin closed 4 months ago

JirenJin commented 4 years ago

"enumerate wraps any iterator with a lazy generator". I think "iterable" is more presice here than "iterator".

And "iterable" is more consistent with the definition of enumerate fucntion in the official doc:

enumerate(iterable, start=0)

iterable must be a sequence, an iterator, or some other object which supports iteration.

Same for:

bslatkin commented 4 years ago

I think most Python programmers don't use the word "iterable" when they talk about these types of objects. Practically, an "iterable container" or an "iterator instance" can be used interchangeably in most cases (with it = iter(obj) followed by next(it)) so you don't even need to know the difference. Similarly, people don't say "sequences" when they're talking about indexable objects with length, they call them "lists" (even when they aren't).

Looking at the code, the happy path here is when the object passed into enumerate is already an iterator:

https://github.com/python/cpython/blob/530f506ac91338b55cf2be71b1cdf50cb077512f/Objects/enumobject.c#L39

static PyObject *
enum_new_impl(PyTypeObject *type, PyObject *iterable, PyObject *start)
/*[clinic end generated code: output=e95e6e439f812c10 input=782e4911efcb8acf]*/
{
    enumobject *en;

    en = (enumobject *)type->tp_alloc(type, 0);
    if (en == NULL)
        return NULL;
    if (start != NULL) {
        start = PyNumber_Index(start);
        if (start == NULL) {
            Py_DECREF(en);
            return NULL;
        }
        assert(PyLong_Check(start));
        en->en_index = PyLong_AsSsize_t(start);
        if (en->en_index == -1 && PyErr_Occurred()) {
            PyErr_Clear();
            en->en_index = PY_SSIZE_T_MAX;
            en->en_longindex = start;
        } else {
            en->en_longindex = NULL;
            Py_DECREF(start);
        }
    } else {
        en->en_index = 0;
        en->en_longindex = NULL;
    }
    en->en_sit = PyObject_GetIter(iterable);
    if (en->en_sit == NULL) {
        Py_DECREF(en);
        return NULL;
    }
    en->en_result = PyTuple_Pack(2, Py_None, Py_None);
    if (en->en_result == NULL) {
        Py_DECREF(en);
        return NULL;
    }
    return (PyObject *)en;
}

https://github.com/python/cpython/blob/be434dc0380d9f5c7c800de9943cc46d55fd9491/Objects/abstract.c#L2625

PyObject *
PyObject_GetIter(PyObject *o)
{
    PyTypeObject *t = o->ob_type;
    getiterfunc f;

    f = t->tp_iter;
    if (f == NULL) {
        if (PySequence_Check(o))
            return PySeqIter_New(o);
        return type_error("'%.200s' object is not iterable", o);
    }
    else {
        PyObject *res = (*f)(o);
        if (res != NULL && !PyIter_Check(res)) {
            PyErr_Format(PyExc_TypeError,
                         "iter() returned non-iterator "
                         "of type '%.100s'",
                         res->ob_type->tp_name);
            Py_DECREF(res);
            res = NULL;
        }
        return res;
    }
}

Another data point is the collections.abc.Iterable type, where the documentation says (my emphasis added):

Checking isinstance(obj, Iterable) detects classes that are registered as Iterable or that have an __iter__() method, but it does not detect classes that iterate with the __getitem__() method. The only reliable way to determine whether an object is iterable is to call iter(obj).

So I'm not sure what I want to do here. I'd rather keep the language in the book plain and practical, and say list and iterator instead of sequence and iterable. I need to see what I've done consistently elsewhere in the book to decide what to do.