geduldig / TwitterAPI

Minimal python wrapper for Twitter's REST and Streaming APIs
939 stars 263 forks source link

RestIterator object is not an iterator #28

Closed tangentmonger closed 10 years ago

tangentmonger commented 10 years ago
raw_tweet = next(raw_tweets.get_iterator())

fails with "TypeError: RestIterator object is not an iterator". I believe this is because RestIterator doesn't implement a next() method. StreamingIterator is also missing next().

geduldig commented 10 years ago

My understanding is that you are right, RestIterator and StreamingIterator are not iterators. However, because __iter__() methods that use yield also automatically create a next() method, RestIterator and StreamIterator behave like iterators. Maybe "behave like iterables" is more accurate.

Either way, if you want to call next() directly on the thing that get_iterator() returns, you would do this:

    it = iter(raw_tweets.get_iterator())
    raw_tweet = next(it)

I'm a little confused myself if this is bad design, bad naming, or neither.

tangentmonger commented 10 years ago

According to http://stackoverflow.com/questions/19151/build-a-basic-python-iterator, there are two different ways to write iterators:

1) write a class that supports the iterator protocol, i.e. it has init() and next() methods (or init() and next() for Python 3). Those are regular methods, not generators.

2) just write a generator instead of a class, because generators already support the iterator protocol.

Since RestIterator and StreamingIterator need to be full classes, the generator shortcut isn't useful here.

I've suggested some changes so that RestIterator and StreamingIterator are full iterators (Python 2 and 3 compatible) and TwitterResponse is an iterable. As far as I can tell, this won't break anything but will allow you to write:

it = raw_tweets.get_iterator() raw_tweet = next(it)

or even just

it = iter(my_twitter_response) raw_tweet = next(it)

which I think is more standard.

On 20/09/14 19:14, geduldig wrote:

My understanding is that you are right, RestIterator and StreamingIterator are not iterators. However, because iter() methods that use /yield/ also automatically create a next() method, RestIterator and StreamIterator behave like iterators. Maybe "behave like iterables" is more accurate.

Either way, if you want to call next() directly on the thing that get_iterator() returns, you would do this: it = iter(raw_tweets.get_iterator()) raw_tweet = next(it)

I'm a little confused myself if this is bad design, bad naming, or neither.

— Reply to this email directly or view it on GitHub https://github.com/geduldig/TwitterAPI/issues/28#issuecomment-56275706.

geduldig commented 10 years ago

I looked over your changes and did some initial tests. Looks good! I like that you were able to keep them compatible with both Python 2 and 3. Before adding them, I have a couple of reservations I want to run by you. It looks like these changes would result in a cleaner syntax for the user. So, instead of writing

it = iter(raw_tweets.get_iterator())
next(it)

your changes would allow

it = raw_tweets.get_iterator()
next(it)

The other thing your changes address is the current classes do not adhere to standard Python iterator protocol.

My main reservation is that these changes add a bunch more code without fixing or improving functionality. On the other hand, the changes align better with Python standards, and that probably should trump my reservation.

What do you think of the following alternative?

First, rename RestIterator and StreamingIterator to RestGenerator and StreamingGenerator, since they are generators and not iterators, as the TypeError makes clear. The classes no longer should be expected to adhere to iterator protocol. They are generators that behave as iterators, as described in this Python doc.

Second, change method TwitterResponse.get_iterator() to return a true iterator:

def get_iterator(self):
    """:returns: Iterator of items in the response."""
    if self.stream:
        return StreamingGenerator(self.response).__iter__()
    else:
        return RestGenerator(self.response).__iter__()

Now, there is no TypeError, and the method name does not mislead.

One issue with this alternate solution is the class renaming if someone has code that references RestIterator or StreamingIterator. I, personally, never have, but I can't rule out that others have. I need to do more testing before pushing any changes, but either way I need to address this issue.

Thanks!

tangentmonger commented 10 years ago

You're right, it's a lot of new code - mostly due to compatibility issues. Your alternative neatly avoids that whole issue :)

I'm offering an alternative patch with renamed classes as you suggest (although I'm not sure if they shoudl be named RestGenerator, RestIterable, or maybe just RestTweets), and some code to allow use of the old names with a deprecation warning.

I also updated the readme, but having done so, I started to think that 'for item in r' is not so clear when r (a TwitterResponse object) contains more than just the tweets. Maybe 'for item in r.tweets' would be an improvement.

geduldig commented 10 years ago

I think I am going to merge your patch as is. Thank you for thinking this through with me!

I feel tempted to remove the backward compatibility functions because I just can't see why someone would use RestIterator and StreamingIterator in their code -- but not right away. The new class names probably should get privatized as _RestIterable and _StreamingIterable.

Regarding replacing the "r" in "for item in r" with "r.tweets", that change would not be quite right. Besides tweets, REST API responses also can include "error" objects. Streaming API responses can include "limit" objects and a few other types of objects.

geduldig commented 10 years ago

Merged in version 2.2.4

tangentmonger commented 10 years ago

First time contributing code to an OSS project. Thanks for making it a pleasant experience!