MichaelCurrin / python-twitter-guide

Code snippets and links to docs around using the Twitter API and Tweepy 🐍 🐦
https://michaelcurrin.github.io/python-twitter-guide/
MIT License

Twitter replies repeat #30

Closed: nadecancode closed this issue 4 years ago

nadecancode commented 4 years ago

As described in the Discord server, the replies repeat whenever I try to fetch them.

The dump file: https://easyupload.io/wclgdj

```python
import csv

# TweetScrapperConversation comes from the tweet_scrapper package.
# `response` holds the parent tweet's data as a dict.
TweetScrapperConversation(
    username=response['user']['screen_name'],
    parent_tweet_id=int(response['id_str']),
    num_tweets=100,
    tweet_dump_path='twitter_conv.csv',
    tweet_dump_format='csv'
).get_thread_tweets(False)

index = 1
with open('twitter_conv.csv', 'r', encoding="utf-8", newline="") as responseCsvFile:
    reader = csv.DictReader(responseCsvFile)
    commentsResponse = list(reader)

    for comment in commentsResponse:
        ...  # The iteration loop; its body doesn't really matter here
```
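To confirm the symptom before changing anything, it can help to count how many rows in the dump file are exact repeats. This is a minimal sketch using only the standard library; `count_duplicate_rows` is a hypothetical helper, not part of tweet_scrapper.

```python
import csv
from collections import Counter


def count_duplicate_rows(path):
    """Return (total_rows, unique_rows) for a CSV dump file.

    Each row becomes a tuple of its values in header order, so it can be
    hashed; identical rows produce identical tuples.
    """
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        rows = [tuple(row[name] for name in reader.fieldnames) for row in reader]
    counts = Counter(rows)
    return len(rows), len(counts)


# Usage (hypothetical): total, unique = count_duplicate_rows("twitter_conv.csv")
```

If `total` is larger than `unique`, the file itself contains repeated rows, which points at the scraping or writing step rather than the loop that reads the file.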
nadecancode commented 4 years ago

OK, maybe the issue is that I specified num_tweets, but there are far fewer actual comments than that amount, and that's why the comments array was duplicating.
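If that theory is right, the general pattern is a fetch loop that keeps requesting pages until it reaches a target count, even after the source has run out and starts returning replies it already returned. A generic guard is to stop as soon as a page adds no new IDs. This is a sketch of that pattern only; `fetch_page` is a hypothetical callable standing in for whatever request the scraper makes, and it is not tweet_scrapper's API.

```python
def collect_unique(fetch_page, limit):
    """Collect up to `limit` items, stopping early once a page adds nothing new.

    fetch_page is hypothetical: fetch_page(cursor) -> (items, next_cursor),
    where each item is a dict with an "id" key and next_cursor is None
    when there are no more pages.
    """
    seen_ids = set()
    results = []
    cursor = None
    while len(results) < limit:
        items, cursor = fetch_page(cursor)
        added = 0
        for item in items:
            if item["id"] in seen_ids:
                continue  # already collected; skip the repeat
            seen_ids.add(item["id"])
            results.append(item)
            added += 1
            if len(results) >= limit:
                break
        # Stop if this page was empty or only repeats, or there is no next page.
        if added == 0 or cursor is None:
            break
    return results
```

With this guard, asking for 100 replies on a thread that only has 3 returns those 3 once instead of padding the list with repeats.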

nadecancode commented 4 years ago

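```python
# commentsResponse0 is the raw list of rows read from twitter_conv.csv
commentsResponse = []

seen = set()
for d in commentsResponse0:
    # Dicts are unhashable, so use a tuple of their items as the set key
    t = tuple(d.items())
    if t not in seen:
        seen.add(t)
        commentsResponse.append(d)
```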

This is probably a temporary fix: I filtered out all of the duplicate dicts so that no comments repeat.
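A variant of the same workaround is to deduplicate on a single identifying column instead of the whole row, so two rows for the same reply count as one even if some other column differs. This is a sketch; the default column name `"id"` is an assumption about the CSV header, so check the first line of the dump file and adjust `key` to match.

```python
def dedupe_by_key(rows, key="id"):
    """Return rows with repeats removed, keeping the first occurrence.

    Two rows are duplicates when row[key] is equal. The default "id"
    is an assumed column name; change it to match your CSV header.
    """
    seen = set()
    unique = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique
```

Usage, assuming such a column exists: `commentsResponse = dedupe_by_key(commentsResponse0, key="id")`. Order is preserved, so replies stay in the sequence the scraper wrote them.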
MichaelCurrin commented 4 years ago

Thanks for the info.

You can use tweet.id, which is already an int, without bothering about id_str.
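The reason this is safe in Python: id_str exists mainly for languages such as JavaScript, whose numbers lose precision above 2**53, while Python ints have arbitrary precision and the json module parses the numeric id exactly. The payload below is a hypothetical, trimmed-down example with a made-up ID.

```python
import json

# Hypothetical, trimmed-down tweet payload; the ID value is made up.
raw = '{"id": 1234567890123456789, "id_str": "1234567890123456789"}'
tweet = json.loads(raw)

# json.loads returns a Python int, which holds the full value exactly.
print(type(tweet["id"]).__name__)           # int
print(tweet["id"] == int(tweet["id_str"]))  # True
```

So int(response['id_str']) and response['id'] give the same value; converting the string adds nothing.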

Why don't you try changing your False to True? I thought it would write to the file on False.

I am glad you found a workaround. If you think you've found a bug, then create a bug issue on the tweet_scrapper GitHub repo so the maintainer there can fix it or advise.