cedoard / snscrape_twitter

Using the snscrape and tweepy libraries to scrape an unlimited number of tweets

Exception 'Unable to find guest token' raised when scraping replies approximately 33% of the time #4

Open DnVS97 opened 3 years ago

DnVS97 commented 3 years ago

Hi,

Thanks for providing this GitHub repo, it has been very useful. I'm using the "collect_tweet_replies" function with a list of 15,000 tweet IDs as input. As stated in the title, about a third of the IDs do not get processed in the expected way, i.e. the replies to the tweet are not retrieved. The exception being raised is "Unable to find guest token", and I have not been able to figure out why it is raised or why it only happens to a subset of the tweets.
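For reference, here is roughly how I'm iterating over the IDs so that one failure doesn't abort the whole run. This is only a minimal sketch, not the repo's actual code: the collect_tweet_replies_safe wrapper and the conversation_id: search query are my own stand-ins for whatever collect_tweet_replies does internally.

```python
import snscrape.base
import snscrape.modules.twitter as sntwitter

def collect_tweet_replies_safe(tweet_id):
    """Hypothetical wrapper: fetch replies to one tweet via the
    conversation_id: search operator, returning None on failure."""
    try:
        scraper = sntwitter.TwitterSearchScraper(f'conversation_id:{tweet_id}')
        return list(scraper.get_items())
    except snscrape.base.ScraperException as e:
        # "Unable to find guest token" surfaces as a ScraperException
        print(f'{tweet_id}: {e}')
        return None

tweet_ids = [...]  # the list of ~15,000 tweet IDs
replies_by_id = {}
failed_ids = []
for tweet_id in tweet_ids:
    replies = collect_tweet_replies_safe(tweet_id)
    if replies is None:
        failed_ids.append(tweet_id)  # retry these in a later pass
    else:
        replies_by_id[tweet_id] = replies
```

With this, the failing third of the IDs at least gets collected into failed_ids for a second pass instead of crashing the script.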

The "unable to find guest token" exception has been raised before on the Snscrape github issue page: https://github.com/JustAnotherArchivist/snscrape/issues/79 https://github.com/JustAnotherArchivist/snscrape/issues/110 In the above issues the problem was raised due to using Google Colab notebooks or AWS to perform the scraping and these domains can be blocked by Twitter. However, I'm running the script locally. I'm wondering if this behavior is known, when it occurs and if there is a possible fix.

Thanks in advance.

darrenlimweiyang commented 3 years ago

Hey,

I'm encountering a similar problem. My gut feeling is that an internal request limit is hit after x records.

tsangvin commented 3 years ago

> Hey,
>
> I'm encountering a similar problem. My gut feeling is that an internal request limit is hit after x records.

I'm encountering this on Jupyter Notebook with Python 3.8. I am still determining how often it occurs. I have been running the package non-stop for a day or two. The trend I noticed is that once the exception is hit, the same exception keeps repeating for a certain period afterwards. However, after some time (I'm still figuring out how long exactly), scraping resumes.

Remark: I have been scraping tweets for particular constituents of an index across a calendar year. I usually get a day of tweets for most of the constituents, then hit the exception and get 5-6 days of blank results. After that, I get another day of tweets for most of the text queries.

I am attempting to pause the program for a defined amount of time whenever this exception is raised, to improve the completeness of the scraped data.
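Something along these lines (a minimal sketch of the pause-and-retry idea; the helper name, delays, and retry count are my own assumptions, not anything from the repo):

```python
import time
import snscrape.base

def run_with_backoff(fn, *args, max_retries=5, base_delay=60):
    """Hypothetical retry helper: sleep with exponential backoff
    whenever snscrape raises, e.g. 'Unable to find guest token'."""
    for attempt in range(max_retries):
        try:
            return fn(*args)
        except snscrape.base.ScraperException as e:
            delay = base_delay * 2 ** attempt  # 60s, 120s, 240s, ...
            print(f'{e} -- sleeping {delay}s before retry {attempt + 1}')
            time.sleep(delay)
    return None  # give up; the caller can record the ID for a later pass
```

If the block really does lift after some fixed window, a longer base_delay with fewer retries may work better than many short sleeps.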

Does anyone have other ideas for a workaround?