Open erno98 opened 4 years ago
I have the same problem with this and all other non-API tweet scrapers at the moment. You can collect about 14,000 tweets before hitting the request limit.
Same problem here, do you happen to know after how much time that number resets? @rbkhb
Haven't figured that out, no
I have the same problem and can confirm the 14,000-tweet limit. I was able to retry after a couple of minutes (5 or less); I still need to check the exact time.
I found a solution, not ideal but it works, maybe you can help me make it better:
import datetime
import json
import time

import GetOldTweets3 as got  # assumed import; adjust to the package name you installed

# count, tweet_list and add_to_list() are assumed to be defined earlier in the script

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)
date_until = date_upper
date_start = date_lower
start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)

    # Call getTweets and save the result in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)

    print('--- Adding to list... ---')
    add_to_list()

    print('--- Writing JSON... ---')
    # Save the list to a JSON file (the with block closes the file after each write)
    with open('./JSON/saver_output.json', 'w') as f:
        json.dump(tweet_list, f)

    print('--- Going to sleep... ---\n\n')
    time.sleep(60 * 5)

    # Move the one-day window forward after each pass
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)

    # Convert dates to strings
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")
Doing it this way, I was able to retrieve almost 120k tweets overnight without any hiccups. I know the code could be much shorter, but I wrote it just before going to bed.
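Since the post asks for ways to shorten the code, here is one possible sketch of the date bookkeeping as a small generator. `day_windows` is a hypothetical helper name, not part of the package; each (since, until) pair it yields would be passed to setSince and setUntil as in the script above.

```python
import datetime


def day_windows(first_day, days):
    """Yield (since, until) date strings for consecutive one-day windows."""
    for offset in range(days):
        since = first_day + datetime.timedelta(days=offset)
        until = since + datetime.timedelta(days=1)
        yield since.strftime("%Y-%m-%d"), until.strftime("%Y-%m-%d")


# Same range as the script: 29 one-day windows starting on 2020-02-29.
windows = list(day_windows(datetime.date(2020, 2, 29), 29))
```

The query, the JSON dump and the 5-minute sleep would then go inside a single `for since, until in windows:` loop, with no manual date increments.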
Hi, I think you have searched for coronavirus and have the data. Can you please send it to me? Thanks in advance. ajithex@gmail.com
Hi,
I think I found a solution to get more than 14000 tweets per day with a small change in the package itself. You only have to add a sleep time after 14000 tweets. Combined with a loop over the dates and rotation over proxies, this works very well for me.
Hey @p-dre, that's a nice solution. However, I've encountered another problem: what if a given query search exceeds the 14k limit on a single day?
Because I am going to write my master's thesis about coronavirus with Twitter data, I am interested to know what your plan is. So maybe contact me at paul.drecker@stud.uni-due.de
Can you please share how you use proxies and which proxy provider you use?
@erno98 If you look inside the package, you will find a loop over the batches. I added a sleep time after 14000 tweets.
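The idea of sleeping after a fixed number of tweets inside a batch loop could look roughly like the sketch below. This is not the package's actual code: `throttled` is a hypothetical helper, the batches are plain lists standing in for whatever the package's loop iterates over, and the 300-second default is only borrowed from the 5-minute wait mentioned earlier in the thread.

```python
import time


def throttled(batches, pause_every=14000, pause_seconds=300, sleep=time.sleep):
    """Yield items from each batch, pausing after every `pause_every` items."""
    seen = 0
    for batch in batches:
        for item in batch:
            yield item
            seen += 1
            if seen % pause_every == 0:
                # Hypothetical pause point: wait before requesting more
                sleep(pause_seconds)


# Tiny demonstration with a fake sleep so nothing actually waits.
pauses = []
items = list(throttled([[1, 2], [3, 4, 5]], pause_every=2,
                       pause_seconds=1, sleep=pauses.append))
```

Passing `sleep` as a parameter keeps the sketch easy to try out without waiting; in a real run the default `time.sleep` would be used.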
Hi, could you share your code with me since I really want to know how to set up sleep time after 14000 tweets. I have just started programming, many thanks!
Hi, Could you share the program with me via luyang1210@gmail.com? I have come across the same issue and really want to solve it. Many thanks!
I have a long list of keywords (around 700). I want to fetch tweets for all of them since February, without any other criteria. Now, I immediately get stuck with "An error occured during an HTTP request: HTTP Error 429: Too Many Requests", and when I open the given link in a browser, everything works fine. I tried to fetch 1-day periods only (for example 01-02-2020 to 02-02-2020, etc.), but it still doesn't work (because of the same error). Any ideas how to solve it? I tried to sleep the script after such an error, but even an hour of waiting doesn't seem to affect it in any way.
After some waiting, the script runs for around 10% of the tweets, and gets the error again.