Jefferson-Henrique / GetOldTweets-python

A project written in Python to get old tweets, it bypass some limitations of Twitter Official API.
MIT License
1.35k stars 809 forks source link

HTTP Error 429 when fetching tweets #262

Open erno98 opened 4 years ago

erno98 commented 4 years ago

I have a long list of keywords (around 700). I want to fetch all of them since February, without any other criterias. Now, I immediately get struck with "An error occured during an HTTP request: HTTP Error 429: Too Many Requests", and when I open given in link browser, everything works fine. I tried to fetch for 1 day periods only (for example 01-02-2020 to 02-02-2020, etc.), but it still doesn't work (because of the same error). Any ideas how to solve it? I tried to sleep the script after such error, but even an hour of waiting doesn't seem to affect it in any way.

After some waiting, the script runs for around 10% of the tweets, and gets the error again.

rbkhb commented 4 years ago

I have the same problem with this and all other non-API tweet scrapers at the moment. You can collect about 14,000 tweets before hitting the request limit.

tiaringhio commented 4 years ago

Same problem here, do you happen to know after how much time that number resets? @rbkhb

rbkhb commented 4 years ago

Haven't figured that out, no

valentin-pecten commented 4 years ago

I have the same problem and can confirm the 14000 tweets limit. I was able to retry after a couple of minutes (5 or less) need to check the exact time.

tiaringhio commented 4 years ago

I found a solution, not ideal but it works, maybe you can help me make it better:

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)

date_until = date_upper
date_start = date_lower

start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and saving in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Saving list to JSON file
    json.dump(tweet_list, open('./JSON/saver_output.json', 'w'))
    print('--- Going to sleep... ---\n\n')
    time.sleep(60*5)
    # Add 1 to date after each passage
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)
    # Convert dates to string
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")

Doing like so i was able to retrieve almost 120k tweets in a night sleep without any hiccups, i know the code could be much shorter but i wrote it just before going to bed.

ajithex commented 4 years ago

I found a solution, not ideal but it works, maybe you can help me make it better:

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)

date_until = date_upper
date_start = date_lower

start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and saving in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Saving list to JSON file
    json.dump(tweet_list, open('./JSON/saver_output.json', 'w'))
    print('--- Going to sleep... ---\n\n')
    time.sleep(60*5)
    # Add 1 to date after each passage
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)
    # Convert dates to string
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")

Doing like so i was able to retrieve almost 120k tweets in a night sleep without any hiccups, i know the code could be much shorter but i wrote it just before going to bed.

Hi, I think you have search for coronavirus and u have datam can u please send to me Thanks in advance ajithex@gmail.com

p-dre commented 4 years ago

I found a solution, not ideal but it works, maybe you can help me make it better:

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)

date_until = date_upper
date_start = date_lower

start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and saving in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Saving list to JSON file
    json.dump(tweet_list, open('./JSON/saver_output.json', 'w'))
    print('--- Going to sleep... ---\n\n')
    time.sleep(60*5)
    # Add 1 to date after each passage
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)
    # Convert dates to string
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")

Doing like so i was able to retrieve almost 120k tweets in a night sleep without any hiccups, i know the code could be much shorter but i wrote it just before going to bed.

Hi, I think you have search for coronavirus and u have datam can u please send to me Thanks in advance ajithex@gmail.com

Hi,

I think I find a solution to get more than 14000 tweets per day with a small change in the package themself. You only have to install a sleeping time after 14000 tweets. In combination with a loop over the dates and rotation over proxy, this works for me very well.

erno98 commented 4 years ago

Hey @p-dre, that's a nice solution. However, I've encountered another problem - what if given query search, on one day, exceeds the 14k limit?

asif-faizan commented 4 years ago

I found a solution, not ideal but it works, maybe you can help me make it better:

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)

date_until = date_upper
date_start = date_lower

start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and saving in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Saving list to JSON file
    json.dump(tweet_list, open('./JSON/saver_output.json', 'w'))
    print('--- Going to sleep... ---\n\n')
    time.sleep(60*5)
    # Add 1 to date after each passage
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)
    # Convert dates to string
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")

Doing like so i was able to retrieve almost 120k tweets in a night sleep without any hiccups, i know the code could be much shorter but i wrote it just before going to bed.

Hi, I think you have search for coronavirus and u have datam can u please send to me Thanks in advance ajithex@gmail.com

Hi,

I think I find a solution to get more than 14000 tweets per day with a small change in the package themself. You only have to install a sleeping time after 14000 tweets. In combination with a loop over the dates and rotation over proxy, this works for me very well.

Because I am going to write my masterthesis about coronavirus with Twitter data, I am interested to know what your plan is. So maybe contact me paul.drecker@stud.uni-due.de

Can u please share how you uses proxies and which proxy provider.

p-dre commented 4 years ago

@erno98 If you inside the package you will find a loop over the batches. I at a sleep time after 14000 tweets

roy601912008 commented 4 years ago

I found a solution, not ideal but it works, maybe you can help me make it better:

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)

date_until = date_upper
date_start = date_lower

start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and saving in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Saving list to JSON file
    json.dump(tweet_list, open('./JSON/saver_output.json', 'w'))
    print('--- Going to sleep... ---\n\n')
    time.sleep(60*5)
    # Add 1 to date after each passage
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)
    # Convert dates to string
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")

Doing like so i was able to retrieve almost 120k tweets in a night sleep without any hiccups, i know the code could be much shorter but i wrote it just before going to bed.

Hi, I think you have search for coronavirus and u have datam can u please send to me Thanks in advance ajithex@gmail.com

Hi,

I think I find a solution to get more than 14000 tweets per day with a small change in the package themself. You only have to install a sleeping time after 14000 tweets. In combination with a loop over the dates and rotation over proxy, this works for me very well.

Hi, could you share your code with me since I really want to know how to set up sleep time after 14000 tweets. I have just started programming, many thanks!

luyang1210 commented 4 years ago

I found a solution, not ideal but it works, maybe you can help me make it better:

# Date to start from
date_upper = datetime.datetime(2020, 3, 1)
date_lower = datetime.datetime(2020, 2, 29)

date_until = date_upper
date_start = date_lower

start_string = date_start.strftime("%Y-%m-%d")
until_string = date_until.strftime("%Y-%m-%d")

for i in range(29):
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and saving in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Saving list to JSON file
    json.dump(tweet_list, open('./JSON/saver_output.json', 'w'))
    print('--- Going to sleep... ---\n\n')
    time.sleep(60*5)
    # Add 1 to date after each passage
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)
    # Convert dates to string
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")

Doing like so i was able to retrieve almost 120k tweets in a night sleep without any hiccups, i know the code could be much shorter but i wrote it just before going to bed.

Hi, Could you share the program with me via luyang1210@gmail.com? I have come across the same issue and really want to solve it. Many thanks!