Mottl / GetOldTweets3

A Python 3 library and a corresponding command line utility for accessing old tweets
MIT License
367 stars 126 forks

HTTP Error, Gives 404 but the URL is working #98

Open sagefuentes opened 4 years ago

sagefuentes commented 4 years ago

Hi, I had a script running over the past few weeks and earlier today it stopped working. I keep receiving HTTPError 404, but the link provided in the errors still brings me to a valid page. The code is (all mentioned variables are established, and the error specifically happens in the Manager when I check via debugging):

tweetCriteria = got.manager.TweetCriteria().setQuerySearch(term)\
    .setMaxTweets(max_count)\
    .setSince(begin_timeframe)\
    .setUntil(end_timeframe)
scraped_tweets = got.manager.TweetManager.getTweets(tweetCriteria)

The error message for this is the standard 404 error, "An error occured during an HTTP request: HTTP Error 404: Not Found Try to open in browser:", followed by the valid link.

As I have changed nothing in the folder, I am wondering whether something has happened with my configuration more than anything else, but I also wonder whether others are experiencing this.

lenhhoxung86 commented 4 years ago

Any alternative solution for it? My master's thesis is on hold because of it. I tried snscrape, as mentioned in the comment above, but it does not return results based on a search query string.

I used the below query search and it returns me the links of the tweets.

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

I obtain the tweet IDs and then use tweepy to extract the tweets, as I needed more attributes (this may not be the best way to do it):

import datetime
import os
import time

def get_tweets(tweet_ids, currency):
    # `api` is an authenticated tweepy.API instance defined elsewhere
    statuses = api.statuses_lookup(tweet_ids, tweet_mode="extended")
    data = get_df()  # define your own dataframe
    for status in statuses:
        if status.lang == "en":
            mined = {
                "tweet_id": status.id,
                "name": status.user.name,
                "screen_name": status.user.screen_name,
                "retweet_count": status.retweet_count,
                "text": status.full_text,
                "mined_at": datetime.datetime.now(),
                "created_at": status.created_at,
                "favourite_count": status.favorite_count,
                "hashtags": status.entities["hashtags"],
                "status_count": status.user.statuses_count,
                "followers_count": status.user.followers_count,
                "location": status.place,
                "source_device": status.source,
                "coin_symbol": currency
            }

            last_tweet_id = status.id
            data = data.append(mined, ignore_index=True)

    print(currency, "outputting to tweets", len(data))
    data.to_csv(
        "Extracted_TWEETS.csv", mode="a", header=not os.path.exists("Extracted_TWEETS.csv"), index=False
    )
    print("..... going to sleep 20s")
    time.sleep(20)

Note that tweet_ids is a list of 100 tweet ids.

This really works, many thanks. Just keep in mind that snscrape may return too many results, so it is better to limit the number of tweet IDs using --max-results.
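To tie the two steps together, here is a minimal sketch (assuming the snscrape output file contains one tweet URL per line, as produced by the command above; `ids_from_urls` and `batches` are made-up helper names) that extracts the numeric IDs and chunks them into groups of 100 for `api.statuses_lookup`:

```python
def ids_from_urls(lines):
    # Each line looks like "https://twitter.com/<user>/status/<id>";
    # the ID is the last path segment.
    ids = []
    for line in lines:
        line = line.strip()
        if line:
            ids.append(int(line.rsplit("/", 1)[-1]))
    return ids

def batches(seq, size=100):
    # statuses_lookup accepts at most 100 IDs per call.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

# Usage (file name from the snscrape command above):
# with open("XRP_Sept_tweets.txt") as f:
#     tweet_ids = ids_from_urls(f)
# for chunk in batches(tweet_ids, 100):
#     get_tweets(chunk, "XRP")
```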

baraths92 commented 4 years ago

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

Hello, I am facing issues with snscrape. I do not have a command-line environment and am not able to run the program. Can you please explain step by step how to run it from a Jupyter notebook? Getting the tweet IDs is enough, because I have tweepy to extract the tweets from the tweet IDs.

I am also getting the error: module 'functools' has no attribute 'cached_property'

HuifangYeo commented 4 years ago

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

Hello..... I am facing issues with snscrape. I do not have command line environments and I am not able to run the program. Can you please explain step by step on how to run with jupyter notebook? And, getting the tweet ids are enough because I have tweepy to extract the tweets from tweet id.

I am also getting the error module 'functools' has no attribute 'cached_property'

I have [miniconda](https://docs.conda.io/en/latest/miniconda.html) on Python 3.8. It doesn't seem to work on lower Python versions. Then just install snscrape as follows: pip3 install snscrape

from the miniconda terminal, you should be able to use snscrape directly:


baraths92 commented 4 years ago

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

Hello..... I am facing issues with snscrape. I do not have command line environments and I am not able to run the program. Can you please explain step by step on how to run with jupyter notebook? And, getting the tweet ids are enough because I have tweepy to extract the tweets from tweet id. I am also getting the error module 'functools' has no attribute 'cached_property'

I have [miniconda](https://docs.conda.io/en/latest/miniconda.html) on Python 3.8. It doesn't seem to work on lower Python versions. Then just install snscrape as follows: pip3 install snscrape

from the miniconda terminal, you should be able to use snscrape directly:


Thank you very much! It worked!! Thank you once again and I feel grateful for your help! :-)

xmacex commented 4 years ago

Any alternative solution for it? My masters thesis is on hold because of it.

What an excellent opportunity to write a chapter about politics of APIs in the context of research! 😅 Your supervisor will have references for literature I am sure (and depending on your field), but you can look at publications from the Digital Methods Initiative at the University of Amsterdam, including people like Anne Helmond.

sachinator96 commented 4 years ago

Try Python Script to Download Tweets.

Hey! @rsafa Is it possible to get a large number of tweets, like 10,000 to 100,000? Is there a way to scrape at that scale?

praneethnooli commented 4 years ago

Hello everyone, is it possible to use snscrape or some other way to get the tweets for a specified Twitter handle within a given date range?

I basically want a working alternative to the GetOldTweets3 command below:

GetOldTweets3 --username "barackobama" --since 2015-09-10 --until 2015-09-12

ppival commented 4 years ago


With snscrape, this works:

snscrape --jsonl twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.json or snscrape twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.txt

Explanation from the developer: twitter-user is actually just a wrapper around twitter-search using the search term from:username (plus code to extract user information from the profile page)

Hello everyone, Is it possible to use snscrape or some other way to get the tweets for a specified twitter handle within the mentioned date range?

I basically want to find an alternate working way for this below GetoldTweets3 command

GetOldTweets3 --username "barackobama" --since 2015-09-10 --until 2015-09-12
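The same from:username search can also be run from Python via snscrape's module API rather than the CLI; a rough sketch, where `build_query` is a hypothetical helper mirroring the wrapper behaviour described above:

```python
def build_query(username, since, until):
    # twitter-user is effectively twitter-search with a "from:" term,
    # so the equivalent search string can be built directly.
    return f"from:{username} since:{since} until:{until}"

# With the module API (requires snscrape installed):
# import snscrape.modules.twitter as sntwitter
# query = build_query("barackobama", "2015-09-10", "2015-09-12")
# for tweet in sntwitter.TwitterSearchScraper(query).get_items():
#     print(tweet.date, tweet.content)
```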

shelu16 commented 4 years ago

You can use it as twitter-search, twitter-user, instagram, etc. Check there.

On Tue, 29 Sep 2020, 6:54 am Paul R. Pival, notifications@github.com wrote:

With snscrape, this should work, but appears to be timing out for me - your mileage may vary... I may be too impatient :-)

snscrape --jsonl twitter-user "barackobama since:2015-09-10 until:2015-09-12" > baracktweets.json or snscrape twitter-user "barackobama since:2015-09-10 until:2015-09-12" > baracktweets.txt

Hello everyone, Is it possible to use snscrape or some other way to get the tweets for a specified twitter handle within the mentioned date range?

I basically want to find an alternate working way for this below GetoldTweets3 command

GetOldTweets3 --username "barackobama" --since 2015-09-10 --until 2015-09-12


shelu16 commented 4 years ago

You can check here more information https://github.com/JustAnotherArchivist/snscrape

irwanOyong commented 4 years ago

Hi @ppival @shelu16, thanks for the snscrape reference. I tried it and the twitter-search module works, but it only gives me a list of tweet URLs, e.g. https://twitter.com/irwanOyong/status/1309516653386842113

I tried --jsonl and --with-entity but they failed. Any insight into getting the item (tweet) details?

ppival commented 4 years ago

Well I continue to have spotty success with snscrape, but I can confirm the following query worked:

snscrape --jsonl twitter-search 'musim-musim since:2020-01-01 until:2020-07-01' > musim-musum.json

That will output json for each tweet such as:

{"url": "https://twitter.com/bibIichor/status/1278110922947493888", "date": "2020-06-30T23:39:24+00:00", "content": "@atermoiends menyebut musim-musim begitu selama nungguin seseorang. \ud83d\ude14\ud83d\udc95", "id": 1278110922947493888, "username": "bibIichor", "outlinks": [], "outlinksss": "", "tcooutlinks": [], "tcooutlinksss": "", "retweetedTweet": null}

As noted in the snscrape installation notes, you will need Python 3.8 and the development version for --jsonl to work...

Hi @ppival @shelu16 , thanks for the snscrape reference. I tried it and the twitter-search module works, but it only gives me the list of tweet url, e.g: https://twitter.com/irwanOyong/status/1309516653386842113

Tried the --jsonl and --with-entity but it failed. Any insight to get the item (tweet) details?

bensilver95 commented 4 years ago

Hi @ppival @shelu16 , thanks for the snscrape reference. I tried it and the twitter-search module works, but it only gives me the list of tweet url, e.g: https://twitter.com/irwanOyong/status/1309516653386842113

Tried the --jsonl and --with-entity but it failed. Any insight to get the item (tweet) details?

@irwanOyong I was having the same issue, the reason is I wasn't using the development version of snscrape. Be sure to install it with pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

Once I did that it worked like @ppival said it should.

am11ne commented 4 years ago

How can I use GetOldTweets3 again?

mwaters166 commented 4 years ago

A kind of weird workaround for tweepy... I used snscrape to obtain tweets from 'big_ben_clock', a bot that tweets every hour (and is relatively consistent). The bot's tweets gave me tweet IDs that correspond to specific dates/times, and I then used those time IDs to collect tweets from other users at specific times. I outlined the process in a Jupyter notebook: https://github.com/mwaters166/Twitter_OM_Insight_Project/blob/master/1_Scrape_Tweets_Tweepy_Time_Ids.ipynb, and the time IDs for 2020 can be found here (although most of January is missing): https://github.com/mwaters166/Twitter_OM_Insight_Project/blob/master/time_ids.csv. I also tried to automate the process; there's a run.sh file in the main directory. Let me know if anyone finds a better solution (there's gotta be a better way lol)!
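An alternative to the clock-bot trick, if I'm not mistaken: tweet IDs (for tweets since late 2010) are Snowflake IDs that embed a millisecond timestamp, so an ID can be converted to a UTC time directly, with no extra scraping:

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch used for Twitter IDs

def tweet_id_to_datetime(tweet_id):
    # Bits 22 and up of a Snowflake ID are milliseconds since the epoch above.
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

For example, the tweet ID 1278110922947493888 quoted elsewhere in this thread decodes to its creation time of 2020-06-30T23:39:24+00:00.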

destractor-edo commented 4 years ago

First of all, thank you very much for your help. I would like to know if it is possible to extract only part of the --jsonl output, such as "content", and maybe also the author of the post.

Well I continue to have spotty success with snscrape, but I can confirm the following query worked:

snscrape --jsonl twitter-search 'musim-musim since:2020-01-01 until:2020-07-01' > musim-musum.json

That will output json for each tweet such as:

{"url": "https://twitter.com/bibIichor/status/1278110922947493888", "date": "2020-06-30T23:39:24+00:00", "content": "@atermoiends menyebut musim-musim begitu selama nungguin seseorang. \ud83d\ude14\ud83d\udc95", "id": 1278110922947493888, "username": "bibIichor", "outlinks": [], "outlinksss": "", "tcooutlinks": [], "tcooutlinksss": "", "retweetedTweet": null}

As noted in the snscrape installation note, you will require python 3.8 and the development version for --jsonl to work...

Hi @ppival @shelu16 , thanks for the snscrape reference. I tried it and the twitter-search module works, but it only gives me the list of tweet url, e.g: https://twitter.com/irwanOyong/status/1309516653386842113 Tried the --jsonl and --with-entity but it failed. Any insight to get the item (tweet) details?

am11ne commented 4 years ago

Hi, is it possible to use GetOldTweets3 again?

Sumbalq commented 4 years ago

snscrape twitter-search "from:barackobama since:2015-09-10 until:2015-09-12" > baracktweets.txt

Can you please tell me how to get tweets with multiple keywords in a search query, like "Jobs AND (unemployment OR government)"? @ppival

ashutosh925 commented 4 years ago

The same is happening to me. Did someone find a solution?

sagefuentes commented 3 years ago

Those of you who have been using snscrape: can you post code examples doing a simple query search in a script rather than the console? The lack of documentation is making this trial and error as I learn the modules.

prai0072010 commented 3 years ago

People here that have been using snscrape, can you post any code examples just doing a simple query search in script and not console? The lack of documentation is making this more trial and error as I learn the modules.

import snscrape.modules.twitter as sntwitter

# keyword and maxTweets are defined elsewhere; note the leading space
# before "since:" so it isn't glued to the keyword.
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' since:2015-12-17 until:2020-09-25').get_items()):
    if i > maxTweets:
        break
    print(tweet.username)
    print(tweet.renderedContent)
destractor-edo commented 3 years ago

import snscrape.modules.twitter as sntwitter

Honestly, this code does not work when I run it in miniconda (if you have advice on other software, it is welcome). Sorry, I've only been using Python for a short time. That said, my problem is that when I want to download tweets over a long period of time I reach the maximum number that can be downloaded with snscrape. I would like to get around this by, for example, adding a time delay after a certain number of tweets, or something similar.

People here that have been using snscrape, can you post any code examples just doing a simple query search in script and not console? The lack of documentation is making this more trial and error as I learn the modules.

import snscrape.modules.twitter as sntwitter
for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2015-12-17 until:2020-09-25').get_items()) :
        if i > maxTweets :
            break
        print(tweet.username)
        print(tweet.renderedContent)
justinchuntingho commented 3 years ago

First of all thank you very much for your help, i would like to know if it is possible to extract only a part of --jsonl such as "content". Maybe even the author of the post.

Well I continue to have spotty success with snscrape, but I can confirm the following query worked: snscrape --jsonl twitter-search 'musim-musim since:2020-01-01 until:2020-07-01' > musim-musum.json That will output json for each tweet such as: {"url": "https://twitter.com/bibIichor/status/1278110922947493888", "date": "2020-06-30T23:39:24+00:00", "content": "@atermoiends menyebut musim-musim begitu selama nungguin seseorang. \ud83d\ude14\ud83d\udc95", "id": 1278110922947493888, "username": "bibIichor", "outlinks": [], "outlinksss": "", "tcooutlinks": [], "tcooutlinksss": "", "retweetedTweet": null} As noted in the snscrape installation note, you will require python 3.8 and the development version for --jsonl to work...

Hi @ppival @shelu16 , thanks for the snscrape reference. I tried it and the twitter-search module works, but it only gives me the list of tweet url, e.g: https://twitter.com/irwanOyong/status/1309516653386842113 Tried the --jsonl and --with-entity but it failed. Any insight to get the item (tweet) details?

Yes, it is possible. The .json is a JSON lines file and you might read it with json.loads() from the json package. A sample code can be found here: https://github.com/JustAnotherArchivist/snscrape/issues/82#issue-708558238
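Building on that, a minimal sketch for pulling just "content" and the author out of the JSON-lines output (the helper name is made up; file name from the example above):

```python
import json

def load_fields(lines, fields=("username", "content")):
    # snscrape --jsonl writes one JSON object per line;
    # parse each non-empty line and keep only the requested keys.
    rows = []
    for line in lines:
        if line.strip():
            obj = json.loads(line)
            rows.append({k: obj[k] for k in fields})
    return rows

# Usage:
# with open("musim-musum.json", encoding="utf8") as f:
#     rows = load_fields(f)
```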

muydipalma commented 3 years ago

snscrape twitter-search "#XRP since:2019-12-31 until:2020-09-25" > XRP_Sept_tweets.txt

This doesn't do anything. I'm in a clean env with python=3.8.5.

kiranbhatia16 commented 3 years ago

I have to rework the following lines of code to get username-specific, time-bound tweets. Can someone help? As of now I am getting HTTP Error 404: Not Found.

USERNAME = "narendramodi"
START_DATE = "2019-11-09"
END_DATE = "2019-11-14"
tweetCriteria = GetOldTweets3.manager.TweetCriteria().setUsername(USERNAME).setSince(START_DATE).setUntil(END_DATE).setMaxTweets(100)

tuhinanshusingh commented 3 years ago

Yes, I'm facing the same issue. I even updated the library. Still not working!

NomuraTakamichi commented 3 years ago

Me too

C4PT41ND34DP00L commented 3 years ago

Same issue here.

adamzvx commented 3 years ago

Not working for me either.

ahsanspark commented 3 years ago

For those who are still struggling to download tweets as CSV from snscrape, this works absolutely fine for me. Configuration: Windows 7 SP1 (64-bit), Python 3.8.6.

pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git

Write this code in a new Jupyter notebook and make sure it is using the Python 3.8.6 kernel.

Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()
pablotoledo commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel

Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

This is when you filter by providing two dates, but how do you get all the tweets? Just by removing the filter criteria?

ahsanspark commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

this is when you are trying to filter by providing two dates, but how do you get all tweets? just by removing the filter criteria?

Yes, you can add or remove filters as per your need.

SophieChowZZY commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel

Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

May I ask what to do if I want to filter the language of the tweets (e.g. only tweets in English)? How can I add a filter for that?

ahsanspark commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

May I ask what if I want to filter the language of the tweet (e.g. only tweet in English)? How can I add the filter for that?

Add lang:en (without quotes) inside the query string. Example: for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' lang:en').get_items()):

HadiKotaich commented 3 years ago

Hello, I feel that the time is not being read in the query (only the date is). I tried this earlier today with different time intervals in the same day, and it returns 0 results. Any idea how to solve it?

for i,tweet in enumerate(sntwitter.TwitterSearchScraper("lebanon since:2020-01-01 00:00:00 until:2020-01-01 06:00:00").get_items())
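As far as I can tell, since:/until: resolve only to whole days in the search query, so an intraday window has to be filtered client-side; a sketch (the helper name is invented) that scrapes the whole day and keeps only tweets whose timestamp falls in the window:

```python
from datetime import datetime, timezone

def in_window(tweet_date, start, end):
    # tweet.date from snscrape is a timezone-aware UTC datetime,
    # so it can be compared to aware datetimes directly.
    return start <= tweet_date < end

# start = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
# end = datetime(2020, 1, 1, 6, 0, tzinfo=timezone.utc)
# for tweet in sntwitter.TwitterSearchScraper(
#         "lebanon since:2020-01-01 until:2020-01-02").get_items():
#     if in_window(tweet.date, start, end):
#         print(tweet.id, tweet.date)
```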

davidbernat commented 3 years ago

For those using snscrape please see this issue about installing from pip.

To use the --jsonl flag you must do: pip3 install --upgrade git+https://github.com/JustAnotherArchivist/snscrape@master

Ref: https://github.com/JustAnotherArchivist/snscrape/issues/77

WelXingz commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel

Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

I've tried to run this code with Python 3.8.6 on Windows 10 and it didn't give me any results. It raises no errors, but I end up with an empty CSV (only the headers). Is there something I might be missing?

AugusteDebroise commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

I've tried to run this code with python 3.8.6 on windows 10 and it didn't give me any result, it makes no erros but i end up with a empty csv (only with the headers), is there something that i might be missing?

Not sure why, but I had the same problem. I replaced tweet.renderedContent with tweet.content and it works!

WelXingz commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

I've tried to run this code with python 3.8.6 on windows 10 and it didn't give me any result, it makes no erros but i end up with a empty csv (only with the headers), is there something that i might be missing?

Not sure why, but I had the same problem. I replace tweet.renderedContent by tweet.content and it works !

Unfortunately that wasn't my case, but I found the problem: it was the date filter. I got all the results by removing it, but now I can't filter a specific time period, which is bad.

TamiresMonteiroCD commented 3 years ago

Edit: I forgot to mention this. Sometimes the application gives me a 400: Bad Request; I run it again and it produces the HTML as I said before.

This intermittent behavior seems to be related to the random choice of user agent in TweetManager.py, where "user_agent = random.choice(TweetManager.user_agents ...". I believe a loop scanning the user-agent list with exception handling would solve this problem.
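The suggested loop might look like this sketch (generic illustration only; `fetch` stands in for the library's HTTP call, and the retry logic is my own, not GetOldTweets3 code):

```python
import random

def fetch_with_retries(fetch, user_agents, max_tries=5):
    # Try several randomly ordered user agents instead of a single
    # random.choice, re-raising only if every attempt fails.
    last_exc = None
    for agent in random.sample(user_agents, min(max_tries, len(user_agents))):
        try:
            return fetch(agent)
        except Exception as exc:  # e.g. HTTP 400 for a rejected agent
            last_exc = exc
    raise last_exc
```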

burakoglakci commented 3 years ago

@TamiresMonteiroCD @WelXingz @ahsanspark @Atoxal @SophieChowZZY

I think I solved the problem. I made a few changes to the lines. I collect tweets using a word and location filter. I'm using Python 3.8.6 on Windows 10 and it works fine right now.

import snscrape.modules.twitter as sntwitter
import csv
maxTweets = 3000

#keyword = 'deprem'
#place = '5e02a0f0d91c76d2' #This geo_place string corresponds to İstanbul, Turkey on twitter.

#keyword = 'covid'
#place = '01fbe706f872cb32' This geo_place string corresponds to Washington DC on twitter.

#Open/create a file to append data to
csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet',]) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('deprem + place:5e02a0f0d91c76d2 + since:2020-10-31 until:2020-11-03 -filter:links -filter:replies').get_items()):
        if i > maxTweets :
            break  
        csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()
bensilver95 commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

I've tried to run this code with python 3.8.6 on windows 10 and it didn't give me any result, it makes no erros but i end up with a empty csv (only with the headers), is there something that i might be missing?

Not sure why, but I had the same problem. I replace tweet.renderedContent by tweet.content and it works !

unfortunately that wasn't my case, but i found the problem and it was about the date filter, i got all the results by removing them but now i can't filter a specific time which is bad.

I'm having the exact same problem. When I remove the date filter it works, but when I have it (exactly as it is in the quoted code), I get no results. Anyone else having this issue or know how to solve it? @burakoglakci it's not clear to me how the changes you made in the code would solve this problem.

Edit: I think I figured it out. It's simply that there was a small error in the quoted code: you have to put a space before the 'since'.

Niehaus commented 3 years ago

For those who are still struggling to download tweets as csv from snscrape, for me this works absolutely fine. Configurations: Windows 7 SP1 (64 bit) Python 3.8.6 pip3.8 install git+https://github.com/JustAnotherArchivist/snscrape.git Write this code in new Jupyter Notebook and make sure that, it is using Python 3.8.6 Kernel Using code from the above comments.

import snscrape.modules.twitter as sntwitter
import csv

keyword = 'Covid'
maxTweets = 30000

#Open/create a file to append data to
csvFile = open('result.csv', 'a', newline='', encoding='utf8')

#Use csv writer
csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet']) 

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + 'since:2020-06-01 until:2020-06-30 -filter:links -filter:replies').get_items()) :
        if i > maxTweets :
            break      
        csvWriter.writerow([tweet.id, tweet.date, tweet.renderedContent])
csvFile.close()

I've tried to run this code with python 3.8.6 on windows 10 and it didn't give me any result, it makes no erros but i end up with a empty csv (only with the headers), is there something that i might be missing?

Not sure why, but I had the same problem. I replaced tweet.renderedContent with tweet.content and it works!

Unfortunately that wasn't my case, but I found the problem: it was the date filter. I get all the results when I remove it, but now I can't filter a specific time range, which is bad.

I'm having the exact same problem. When I remove the date filter it works, but when I include it (exactly as in the quoted code), I get no results. Is anyone else having this issue, or does anyone know how to solve it? @burakoglakci it's not clear to me how the changes you made to the code would solve this problem.

Edit: I think I figured it out. There was simply a small error in the quoted code: you have to put a space before the 'since'.

Yeah, it should be keyword + ' since:2020-06-01 until:2020-06-30 -filter:links -filter:replies'. Really simple, nice catch! :D
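The fix above is easier to see if the query string is built separately before it is handed to the scraper. A minimal sketch of the corrected concatenation (plain string handling, no network calls):

```python
keyword = 'Covid'

# Note the leading space before 'since:'. Without it, the keyword and the
# date filter run together ('Covidsince:2020-06-01'), Twitter matches
# nothing, and the CSV ends up empty apart from the headers.
query = keyword + ' since:2020-06-01 until:2020-06-30 -filter:links -filter:replies'
print(query)
```

The resulting string is what gets passed to sntwitter.TwitterSearchScraper(query) in the snippets above.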

burakoglakci commented 3 years ago

@bensilver95 @Niehaus

Absolutely, our queries are working. The code I added in the previous post was not displayed correctly. If you want to add a location filter to your query:

keyword = 'covid'

for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' place:095534ad3107e0e6 + since:2020-10-20 until:2020-11-04 -filter:links -filter:replies').get_items()):

You can run this query to collect tweets about covid shared from the state of Kentucky. Querying over shorter date ranges, as with GOT, can yield better results, because Twitter can stop responding to queries that match too many tweets.
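The "shorter date ranges" advice can be automated by splitting a long range into small windows and running one query per window. A sketch under that idea (date_windows is a hypothetical helper, not part of snscrape; only the query strings are built here, and each one would be passed to sntwitter.TwitterSearchScraper as in the snippets above):

```python
from datetime import date, timedelta

def date_windows(start, end, days=7):
    """Yield (since, until) ISO-date pairs covering [start, end) in short
    chunks, so each query stays small enough that Twitter keeps responding."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        yield cur.isoformat(), nxt.isoformat()
        cur = nxt

for since, until in date_windows(date(2020, 10, 20), date(2020, 11, 4)):
    query = f'covid place:095534ad3107e0e6 since:{since} until:{until}'
    # each query would be run separately:
    # for tweet in sntwitter.TwitterSearchScraper(query).get_items(): ...
    print(query)
```

Appending each window's rows to the same CSV file (mode 'a', as in the scripts above) stitches the windows back into one dataset.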

Niehaus commented 3 years ago

@burakoglakci Can you please help me with the query to get the tweets of a specific user?

burakoglakci commented 3 years ago

@Niehaus A query like this works for me; I hope it works for you too.

import snscrape.modules.twitter as sntwitter
import csv

maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet'])

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@burakoglakci + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

Fatima-Haouari commented 3 years ago

Thanks all for the useful comments and the help solving the scraping issue. Has anyone tried scraping the replies to tweets? I'd really appreciate your help.
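One possible approach to the replies question, hedged heavily: Twitter's search accepts a conversation_id: operator, which snscrape should pass straight through, so the replies to a single tweet could in principle be collected via search. This is an untested sketch, and the tweet id below is a placeholder:

```python
# Untested assumption: Twitter search supports 'conversation_id:' and
# snscrape forwards it unchanged.
tweet_id = 1324000000000000000  # hypothetical tweet id -- substitute a real one
query = f'conversation_id:{tweet_id} filter:replies'
print(query)

# The query would then be iterated exactly like the snippets above:
# import snscrape.modules.twitter as sntwitter
# for reply in sntwitter.TwitterSearchScraper(query).get_items():
#     print(reply.id, reply.content)
```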

sbif commented 3 years ago

Hi guys! I'm totally lost: how can I use snscrape to extract tweets from a user within a specific time span? I'm a beginner with Python and I have to do this for my thesis. I've been trying to extract this data for three weeks without success; I tried tweepy and then GetOldTweets3, and I've just discovered the new Twitter API limit... Can somebody help me, please?

burakoglakci commented 3 years ago

@sbif

Hi guys! I'm totally lost: how can I use snscrape to extract tweets from a user within a specific time span? I'm a beginner with Python and I have to do this for my thesis. I've been trying to extract this data for three weeks without success; I tried tweepy and then GetOldTweets3, and I've just discovered the new Twitter API limit... Can somebody help me, please?

Use this query with snscrape:

import snscrape.modules.twitter as sntwitter
import csv

maxTweets = 3000

csvFile = open('place_result.csv', 'a', newline='', encoding='utf8')

csvWriter = csv.writer(csvFile)
csvWriter.writerow(['id','date','tweet'])

for i,tweet in enumerate(sntwitter.TwitterSearchScraper('from:@BillGates + since:2015-12-02 until:2020-11-05 -filter:links -filter:replies').get_items()):
    if i > maxTweets:
        break
    csvWriter.writerow([tweet.id, tweet.date, tweet.content])
csvFile.close()

ldallacq commented 3 years ago

Hello! I am using the last snscrape query, but it is not working for me. I am querying @joebiden from 2020-01-01 onwards and I am getting a weird output with just one tweet. I am a Mac user, if that matters. I really do not know what is going on. I literally copy-pasted the code and changed the handle, but it does not work. Any hints? Thank you so much!