cjbarrie / academictwitteR

Repo for academictwitteR package to query the Twitter Academic Research Product Track v2 API endpoint.

Example #15

Closed luisignaciomenendez closed 3 years ago

luisignaciomenendez commented 3 years ago

Hello

Could you upload a more detailed example of how the function works?

I have been trying to scrape tweets for #BLM but I am getting weird results. I don't know how the function goes through the tweets, but in some samples I was getting tweets from only a few users with a lot of retweets. Does the function take random tweets in that interval? Is there a way to restrict the search to the country we are interested in?

Also, for the JSON files, I can't really find a way to convert them to data frames properly.

Thanks a lot in advance!

justinchuntingho commented 3 years ago

The function calls the /2/tweets/search/all endpoint, which returns all tweets matching the search query. You can restrict the search to tweets from a certain country with the place_country: operator, and filter out retweets with -is:retweet. A working example would be as follows:

get_hashtag_tweets("#BLM place_country:US -is:retweet", "2019-08-01T00:00:00Z", "2019-08-10T01:00:00Z", bearer_token)

The JSON files should be parsed automatically, but if you would like to parse them again at a later stage, the read_json() function from jsonlite could do the trick.
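
For reference, here is a minimal sketch of re-parsing a saved JSON file into a data frame with jsonlite. The file written here is a made-up stand-in so that the snippet runs on its own. The files saved by the package may be laid out differently, so treat the field names as assumptions.

```r
library(jsonlite)

# Hypothetical stand-in for a saved file: a JSON array of tweet objects.
# The real files written by the package may use a different layout.
path <- tempfile(fileext = ".json")
writeLines('[
  {"id": "1", "text": "first tweet",  "public_metrics": {"retweet_count": 3}},
  {"id": "2", "text": "second tweet", "public_metrics": {"retweet_count": 0}}
]', path)

# fromJSON() simplifies an array of objects into a data frame;
# flatten = TRUE turns nested objects into ordinary columns
tweets <- fromJSON(path, flatten = TRUE)

str(tweets)
```

With flatten = TRUE, nested fields such as public_metrics end up as regular columns (here public_metrics.retweet_count), which is usually easier to work with than list columns.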

luisignaciomenendez commented 3 years ago

Thanks a lot for the answer!! When you say that it calls that endpoint, does that mean that if I want to get N tweets rather than all of them I need to modify the original function, or is there a way to limit it from the query?

justinchuntingho commented 3 years ago

It is actually on our to-do list to add an option to limit the number of tweets. For the current version, you could try narrowing the date range if you wish to have fewer tweets. However, it is not possible to limit precisely how many tweets you get (although it is always possible to remove extra tweets afterwards).
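
Removing extra tweets afterwards could look roughly like this. It is only a sketch in base R: the tweets data frame is invented for illustration, and the columns returned by the package may differ.

```r
# Hypothetical result: a data frame with one row per tweet
tweets <- data.frame(
  id = as.character(1:10),
  text = paste("tweet", 1:10),
  stringsAsFactors = FALSE
)

n <- 5

# Keep the first n rows (or all rows if fewer than n came back)
first_n <- head(tweets, n)

# Or keep a random subset of n rows instead
set.seed(42)  # only to make the sample reproducible
random_n <- tweets[sample(nrow(tweets), min(n, nrow(tweets))), , drop = FALSE]
```

Note that either approach still downloads every matching tweet first, so it trims the data but does not reduce the number of API requests.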

luisignaciomenendez commented 3 years ago

Hello

I wondered whether it is possible to obtain geolocation information with the function.

I have tried this myself by including has:geo in the query. This returns a column named place_id. I have been searching for information on how to convert this column to actual coordinates or place names in order to work with it. From what I have found, combining expansions=geo.place_id with tweet.fields=geo should return an object that includes the necessary information. I have tried to modify the function to do this (by adding extra arguments) but it does not seem to do what I want.

Could you give me a hand?

Thank you
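
For context, the v2 API returns place details separately from the tweets (under includes.places) when the request carries expansions=geo.place_id along with a place.fields parameter. Whether the package functions discussed here expose that data is not confirmed in this thread. The sketch below therefore assumes that you already have the tweets and the place objects as two data frames, and both are made up for illustration.

```r
# Hypothetical tweets, with the place_id column mentioned above
tweets <- data.frame(
  id = c("1", "2", "3"),
  text = c("a", "b", "c"),
  place_id = c("p1", "p2", "p1"),
  stringsAsFactors = FALSE
)

# Hypothetical place objects. The API calls the key "id";
# it is named place_id here so that the join column matches.
places <- data.frame(
  place_id = c("p1", "p2"),
  full_name = c("Manhattan, NY", "Los Angeles, CA"),
  country_code = c("US", "US"),
  stringsAsFactors = FALSE
)

# Left join: keep every tweet, attach place details where available
tweets_geo <- merge(tweets, places, by = "place_id", all.x = TRUE)

print(tweets_geo)
```

Tweets whose place_id has no match in places keep their row and get NA in the place columns, because of all.x = TRUE.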

tweedmann commented 3 years ago

Hi,

thank you so much for your work and the example in this thread. I also have a related question: is there a way to use the get_hashtag_tweets function and filter tweets by language? I already tried adding lang:de for German tweets, but no matter what I try it doesn't seem to work. Is there anything I can do?

Thank you in advance.

luisignaciomenendez commented 3 years ago

Hi

I am getting tweets filtered by language, so maybe my comment will come in handy. You need to insert lang:en (for English) in your query to ask for a specific language. Also make sure to put the boolean operators in parentheses (I had some issues with this at the beginning).

This is the code I am actually using, and it does return tweets only in English:

blm <- get_hashtag_tweets("(#BlackLivesMatter OR #blacklivesmatter) has:geo lang:en -is:retweet place_country:US", "2020-05-25T00:00:00Z", "2020-06-25T01:00:00Z", bearer_token)

Hope it helps

tweedmann commented 3 years ago

Hey, thank you so much for your help, Luis. I tried your code and it works. I think putting parentheses around the boolean operators is important. Thanks!
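
A likely reason the parentheses matter: in the v2 query syntax a space (logical AND) binds more tightly than OR, so #a OR #b lang:en is read as #a OR (#b lang:en), and the language filter only applies to the second hashtag. A small helper can build the query string so that the OR group is always wrapped. build_query below is a hypothetical function, not part of the package.

```r
# Hypothetical helper, not part of the package: wraps the OR group in
# parentheses so the remaining operators apply to every hashtag
build_query <- function(hashtags, filters = character(0)) {
  or_group <- paste0("(", paste(hashtags, collapse = " OR "), ")")
  paste(c(or_group, filters), collapse = " ")
}

query <- build_query(
  c("#BlackLivesMatter", "#BLM"),
  c("lang:en", "-is:retweet", "place_country:US")
)

print(query)
```

The resulting string can then be passed as the first argument of get_hashtag_tweets, followed by the start date, end date, and bearer token as in the examples above.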