Closed shmuhammadd closed 3 years ago
I am getting the same error. In my case, it was working perfectly till yesterday. I'm getting this error from this evening. Might be a problem at Twitter's end.
Same issue here. Python works with no problem, though, with the same bearer token.
> I am getting the same error. In my case, it was working perfectly till yesterday. I'm getting this error from this evening. Might be a problem at Twitter's end.
It was working fine for me yesterday too.
Twitter changed its API a few days ago: if a user requests `context_annotations` via the `tweet.fields` parameter (on by default), the fetch is limited to 100 tweets per page (the default is 500, hence the error). A quick workaround is to add `page_n = 100`. We are working on a fix and will update soon.
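A sketch of that workaround (the query, dates, and `bearer_token` object here are placeholders):

```r
library(academictwitteR)

# Capping each page at 100 tweets avoids the 400 error while
# context_annotations are still being requested by default
tweets <- get_all_tweets(
  query = "example",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-06-01T00:00:00Z",
  bearer_token = bearer_token,
  page_n = 100
)
```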
Thanks for the response @justinchuntingho.
I'm having an issue on this front. I ran this code about 4-5 days ago with no issue and was scraping upwards of 250,000 tweets, which was fantastic. Now I am getting this 400 error, and using `page_n = 100` obviously limits me to 100 tweets per page and caps my collection at 100 tweets. Is there a workaround for this, or does this package now limit collection to that few tweets?
Starting from #181, you can now specify `context_annotations = FALSE` (also the default), in which case you will be able to fetch 500 tweets per page. We will try to push the patch to CRAN soon, but in the meantime you can install the development version to use this.
This is the message I get with the following code:

```r
tweets4 <- get_all_tweets(
  query = build_query("sanctuary cities OR sanctuary city", is_retweet = FALSE, lang = "en"),
  start_tweets = "2018-01-01T00:00:00Z",
  end_tweets = "2018-01-03T00:00:00Z",
  bearer_token = bearer_token,
  data_path = "data6/",
  bind_tweets = TRUE,
  context_annotations = FALSE,
  page_n = 500
)
#> query: sanctuary cities OR sanctuary city -is:retweet lang:en
#> Total pages queried: 1 (tweets captured this page: 496).
#> Total tweets captured now reach 100 : finishing collection.
```
@jmwright432 How about:

```r
tweets4 <- get_all_tweets(
  query = build_query("sanctuary cities OR sanctuary city", is_retweet = FALSE, lang = "en"),
  start_tweets = "2018-01-01T00:00:00Z",
  end_tweets = "2018-01-03T00:00:00Z",
  bearer_token = bearer_token,
  data_path = "data6/",
  bind_tweets = TRUE,
  context_annotations = FALSE,
  page_n = 500,
  n = Inf
)
```

You need to tune `n`.
> Is there a workaround for this or is this package now limiting to that few of tweets?
From what I understand of the new Twitter update and this package, you should still be able to mine more than 100 tweets; it will just be much slower if you want `context_annotations`, until there's an update. @justinchuntingho?
I am not @justinchuntingho (I'm the quiet Beatle). But I can answer your question @natesheehan.
The update is there now; you can install the Github version.
First things first: you can get more than 100 tweets. You can get 1000 tweets in 5 s, for example. The only change is that you won't get the `context_annotations`, i.e. the things (topics, named entities, etc.) that Twitter extracts from tweets for you.
```r
require(academictwitteR)
#> Loading required package: academictwitteR
start_time <- Sys.time()
x <- get_all_tweets(
  query = "#ichbinhanna",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-01T00:00:00Z",
  n = 1000
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query: #ichbinhanna
#> Total pages queried: 1 (tweets captured this page: 500).
#> Total pages queried: 2 (tweets captured this page: 500).
#> Total tweets captured now reach 1000 : finishing collection.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 4.990046 secs
nrow(x)
#> [1] 1000
```
Created on 2021-07-04 by the reprex package (v2.0.0)
If you need those context annotations, you need to request them explicitly in your call to `get_all_tweets`. It will also be slower.
```r
require(academictwitteR)
#> Loading required package: academictwitteR
start_time <- Sys.time()
x <- get_all_tweets(
  query = "#ichbinhanna",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-01T00:00:00Z",
  n = 1000,
  context_annotations = TRUE
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> page_n is limited to 100 due to the restriction imposed by Twitter API
#> query: #ichbinhanna
#> Total pages queried: 1 (tweets captured this page: 100).
#> Total pages queried: 2 (tweets captured this page: 100).
#> Total pages queried: 3 (tweets captured this page: 100).
#> Total pages queried: 4 (tweets captured this page: 100).
#> Total pages queried: 5 (tweets captured this page: 100).
#> Total pages queried: 6 (tweets captured this page: 100).
#> Total pages queried: 7 (tweets captured this page: 100).
#> Total pages queried: 8 (tweets captured this page: 100).
#> Total pages queried: 9 (tweets captured this page: 100).
#> Total pages queried: 10 (tweets captured this page: 100).
#> Total tweets captured now reach 1000 : finishing collection.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 11.94927 secs
nrow(x)
#> [1] 1000
```
Created on 2021-07-04 by the reprex package (v2.0.0)
Thanks @chainsawriot, adding `n = Inf` worked. I'm getting close to 250k tweets now, which is what I was getting a few days ago. Clearly the syntax in the code has changed. Much appreciated!
@chainsawriot hey quiet Beatle - great answer and thanks for this tip!
> You need to tune `n`.

Got that `n` finely tuned now! Many thanks @justinchuntingho for the speedy fix!
Many thanks guys for fixing this. @justinchuntingho @chainsawriot you guys are amazing.
Hi @chainsawriot, I still have a problem with status code 400. Below is my code; can you please tell me what I did wrong? I tried to add `page_n = 500` but it did not work. `page_n = 100` worked, but I noticed that it took longer than a few days ago, before the update happened.

```r
tweets <- get_all_tweets("paris accord", "2018-07-01T00:00:00Z", "2021-07-04T00:00:00Z", BEARER_TOKEN, lang = "en")
```

(This one did not work.) I am a newbie, so I am sorry if my question is not good.
Hi @helennguyen1312 ,
You need to update the package. It is not yet pushed to CRAN, but you can install the dev version as shown below.

```r
devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE)
```

This is what works for me.
Best, Shamsuddeen
@shmuhammad2004 Thank you so much! I got it now. And many thanks to @justinchuntingho @chainsawriot for fixing the issue.
Hi @chainsawriot, sorry to bother you. I'm still having issues getting tweets.

First I create the bearer_token and query objects. The query is the following:

```r
query <- build_query(query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)
```

Then I try to get tweets with the following command:

```r
try <- get_all_tweets(query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose = FALSE)
```

This command does not work. The error is the following:

```
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400
```

I tried to introduce `context_annotations = TRUE`, and I tried to introduce `n =`. In both cases the command does not work (it does not recognize `context_annotations` as a valid argument). The command works only with `page_n = 100`, yet I need to scrape many more tweets. How can I solve this? Any tips?

Thank you all in advance for your great work and support.
@AndreaaMarche Have you installed the latest Github version?

```r
devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE)
```

I can't reproduce your error.
```r
require(academictwitteR)
#> Loading required package: academictwitteR
query <- build_query(query = "blabla", is_retweet = FALSE, has_hashtags = TRUE,
                     remove_promoted = TRUE)
try <- get_all_tweets(query = query, file = NULL, data_path = NULL,
                      bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z",
                      end_tweets = "2021-07-04T23:59:59Z", verbose = FALSE, n = 2000)
nrow(try)
#> [1] 2000
```
Created on 2021-07-05 by the reprex package (v2.0.0)
@chainsawriot It works if I do not specify `bearer_token` in the command. I do not understand why, but this was the issue in my case, and I had to use `set_bearer`: maybe it can be helpful for other users.

If possible, I would like to know the maximum `n =` I can specify. Thank you very much for your help!
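As a sketch of that `set_bearer` workflow (the function names are from the package; the comments are my reading of how they behave):

```r
library(academictwitteR)

# Stores your bearer token in .Renviron so that subsequent
# academictwitteR calls can find it without a bearer_token argument
set_bearer()

# After restarting R, the stored token can be retrieved with:
bearer <- get_bearer()
```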
@AndreaaMarche Study `count_all_tweets`.
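For example (a sketch with a placeholder query and date range, assuming the bound result has a `tweet_count` column), `count_all_tweets` reports how many tweets match per period, which you can sum to pick a sensible `n`:

```r
library(academictwitteR)

# Count matching tweets per day without fetching the tweets themselves
counts <- count_all_tweets(
  query = "blabla",
  start_tweets = "2021-06-11T00:00:00Z",
  end_tweets = "2021-07-04T23:59:59Z",
  granularity = "day"
)

# Total matches: a reasonable upper bound for `n` in get_all_tweets()
sum(counts$tweet_count)
```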
Hi @chainsawriot! I'm having the exact same issue @AndreaaMarche had, but her solution is not working for me, as I never specified `bearer_token` in the command to begin with. My query is as follows:

```r
hillary_tweets <- get_all_tweets(users = c("HillaryClinton"), start_tweets = "2015-04-12T00:00:00Z", end_tweets = "2016-06-06T00:00:00Z", bind_tweets = TRUE, page_n = 500, n = Inf)
```

This gives me the 400 error:

```
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400
```

I've installed the latest dev version of the package, but like @AndreaaMarche I can't introduce `context_annotations = FALSE` or `n =` without getting errors. I can only get it to work with `page_n = 100`, which quickly exceeds the rate limit. Any suggestions?

Thanks so much!
@kobihackenburg I can't reproduce this.
```r
require(academictwitteR)
#> Loading required package: academictwitteR
hillary_tweets <- get_all_tweets(
  users = c("HillaryClinton"),
  start_tweets = "2015-04-12T00:00:00Z",
  end_tweets = "2016-06-06T00:00:00Z",
  bind_tweets = TRUE,
  page_n = 500,
  n = Inf
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query: (from:HillaryClinton)
#> Total pages queried: 1 (tweets captured this page: 496).
#> Total pages queried: 2 (tweets captured this page: 500).
#> Total pages queried: 3 (tweets captured this page: 499).
#> Total pages queried: 4 (tweets captured this page: 496).
#> Total pages queried: 5 (tweets captured this page: 486).
#> Total pages queried: 6 (tweets captured this page: 494).
#> Total pages queried: 7 (tweets captured this page: 494).
#> Total pages queried: 8 (tweets captured this page: 500).
#> Total pages queried: 9 (tweets captured this page: 491).
#> Total pages queried: 10 (tweets captured this page: 498).
#> Total pages queried: 11 (tweets captured this page: 497).
#> Total pages queried: 12 (tweets captured this page: 430).
#> This is the last page for (from:HillaryClinton) : finishing collection.
```
Created on 2021-07-06 by the reprex package (v2.0.0)
I am using 0.2.1 a.k.a. the current Github version.
> try <- get_all_tweets( query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose= FALSE)
Unless you are supplying the arguments in the order they were defined, you need to name them, e.g. state explicitly `bearer_token = bearer_token` (recommended), or put your arguments in the defined order: `get_all_tweets(query, start_tweets, end_tweets, bearer_token, ...)`.
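In other words (sketched with placeholder objects), either of these is fine, but mixing them up is not:

```r
# Named arguments (recommended): order does not matter
get_all_tweets(
  query = query,
  start_tweets = "2021-06-11T00:00:00Z",
  end_tweets = "2021-07-04T23:59:59Z",
  bearer_token = bearer_token
)

# Positional: must follow the order the arguments were defined in
get_all_tweets(query, "2021-06-11T00:00:00Z", "2021-07-04T23:59:59Z", bearer_token)
```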
Patch v0.2.1 now on CRAN: ref. commit 49d0c7e
I ran the code below to extract tweets with hashtag #BlackLivesMatter, but it returns an error:

```
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400
```

I understand error 400 means a bad request, but the query is a verbatim copy from academictwitteR.

Expected behavior: return the expected tweets as queried.

Session Info:

Thanks @cjbarrie for the amazing work.

Please kindly advise.

Best, Shamsuddeen