cjbarrie / academictwitteR

Repo for academictwitteR package to query the Twitter Academic Research Product Track v2 API endpoint.

Error in make_query. Status code: 400 #180

Closed shmuhammadd closed 3 years ago

shmuhammadd commented 3 years ago

I ran the code below to extract tweets with the hashtag #BlackLivesMatter, but it returns an error: Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400.

I understand that a 400 error means a bad request, but the query is a verbatim copy from the academictwitteR documentation.

get_all_tweets(
    query = "#BlackLivesMatter",
    start_tweets = "2020-01-01T00:00:00Z",
    end_tweets = "2020-01-05T00:00:00Z",
    file = "blmtweets",
    data_path = "data/",
    n = 100,
    bearer_token = get_bearer()
  )

Expected behavior

Return the expected tweets as queried.

Session Info:


R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics 
[3] grDevices utils    
[5] datasets  methods  
[7] base     

other attached packages:
 [1] quanteda.textstats_0.94.1
 [2] quanteda.tidy_0.2        
 [3] quanteda_3.0.0           
 [4] forcats_0.5.1            
 [5] stringr_1.4.0            
 [6] dplyr_1.0.7              
 [7] purrr_0.3.4              
 [8] readr_1.4.0              
 [9] tidyr_1.1.3              
[10] tibble_3.1.2             
[11] ggplot2_3.3.4.9000       
[12] tidyverse_1.3.1          
[13] academictwitteR_0.2.0    
[14] goodshirt_0.2.2          

loaded via a namespace (and not attached):
 [1] rmsfact_0.0.3             
 [2] Rcpp_1.0.6                
 [3] stringdist_0.9.6.3        
 [4] lubridate_1.7.10          
 [5] lattice_0.20-44           
 [6] LexisNexisTools_0.3.4.9000
 [7] assertthat_0.2.1          
 [8] utf8_1.2.1                
 [9] cellranger_1.1.0          
[10] R6_2.5.0                  
[11] plyr_1.8.6                
[12] backports_1.2.1           
[13] reprex_2.0.0              
[14] nsyllable_1.0             
[15] httr_1.4.2                
[16] pillar_1.6.1              
[17] rlang_0.4.11              
[18] readxl_1.3.1              
[19] curl_4.3.2                
[20] rstudioapi_0.13           
[21] data.table_1.14.0         
[22] praise_1.0.0              
[23] Matrix_1.3-4              
[24] munsell_0.5.0             
[25] broom_0.7.8               
[26] modelr_0.1.8              
[27] compiler_4.0.3            
[28] xfun_0.24                 
[29] pkgconfig_2.0.3           
[30] tidyselect_1.1.1          
[31] emo_0.0.0.9000            
[32] fansi_0.5.0               
[33] withr_2.4.2               
[34] crayon_1.4.1              
[35] dbplyr_2.1.1              
[36] grid_4.0.3                
[37] jsonlite_1.7.2            
[38] gtable_0.3.0              
[39] lifecycle_1.0.0           
[40] DBI_1.1.1                 
[41] magrittr_2.0.1            
[42] scales_1.1.1              
[43] RcppParallel_5.1.4        
[44] cli_2.5.0                 
[45] stringi_1.6.2             
[46] pbapply_1.4-3             
[47] reshape2_1.4.4            
[48] fs_1.5.0                  
[49] cowsay_0.8.0              
[50] xml2_1.3.2                
[51] ellipsis_0.3.2            
[52] stopwords_2.2             
[53] fortunes_1.5-4            
[54] generics_0.1.0            
[55] vctrs_0.3.8               
[56] fastmatch_1.1-0           
[57] tools_4.0.3               
[58] glue_1.4.2                
[59] hms_1.1.0                 
[60] parallel_4.0.3            
[61] colorspace_2.0-2          
[62] rvest_1.0.0               
[63] haven_2.4.1               
[64] knitr_1.33                
[65] usethis_2.0.1.9000 

Thanks @cjbarrie for the amazing work.

Please kindly advise.

Best, Shamsuddeen

hsakareem commented 3 years ago

I am getting the same error. In my case, it was working perfectly until yesterday; I've been getting this error since this evening. It might be a problem on Twitter's end.

DrorWalt commented 3 years ago

Same issue here. Python works with no problem, though, using the same bearer token.

shmuhammadd commented 3 years ago

> I am getting the same error. In my case, it was working perfectly until yesterday; I've been getting this error since this evening. It might be a problem on Twitter's end.

It worked fine for me yesterday as well.

justinchuntingho commented 3 years ago

Twitter just changed its API a few days ago: if a user requests context_annotations via the tweet.fields parameter (which academictwitteR currently does by default), the fetch is limited to 100 tweets per page (the default page size is 500, hence the error). A quick workaround is to add page_n = 100. We are working on a fix and will update soon.
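
For example, the original call from this issue should work again with page_n = 100 added. A sketch of the workaround, with all other arguments unchanged:

get_all_tweets(
    query = "#BlackLivesMatter",
    start_tweets = "2020-01-01T00:00:00Z",
    end_tweets = "2020-01-05T00:00:00Z",
    file = "blmtweets",
    data_path = "data/",
    n = 100,
    bearer_token = get_bearer(),
    page_n = 100  # fetch at most 100 tweets per page to satisfy the new limit
  )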

shmuhammadd commented 3 years ago

> Twitter just changed its API a few days ago: if a user requests context_annotations via the tweet.fields parameter, the fetch is limited to 100 tweets per page (the default page size is 500). A quick workaround is to add page_n = 100. We are working on a fix and will update soon.

Thanks for the response @justinchuntingho.

jmwright432 commented 3 years ago

I'm having an issue on this front. I ran this code about 4-5 days ago with no issue: I was scraping upwards of 250,000 tweets, which was fantastic. Now I am getting this 400 error message, and using page_n = 100 obviously limits me to 100 tweets per page and caps my collection at 100 tweets. Is there a workaround for this, or is this package now limited to that few tweets?

justinchuntingho commented 3 years ago

Starting from #181, you should now be able to specify context_annotations = FALSE (also the default), in which case you will be able to fetch 500 tweets per page. We will try to push the patch to CRAN soon, but in the meantime you can install the development version.
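
To get the patch before it reaches CRAN:

devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE)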

jmwright432 commented 3 years ago

This is the message I get with the following code:

tweets4 <- get_all_tweets(
  query = build_query("sanctuary cities OR sanctuary city", is_retweet = FALSE, lang = "en"),
  start_tweets = "2018-01-01T00:00:00Z",
  end_tweets = "2018-01-03T00:00:00Z",
  bearer_token = bearer_token,
  data_path = "data6/",
  bind_tweets = TRUE,
  context_annotations = FALSE,
  page_n = 500
)
query: sanctuary cities OR sanctuary city -is:retweet lang:en 
Total pages queried: 1 (tweets captured this page: 496). 
Total tweets captured now reach 100 : finishing collection.

chainsawriot commented 3 years ago

@jmwright432 How about

city",is_retweet=FALSE,lang="en"),start_tweets="2018-01-01T00:00:00Z",end_tweets="2018-01-03T00:00:00Z", bearer_token=bearer_token, data_path = "data6/", bind_tweets = TRUE, context_annotations=FALSE, page_n=500, n = Inf)

You needa tune the n.

natesheehan commented 3 years ago

> Is there a workaround for this, or is this package now limited to that few tweets?

From what I understand about the new Twitter update and this package, you should still be able to mine more than 100 tweets; it will just be much slower if you want context_annotations, at least until an update. @justinchuntingho?

chainsawriot commented 3 years ago

I am not @justinchuntingho (I'm the quiet Beatle), but I can answer your question, @natesheehan.

The update is there now. You can install the Github version.

First things first: you can get more than 100 tweets. You can get 1000 tweets in 5 seconds, for example. The only change is that you won't get the context annotations, i.e. the things (e.g. topics, named entities) that Twitter extracts for you from tweets.

require(academictwitteR)
#> Loading required package: academictwitteR

start_time <- Sys.time()
x <- get_all_tweets(
  query = "#ichbinhanna",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-01T00:00:00Z",
  n = 1000
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query:  #ichbinhanna 
#> Total pages queried: 1 (tweets captured this page: 500).
#> Total pages queried: 2 (tweets captured this page: 500).
#> Total tweets captured now reach 1000 : finishing collection.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 4.990046 secs
nrow(x)
#> [1] 1000

Created on 2021-07-04 by the reprex package (v2.0.0)

If you need those context annotations, you need to request them explicitly in your call to get_all_tweets. It will also be slower.

require(academictwitteR)
#> Loading required package: academictwitteR

start_time <- Sys.time()
x <- get_all_tweets(
  query = "#ichbinhanna",
  start_tweets = "2021-01-01T00:00:00Z",
  end_tweets = "2021-07-01T00:00:00Z",
  n = 1000,
  context_annotations = TRUE
)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> page_n is limited to 100 due to the restriction imposed by Twitter API
#> query:  #ichbinhanna 
#> Total pages queried: 1 (tweets captured this page: 100).
#> Total pages queried: 2 (tweets captured this page: 100).
#> Total pages queried: 3 (tweets captured this page: 100).
#> Total pages queried: 4 (tweets captured this page: 100).
#> Total pages queried: 5 (tweets captured this page: 100).
#> Total pages queried: 6 (tweets captured this page: 100).
#> Total pages queried: 7 (tweets captured this page: 100).
#> Total pages queried: 8 (tweets captured this page: 100).
#> Total pages queried: 9 (tweets captured this page: 100).
#> Total pages queried: 10 (tweets captured this page: 100).
#> Total tweets captured now reach 1000 : finishing collection.
end_time <- Sys.time()
end_time - start_time
#> Time difference of 11.94927 secs
nrow(x)
#> [1] 1000

Created on 2021-07-04 by the reprex package (v2.0.0)

jmwright432 commented 3 years ago

Thanks @chainsawriot, adding n = Inf worked. I'm getting closer to 250k tweets now, which is what I was getting a few days ago. Clearly the syntax has changed in the code. Much appreciated!

natesheehan commented 3 years ago

@chainsawriot hey quiet Beatle - great answer and thanks for this tip!

> You needa tune the n.

Got that n tuned finely now! Many thanks @justinchuntingho for the speedy fix!

shmuhammadd commented 3 years ago

Many thanks guys for fixing this. @justinchuntingho @chainsawriot you guys are amazing.

helennguyen1312 commented 3 years ago

Hi @chainsawriot, I still have a problem with status code 400. Below is my code; can you please tell me what I did wrong? I tried to add page_n = 500 but it did not work. page_n = 100 worked, but I noticed that it took longer than it did a few days ago, before the update happened.

tweets <- get_all_tweets("paris accord", "2018-07-01T00:00:00Z", "2021-07-04T00:00:00Z", BEARER_TOKEN, lang = "en")

This one did not work. I am a newbie, so I am sorry if my question is not good.

shmuhammadd commented 3 years ago

> Hi @chainsawriot, I still have a problem with status code 400. Below is my code; can you please tell me what I did wrong? I tried to add page_n = 500 but it did not work. page_n = 100 worked, but I noticed that it took longer than it did a few days ago, before the update happened. tweets <- get_all_tweets("paris accord", "2018-07-01T00:00:00Z", "2021-07-04T00:00:00Z", BEARER_TOKEN, lang = "en") This one did not work. I am a newbie, so I am sorry if my question is not good.

Hi @helennguyen1312,

You need to update the package. It has not yet been pushed to CRAN, but you can install the dev version as shown below.

devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE)

This is what works for me.

Best, Shamsuddeen

helennguyen1312 commented 3 years ago

@shmuhammad2004 Thank you so much! I got it now. And many thanks to @justinchuntingho @chainsawriot for fixing the issue.

AndreaaMarche commented 3 years ago

Hi @chainsawriot, sorry to bother you. I'm still having issues getting tweets. First I create bearer_token and query objects. The query is the following:

query <- build_query(query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)

Then I try to get tweets with the following command:

try <- get_all_tweets(query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose = FALSE)

This command does not work. The error is the following: Errore in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

I tried to introduce context_annotations = TRUE, and I tried to introduce n =. In both cases the command does not work (it does not recognize context_annotations as a valid argument).

The command works only with page_n = 100. Yet I need to scrape many more tweets. How can I solve this? Any tips?

Thank you all in advance for your great work and support.

chainsawriot commented 3 years ago

@AndreaaMarche Have you installed the latest Github version?

devtools::install_github("cjbarrie/academictwitteR", build_vignettes = TRUE) 

Can't reproduce your error.

require(academictwitteR)
#> Loading required package: academictwitteR
query <- build_query( query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, 
                      remove_promoted = TRUE)

try <- get_all_tweets( query = query, file = NULL, data_path = NULL, 
                       bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", 
                       end_tweets = "2021-07-04T23:59:59Z", verbose= FALSE, n = 2000)
nrow(try)
#> [1] 2000

Created on 2021-07-05 by the reprex package (v2.0.0)

AndreaaMarche commented 3 years ago

@chainsawriot It works if I do not specify bearer_token in the command. I do not understand why, but this was the issue in my case, and I had to use set_bearer: maybe it can be helpful for other users.

If possible, I would like to know the maximum n I can specify. Thank you very much for your help!
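
In sketch form, the workflow that worked for me; this assumes, per the package docs, that get_all_tweets() falls back to get_bearer() when bearer_token is not supplied:

require(academictwitteR)

# set_bearer() walks you through storing TWITTER_BEARER in .Renviron;
# restart R afterwards so the environment variable is picked up
set_bearer()

# After restarting, omit bearer_token: get_bearer() supplies it internally
try <- get_all_tweets(
  query = query,
  start_tweets = "2021-06-11T00:00:00Z",
  end_tweets = "2021-07-04T23:59:59Z",
  n = 2000
)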

chainsawriot commented 3 years ago

@AndreaaMarche Study count_all_tweets
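
That is: count the matching tweets first, then set n accordingly. A sketch, assuming the counts come back with a tweet_count column as in the v2 counts endpoint:

require(academictwitteR)

# Count matching tweets per day without retrieving them,
# then size n for get_all_tweets() from the total
counts <- count_all_tweets(
  query = "blabla",
  start_tweets = "2021-06-11T00:00:00Z",
  end_tweets = "2021-07-04T23:59:59Z",
  granularity = "day"
)
sum(counts$tweet_count)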

kobihackenburg commented 3 years ago

> Hi @chainsawriot, sorry to bother you. I'm still having issues getting tweets. First I create bearer_token and query objects. The query is the following:
>
> query <- build_query(query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)
>
> Then I try to get tweets with the following command:
>
> try <- get_all_tweets(query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose = FALSE)
>
> This command does not work. The error is the following: Errore in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400
>
> I tried to introduce context_annotations = TRUE, and I tried to introduce n =. In both cases the command does not work (it does not recognize context_annotations as a valid argument).
>
> The command works only with page_n = 100. Yet I need to scrape many more tweets. How can I solve this? Any tips?
>
> Thank you all in advance for your great work and support.

Hi @chainsawriot! I'm having the exact same issue @AndreaaMarche had, but her solution is not working for me, as I never specified the bearer token in the command to begin with. My query is as follows:

hillary_tweets <- get_all_tweets(users = c("HillaryClinton"), start_tweets = "2015-04-12T00:00:00Z", end_tweets = "2016-06-06T00:00:00Z", bind_tweets = TRUE, page_n = 500, n = Inf)

This gives me the 400 error:

Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

I've installed the latest dev version of the package, but like @AndreaaMarche I can't add context_annotations = FALSE or n = without getting errors. I can only get it to work with page_n = 100, which quickly exceeds the rate limit. Any suggestions?

Thanks so much!

chainsawriot commented 3 years ago

@kobihackenburg can't reproduce

require(academictwitteR)
#> Loading required package: academictwitteR
hillary_tweets <- get_all_tweets(users = c("HillaryClinton"), start_tweets = "2015-04-12T00:00:00Z", end_tweets = "2016-06-06T00:00:00Z", bind_tweets = TRUE, page_n = 500, n = Inf)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query:   (from:HillaryClinton) 
#> Total pages queried: 1 (tweets captured this page: 496).
#> Total pages queried: 2 (tweets captured this page: 500).
#> Total pages queried: 3 (tweets captured this page: 499).
#> Total pages queried: 4 (tweets captured this page: 496).
#> Total pages queried: 5 (tweets captured this page: 486).
#> Total pages queried: 6 (tweets captured this page: 494).
#> Total pages queried: 7 (tweets captured this page: 494).
#> Total pages queried: 8 (tweets captured this page: 500).
#> Total pages queried: 9 (tweets captured this page: 491).
#> Total pages queried: 10 (tweets captured this page: 498).
#> Total pages queried: 11 (tweets captured this page: 497).
#> Total pages queried: 12 (tweets captured this page: 430).
#> This is the last page for  (from:HillaryClinton) : finishing collection.

Created on 2021-07-06 by the reprex package (v2.0.0)

I am using 0.2.1 a.k.a. the current Github version.

justinchuntingho commented 3 years ago

> Hi @chainsawriot, sorry to bother you. I'm still having issues getting tweets. First I create bearer_token and query objects. The query is the following:
>
> query <- build_query(query = "blabla", is_retweet = FALSE, has_hashtags = TRUE, remove_promoted = TRUE)
>
> Then I try to get tweets with the following command:
>
> try <- get_all_tweets(query = query, bearer_token, file = NULL, data_path = NULL, bind_tweets = TRUE, start_tweets = "2021-06-11T00:00:00Z", end_tweets = "2021-07-04T23:59:59Z", verbose = FALSE)
>
> This command does not work. The error is the following: Errore in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400
>
> I tried to introduce context_annotations = TRUE, and I tried to introduce n =. In both cases the command does not work (it does not recognize context_annotations as a valid argument).
>
> The command works only with page_n = 100. Yet I need to scrape many more tweets. How can I solve this? Any tips?
>
> Thank you all in advance for your great work and support.

Unless you are supplying the arguments in the order they were defined, you need to name them: i.e. state explicitly bearer_token = bearer_token (recommended), or put your arguments in the order they were defined, get_all_tweets(query, start_tweets, end_tweets, bearer_token, ...). In the call above, the unnamed bearer_token is matched positionally to start_tweets, which is what triggers the 400.
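
Applied to the call above, the corrected version would be (a sketch):

try <- get_all_tweets(
  query = query,
  start_tweets = "2021-06-11T00:00:00Z",
  end_tweets = "2021-07-04T23:59:59Z",
  bearer_token = bearer_token,  # named, so it is no longer read as start_tweets
  file = NULL,
  data_path = NULL,
  bind_tweets = TRUE,
  verbose = FALSE
)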

cjbarrie commented 3 years ago

Patch v0.2.1 now on CRAN: ref. commit 49d0c7e