cjbarrie / academictwitteR

Repo for academictwitteR package to query the Twitter Academic Research Product Track v2 API endpoint.

Error in if (httr::headers(r)$`x-rate-limit-remaining` == "1") #192

Closed chainsawriot closed 3 years ago

chainsawriot commented 3 years ago

Discussed in https://github.com/cjbarrie/academictwitteR/discussions/191

Originally posted by **shmuhammad2004** July 11, 2021

Hi @cjbarrie @justinchuntingho @chainsawriot, thank you for the good work. I am collecting tweets using `get_all_tweets()`. It works fine, but when I am collecting tweets over a long period (for example, 10 years), the collection breaks at some point and returns an error:

```
Error in if (httr::headers(r)$`x-rate-limit-remaining` == "1") { : argument is of length zero
```

This behavior has been persistent for a week now. Below is my code:

```r
get_all_tweets(
  user = c("bbchausa", "voahausa", "freedomradionig", "aminiyatrust", "RFI_Ha",
           "hausapedi", "AREWA24Channel", "AminuSaira", "FalaluDorayi", "sairamovies",
           "Hausafilmsnews", "alinuhu", "Ali_Nuhu_Fans ", " KannywoodEmp", "princeazango",
           "KANNYGOSSIPS", "Alan_waka", "KannywoodRadio", "washafatii", "Rahma_sadau"),
  start_tweets = "2007-01-01T00:00:00Z",
  end_tweets = "2021-07-08T00:00:00Z",
  n = Inf,
  data_path = "hausa_user_full",
  bind_tweets = FALSE,
  export_query = TRUE,
  is_retweet = FALSE,
  is_quote = FALSE,
  bearer_token = get_bearer()
)
```

I also tried `resume_collection(data_path = "hausa_user_full")`, but it always returns the error below:

```
Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, :
  something went wrong. Status code: 400
In addition: Warning messages:
1: Tweets will be bound in local memory as well as stored as JSONs.
2: Directory already exists. Existing JSON files may be parsed and returned, choose a new path if this is not intended.
```

The problem with `resume_collection()` also occurs across all the collections I want to update; it reports the error shown above.
chainsawriot commented 3 years ago

Let me triage this issue.

For the first part, I can confirm (or at least have had the same experience) that this is a random error: Twitter sometimes returns an HTTP response without `x-rate-limit-remaining` in the header. It happens randomly, AFAICT. I have tried for weeks to capture (with httptest) such an HTTP response for testing, but to no avail. I am still chasing the spectre released by Twitter, so stay tuned.

So we need to address this prophylactically. The check in the code is not absolutely essential.
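The mechanics of the error: when the header is absent, `httr::headers(r)$x-rate-limit-remaining` returns `NULL`, and `NULL == "1"` is a zero-length logical, which `if` rejects with "argument is of length zero". A defensive version of the check could treat a missing header as "no rate-limit information". A minimal sketch (illustrative only, not the package's actual fix):

```r
# Minimal sketch of a defensive rate-limit check (not the actual
# academictwitteR fix). An absent header yields NULL, and
# `if (NULL == "1")` fails with "argument is of length zero".
check_rate_limit <- function(headers) {
  remaining <- headers[["x-rate-limit-remaining"]]
  # Treat a missing header as "not rate limited": don't error, don't sleep.
  !is.null(remaining) && remaining == "1"
}

check_rate_limit(list(`x-rate-limit-remaining` = "1"))  # TRUE: back off
check_rate_limit(list())                                # FALSE: header missing, carry on
```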

The 2nd part reported by @shmuhammad2004 is interesting. If possible, @shmuhammad2004 could you copy and paste the content of the query file in the directory?

chainsawriot commented 3 years ago

At the moment, without any fix, please try to keep your queries small. The code is well-tested (believe it or not), but it is difficult to stress test because the package depends on the Twitter API (if you want to know the nitty-gritty details about testing this package, read this post).
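One way to keep queries small is to split a long date range into shorter windows and collect each window separately, so a random failure loses at most one window of progress. A sketch, where `make_windows()` is a hypothetical helper of my own and not part of academictwitteR:

```r
# Hypothetical helper (not part of academictwitteR): split a long date range
# into monthly windows formatted as the ISO timestamps get_all_tweets() expects.
make_windows <- function(from, to, by = "month") {
  breaks <- unique(c(seq(as.Date(from), as.Date(to), by = by), as.Date(to)))
  data.frame(
    start_tweets = paste0(format(head(breaks, -1)), "T00:00:00Z"),
    end_tweets   = paste0(format(tail(breaks, -1)), "T00:00:00Z"),
    stringsAsFactors = FALSE
  )
}

windows <- make_windows("2020-01-01", "2020-04-01")
# Then loop over the rows, giving each window its own data_path, e.g.:
# for (i in seq_len(nrow(windows))) {
#   get_all_tweets("#cpac",
#                  start_tweets = windows$start_tweets[i],
#                  end_tweets = windows$end_tweets[i],
#                  n = Inf, data_path = paste0("cpac_", i), bind_tweets = FALSE)
# }
```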

DrorWalt commented 3 years ago

Same error here for multiple searches. The latest, for fun, was:

```r
get_all_tweets("mtgreenee", start_tweets = "2020-06-01T00:00:00Z", end_tweets = "2020-06-02T00:00:00Z", bearer_token = bearer_token, data_path = mypath, n = Inf, bind_tweets = F)
```

which returned:

```
query: mtgreenee
Error in if (httr::headers(r)$`x-rate-limit-remaining` == "1") { : argument is of length zero
```

Similarly, this one works:

```r
get_all_tweets("#cpac", start_tweets = "2021-06-01T00:00:00Z", end_tweets = "2021-06-02T00:00:00Z", bearer_token = bearer_token, data_path = mypath, n = Inf, bind_tweets = F)
```

But this one doesn't:

```r
get_all_tweets("#cpac", start_tweets = "2021-06-01T00:00:00Z", end_tweets = "2021-06-03T00:00:00Z", bearer_token = bearer_token, data_path = mypath, n = Inf, bind_tweets = F)
```

DrorWalt commented 3 years ago

Update: Python is down too (it wasn't during last week's error storm with this package) and is returning a 503 error.

cjbarrie commented 3 years ago

I was unable to reproduce the error reported by @DrorWalt; all of these queries worked fine for me. Could you let us know if this persists for you?

chainsawriot commented 3 years ago

@DrorWalt Can't reproduce. As said, the error appears to be random.

```r
require(academictwitteR)
#> Loading required package: academictwitteR
mypath <- academictwitteR:::.gen_random_dir()
get_all_tweets("mtgreenee",
               start_tweets = "2020-06-01T00:00:00Z",
               end_tweets = "2020-06-02T00:00:00Z",
               bearer_token = get_bearer(),
               data_path = mypath,
               n = Inf,
               bind_tweets = FALSE)
#> query:  mtgreenee 
#> Total pages queried: 1 (tweets captured this page: 496).
#> Total pages queried: 2 (tweets captured this page: 276).
#> This is the last page for mtgreenee : finishing collection.
#> Data stored as JSONs: use bind_tweets function to bundle into data.frame
unlink(mypath, recursive = TRUE)

mypath <- academictwitteR:::.gen_random_dir()
get_all_tweets("#cpac",
               start_tweets = "2021-06-01T00:00:00Z",
               end_tweets = "2021-06-02T00:00:00Z",
               bearer_token = get_bearer(),
               data_path = mypath,
               n = Inf,
               bind_tweets = FALSE)
#> query:  #cpac 
#> Total pages queried: 1 (tweets captured this page: 88).
#> This is the last page for #cpac : finishing collection.
#> Data stored as JSONs: use bind_tweets function to bundle into data.frame
unlink(mypath, recursive = TRUE)

mypath <- academictwitteR:::.gen_random_dir()
get_all_tweets("#cpac",
               start_tweets = "2021-06-01T00:00:00Z",
               end_tweets = "2021-06-03T00:00:00Z",
               bearer_token = get_bearer(),
               data_path = mypath,
               n = Inf,
               bind_tweets = FALSE)
#> query:  #cpac 
#> Total pages queried: 1 (tweets captured this page: 193).
#> This is the last page for #cpac : finishing collection.
#> Data stored as JSONs: use bind_tweets function to bundle into data.frame
unlink(mypath, recursive = TRUE)
```

Created on 2021-07-12 by the reprex package (v2.0.0)

DrorWalt commented 3 years ago

Working now.

shmuhammadd commented 3 years ago

@chainsawriot thanks for the great work. All issues are resolved. resume_collection() also works fine.

I am very grateful for the efforts of everyone on the team involved.

Thank you very much.

PradeepNalluri commented 3 years ago

Hi @shmuhammad2004 @chainsawriot, @cjbarrie, I am trying to fetch ~8Mn tweets. I am facing the same issue. Error in if (httr::headers(r)$x-rate-limit-remaining== "1") { : argument is of length zero Calls: get_all_tweets -> get_tweets -> make_query In addition: Warning message: Tweets will be bound in local memory as well as stored as JSON. I understand from the comments that this is a problem at the Twitter API end. Can you folks please let me know what are the things I could try to get this sorted? Also, a small note this happened to me twice first after querying 2275 pages, and the second instance was after 1745 pages. Thank you.