Closed: ophiryotam closed this issue 3 years ago.
@ophiryotam
@cjbarrie Maybe we should add this to the README or whatnot.
@chainsawriot Yep, I'll get on this. I feel this could also be a case where we add an argument for exact_phrase
or something similar, which would wrap the character vector in escaped quotes. I know we haven't included arguments like this for other similar cases, but I think this is a bit different from problems caused by misunderstandings of AND and OR logic. Plus, escaped quotes are ugly and error-prone, and it'd be nice to hide them under the hood!
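A minimal sketch of the idea in base R. The helper quote_phrases() is hypothetical (it is not part of academictwitteR), and the example assumes that several query terms are combined with OR: each term is wrapped in double quotes unless it already is, so users never have to type escaped quotes themselves.

```r
# Hypothetical helper, NOT the academictwitteR API: wrap each search term
# in double quotes so the Twitter API treats it as an exact phrase.
quote_phrases <- function(query) {
  stopifnot(is.character(query))
  # Terms that already start and end with a double quote are left untouched
  already_quoted <- grepl('^".*"$', query)
  ifelse(already_quoted, query, paste0('"', query, '"'))
}

terms <- c("Black Lives Matter", "\"goals of care\"")
quoted <- quote_phrases(terms)
# Assumption: multiple phrases are joined with OR, as in a multi-term query
combined <- paste(quoted, collapse = " OR ")
cat(combined, "\n")
```

Checking for existing quotes first means a user who already escaped the phrase does not end up with doubled quotes.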
Have now done this in #242, which adds the exact_phrase parameter.
@cjbarrie Thanks for implementing the feature. I am in the process of writing tests for it, and I am afraid the feature is not well tested.
Consider a slightly more advanced version of the README example. Suppose I want to search for the exact phrase "Black Lives Matter" in retweets of tweets by @ACLU. The current implementation generates this query: \"Black Lives Matter (retweets_of:ACLU)\"
Here, (retweets_of:ACLU) becomes part of the "exact phrase", so this query will surely return no results.
The problem is the point at which exact_phrase is handled in build_query. I believe it should be handled first, not fifth. But please let me know what you think; maybe I have misunderstood something.
require(academictwitteR)
#> Loading required package: academictwitteR
tweets1 <- get_all_tweets(
  query = "Black Lives Matter",
  retweets_of = "ACLU",
  exact_phrase = TRUE,
  start_tweets = "2021-01-04T00:00:00Z",
  end_tweets = "2021-01-04T00:45:00Z",
  n = Inf)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query: "Black Lives Matter (retweets_of:ACLU)"
#> Total pages queried: 1 (tweets captured this page: 0).
#> This is the last page for "Black Lives Matter (retweets_of:ACLU)" : finishing collection.
tweets2 <- get_all_tweets(query = "\"Black Lives Matter\"", retweets_of = "ACLU",
start_tweets = "2021-01-04T00:00:00Z", end_tweets = "2021-01-04T00:45:00Z",
n = Inf)
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query: "Black Lives Matter" (retweets_of:ACLU)
#> Total pages queried: 1 (tweets captured this page: 110).
#> This is the last page for "Black Lives Matter" (retweets_of:ACLU) : finishing collection.
nrow(tweets1)
#> [1] 0
nrow(tweets2)
#> [1] 110
testthat::expect_true(nrow(tweets1) > 0)
#> Error: nrow(tweets1) > 0 is not TRUE
#>
#> `actual`: FALSE
#> `expected`: TRUE
testthat::expect_true(nrow(tweets2) > 0)
build_query(query = "Black Lives Matter", exact_phrase = TRUE, retweets_of = "ACLU")
#> [1] "\"Black Lives Matter (retweets_of:ACLU)\""
Created on 2021-10-17 by the reprex package (v2.0.0)
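The ordering problem described in this comment can be illustrated with a deliberately simplified, hypothetical query builder, build_query_sketch(). It is not academictwitteR's build_query() and only handles retweets_of. Quoting after the operators are appended reproduces the broken query from the reprex; quoting the user's phrase first gives the query that returned results.

```r
# Hypothetical, simplified query builder illustrating the ordering issue;
# it is NOT academictwitteR's build_query().
build_query_sketch <- function(query, exact_phrase = FALSE,
                               retweets_of = NULL, quote_first = TRUE) {
  # Append the retweets_of operator, if any
  add_operators <- function(q) {
    if (!is.null(retweets_of)) {
      q <- paste0(q, " (retweets_of:", retweets_of, ")")
    }
    q
  }
  wrap_in_quotes <- function(q) paste0('"', q, '"')

  if (exact_phrase && quote_first) {
    # Quote only the user's phrase, then append operators outside the quotes
    add_operators(wrap_in_quotes(query))
  } else if (exact_phrase) {
    # Problematic order: operators end up inside the quoted phrase
    wrap_in_quotes(add_operators(query))
  } else {
    add_operators(query)
  }
}

buggy <- build_query_sketch("Black Lives Matter", exact_phrase = TRUE,
                            retweets_of = "ACLU", quote_first = FALSE)
fixed <- build_query_sketch("Black Lives Matter", exact_phrase = TRUE,
                            retweets_of = "ACLU")
cat(buggy, "\n")
cat(fixed, "\n")
```

The point of the sketch is that exact_phrase should only ever touch the user-supplied phrase, so any operator added afterwards stays outside the quotes.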
@chainsawriot You are right. Thank you for spotting this. I have implemented a change in #247.
Describe the bug
Hi there, when searching with a multi-word query and wanting the exact phrase, I get all tweets containing any of the words. For example, running the following returns tweets containing only the word "goals" rather than the exact phrase. Thanks!
get_all_tweets(query = "goals of care", start_tweets = "2020-08-01T00:00:00Z", end_tweets = "2021-08-10T00:00:00Z", bearer_token = bearer_token, data_path = mypath, n = Inf, bind_tweets = FALSE)
Expected Behavior
I wanted tweets containing the exact phrase "goals of care", but instead got tweets containing only "goals" or only "care".
Steps To Reproduce
get_all_tweets(query = "goals of care", start_tweets = "2020-08-01T00:00:00Z", end_tweets = "2021-08-10T00:00:00Z", bearer_token = bearer_token, data_path = mypath, n = Inf, bind_tweets = FALSE)
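As discussed earlier in this thread, a multi-word query is not treated as an exact phrase unless the phrase itself is wrapped in double quotes. A minimal sketch of that workaround for the query above follows; the API call is commented out because it needs a bearer token, and bearer_token and mypath are assumed to be defined as in the report. In versions that include #242, exact_phrase = TRUE is intended to add these quotes for you.

```r
# The Twitter API treats space-separated words as separate keywords
# (implicitly combined with AND); double quotes request an exact phrase.
phrase <- "goals of care"
exact_query <- paste0('"', phrase, '"')  # equivalent to "\"goals of care\""
cat(exact_query, "\n")

# Assumes bearer_token and mypath are defined as in the report above;
# left commented out because it requires API access.
# get_all_tweets(query = exact_query,
#                start_tweets = "2020-08-01T00:00:00Z",
#                end_tweets = "2021-08-10T00:00:00Z",
#                bearer_token = bearer_token, data_path = mypath,
#                n = Inf, bind_tweets = FALSE)
```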
Environment
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidytext_0.2.3 lubridate_1.7.4 forcats_0.4.0 stringr_1.4.0 dplyr_1.0.6
[6] purrr_0.3.3 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.3.0
[11] ldatuning_1.0.0 topicmodels_0.2-9 quanteda_1.5.2 xlsx_0.6.1 readr_1.3.1
[16] academictwitteR_0.2.1
loaded via a namespace (and not attached):
[1] httr_1.4.1 jsonlite_1.6 modelr_0.1.6 RcppParallel_4.4.4 assertthat_0.2.1 stats4_3.6.1
[7] xlsxjars_0.6.1 cellranger_1.1.0 slam_0.1-46 pillar_1.6.2 backports_1.1.5 lattice_0.20-38
[13] glue_1.4.2 RColorBrewer_1.1-2 rvest_0.3.5 colorspace_1.4-1 Matrix_1.2-17 tm_0.7-6
[19] pkgconfig_2.0.3 broom_0.5.5 haven_2.2.0 scales_1.1.0 generics_0.0.2 usethis_2.0.1
[25] ellipsis_0.3.2 withr_2.4.2 lazyeval_0.2.2 NLP_0.2-0 cli_2.5.0 magrittr_1.5
[31] crayon_1.3.4 readxl_1.3.1 tokenizers_0.2.1 janeaustenr_0.1.5 stopwords_1.0 fs_1.3.2
[37] fansi_0.4.0 SnowballC_0.6.0 nlme_3.1-140 xml2_1.2.2 tools_3.6.1 data.table_1.12.6
[43] hms_0.5.2 lifecycle_1.0.0 munsell_0.5.0 reprex_0.3.0 compiler_3.6.1 rlang_0.4.11
[49] grid_3.6.1 rstudioapi_0.11 gtable_0.3.0 curl_4.3 DBI_1.1.0 R6_2.4.1
[55] utf8_1.1.4 fastmatch_1.1-0 cld2_1.2.1 modeltools_0.2-22 rJava_0.9-11 stringi_1.4.3
[61] parallel_3.6.1 Rcpp_1.0.3 vctrs_0.3.8 spacyr_1.2 wordcloud_2.6 dbplyr_1.4.2
[67] tidyselect_1.1.1
Anything else?
No response