Open Jacobzwj opened 2 years ago
@Qiegao1994 Thanks for raising this. The tidy format is opinionated, so it doesn't include reply_count by default. You may see the relevant code here.
If you need that data now, I recommend also binding the data as 'raw' and then joining the reply counts back onto the tidy data frame.
require(academictwitteR)
#> Loading required package: academictwitteR
require(tidyverse)
#> Loading required package: tidyverse
# Download the tweets to a temporary directory without binding them yet
temp_path <- academictwitteR:::.gen_random_dir()
get_all_tweets("data @twitterdev", start_tweets = "2020-12-01T00:00:00Z", end_tweets = "2021-01-01T00:00:00Z",
               is_retweet = FALSE, data_path = temp_path, bind_tweets = FALSE, verbose = FALSE)
# Bind the same files twice: once as tidy (x), once as raw (y)
x <- bind_tweets(temp_path, output_format = "tidy")
y <- bind_tweets(temp_path, output_format = "raw")
y$tweet.public_metrics.reply_count
#> # A tibble: 7 × 2
#> tweet_id data
#> <chr> <int>
#> 1 1339351148050796544 0
#> 2 1339037915280654337 0
#> 3 1338309611946991621 0
#> 4 1338221705358143489 0
#> 5 1336432113244114944 0
#> 6 1334620555668963328 0
#> 7 1334596974318903297 1
y$tweet.public_metrics.reply_count %>% rename(reply_count = "data") %>% left_join(x, by = "tweet_id")
#> # A tibble: 7 × 32
#> tweet_id reply_count user_username text created_at conversation_id source
#> <chr> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 13393511480… 0 shreerangp "@Tw… 2020-12-1… 13393502089421… Twitt…
#> 2 13390379152… 0 SawavehVezhh… "@ge… 2020-12-1… 13390379152806… Twitt…
#> 3 13383096119… 0 graylanj "@gg… 2020-12-1… 13381473230970… Twitt…
#> 4 13382217053… 0 chRSBGREEN "Tha… 2020-12-1… 13380293740298… Twitt…
#> 5 13364321132… 0 Tuhung16 "@Th… 2020-12-0… 13338020140613… Twitt…
#> 6 13346205556… 0 Brodie9992 "<… 2020-12-0… 13346205556689… Twitt…
#> 7 13345969743… 1 jamie_maguir… "@Tw… 2020-12-0… 13345644888848… Twitt…
#> # … with 25 more variables: lang <chr>, in_reply_to_user_id <chr>,
#> # possibly_sensitive <lgl>, author_id <chr>, user_name <chr>,
#> # user_verified <lgl>, user_profile_image_url <chr>, user_description <chr>,
#> # user_url <chr>, user_location <chr>, user_created_at <chr>,
#> # user_protected <lgl>, user_pinned_tweet_id <chr>, retweet_count <int>,
#> # like_count <int>, quote_count <int>, user_tweet_count <int>,
#> # user_list_count <int>, user_followers_count <int>, …
Created on 2022-02-22 by the reprex package (v2.0.1)
Thanks for the help! I finally got "reply_count" in a way similar to what you suggested. Thanks a lot!
As a user, I would appreciate it if "reply_count" could become a default column in the tidy format in a future version of this package. The solution you provided is useful, but it is time-consuming when handling "big data". For example, I have around 500 files to pass to bind_tweets(), and it took roughly twice as long because I had to run bind_tweets() twice, once with "tidy" and once with "raw", and then merge the results. With reply_count added, the columns in the tidy format would be enough for sociology research.
Thanks again for your prompt reply and help!
Best, Jacob
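For large collections, one way to avoid the second full bind_tweets() pass might be to read only the reply counts directly from the stored JSON files. The sketch below is not part of academictwitteR: read_reply_counts() is a hypothetical helper, and it assumes that tweets are stored in data_path as files named data_*.json, each holding an array of tweet objects with an id and a public_metrics$reply_count field. Please check both assumptions against your own files before relying on it.

```r
library(jsonlite)

# Hypothetical helper, NOT part of academictwitteR.
# Assumption: data_path contains files named data_*.json, each an array of
# tweet objects with `id` and `public_metrics$reply_count`.
read_reply_counts <- function(data_path) {
  files <- list.files(data_path, pattern = "^data_.*\\.json$", full.names = TRUE)
  per_file <- lapply(files, function(f) {
    # flatten = TRUE turns the nested public_metrics object into columns
    # such as `public_metrics.reply_count`
    tweets <- jsonlite::fromJSON(f, flatten = TRUE)
    if (length(tweets) == 0) return(NULL)  # skip files with an empty array
    data.frame(
      tweet_id = as.character(tweets$id),
      reply_count = as.integer(tweets$public_metrics.reply_count),
      stringsAsFactors = FALSE
    )
  })
  counts <- do.call(rbind, per_file)
  if (is.null(counts)) {
    return(data.frame(tweet_id = character(), reply_count = integer(),
                      stringsAsFactors = FALSE))
  }
  # Precaution: keep one row per tweet in case an id appears in several files
  counts[!duplicated(counts$tweet_id), , drop = FALSE]
}

# Usage sketch (x is the tidy data frame from bind_tweets()):
# x <- dplyr::left_join(x, read_reply_counts(temp_path), by = "tweet_id")
```

Because only two fields are kept per tweet, this should be lighter than building the full "raw" output, but it bypasses the package's own parsing, so it is worth comparing the result with y$tweet.public_metrics.reply_count on a small sample first.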
Describe the solution you'd like
Thanks for this wonderful package!
I have a question about the tidy format output:
We can get the tidy format with bind_tweets(data_path = , output_format = "tidy"), but the result only contains "retweet_count", "like_count" and "quote_count" (see the following picture). I cannot find "reply_count", which is also a very useful column for researchers, so I raised this issue. Is there a way to get "reply_count" in the tidy output format?
Thanks again, Jacob