cpjfb closed this issue 7 years ago
Hmmmm, I'm not able to reproduce this problem. This is the code you mean, right?
library(lubridate)
library(ggplot2)
library(dplyr)
library(readr)
tweets_julia <- read_csv("data/tweets_julia.csv")
#> Parsed with column specification:
#> cols(
#> tweet_id = col_double(),
#> in_reply_to_status_id = col_double(),
#> in_reply_to_user_id = col_double(),
#> timestamp = col_character(),
#> source = col_character(),
#> text = col_character(),
#> retweeted_status_id = col_double(),
#> retweeted_status_user_id = col_double(),
#> retweeted_status_timestamp = col_character(),
#> expanded_urls = col_character()
#> )
tweets_dave <- read_csv("data/tweets_dave.csv")
#> Parsed with column specification:
#> cols(
#> tweet_id = col_double(),
#> in_reply_to_status_id = col_double(),
#> in_reply_to_user_id = col_double(),
#> timestamp = col_character(),
#> source = col_character(),
#> text = col_character(),
#> retweeted_status_id = col_double(),
#> retweeted_status_user_id = col_double(),
#> retweeted_status_timestamp = col_character(),
#> expanded_urls = col_character()
#> )
tweets <- bind_rows(tweets_julia %>%
                      mutate(person = "Julia"),
                    tweets_dave %>%
                      mutate(person = "David")) %>%
  mutate(timestamp = ymd_hms(timestamp))
library(tidytext)
library(stringr)
replace_reg <- "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT|https"
unnest_reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))"
tidy_tweets <- tweets %>%
  filter(!str_detect(text, "^RT")) %>%
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = unnest_reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
tidy_tweets
#> # A tibble: 149,144 x 11
#> tweet_… in_re… in_r… timestamp source retw… retw… retw… expa…
#> <dbl> <dbl> <dbl> <dttm> <chr> <dbl> <dbl> <chr> <chr>
#> 1 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 2 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 3 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 4 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 5 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 6 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 7 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 8 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 9 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> 10 6.78e⁸ NA NA 2008-02-05 00:00:00 "<a h… NA NA <NA> <NA>
#> # ... with 149,134 more rows, and 2 more variables: person <chr>, word
#> # <chr>
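For anyone who wants to check just the tokenization step without the CSV files, here is a minimal sketch. The toy tibble and the name toy_tidy are made up for illustration and are not part of the book's code; the pattern is the same unnest_reg as above.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-row stand-in for the tweet archive (not real data)
toy_tweets <- tibble(
  person = "Julia",
  text = "Loving #rstats and @drob's book!"
)

# Same splitting pattern as in the code above: split on anything that is not
# a letter, digit, underscore, #, @ or an apostrophe inside a word
unnest_reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))"

# Tokenize with the regex tokenizer; hashtags and @-mentions should stay intact
toy_tidy <- toy_tweets %>%
  unnest_tokens(word, text, token = "regex", pattern = unnest_reg)

toy_tidy
```

If this small case also fails with "invalid argument type", the problem is likely in the installed packages rather than in the tweet data.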
I'd recommend starting by updating your packages to their current versions. I'm not sure what else the problem might be.
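A quick way to see which versions are installed before updating might look like the sketch below. The vector name pkgs and the choice of packages (the ones loaded in the code above) are my assumptions about what is relevant here; packageVersion() and install.packages() are standard functions from base R's utils package.

```r
# Packages loaded in the code above (assumed to be the relevant ones)
pkgs <- c("tidytext", "dplyr", "stringr", "readr", "lubridate", "ggplot2")

# Collect the installed version of each package as a named character vector
installed <- vapply(pkgs, function(p) as.character(packageVersion(p)), character(1))
print(installed)

# Uncomment to reinstall the latest CRAN releases of these packages
# install.packages(pkgs)
```

Including the output of this (or of sessionInfo()) in a bug report makes version mismatches much easier to spot.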
Oops, yes, that was the issue. It works like a charm now that I've updated all my packages.
Thanks a lot!
Hi
First, congratulations, I'm loving this book! Great work.
Now, when I run the chunk that calls unnest_tokens with the regex on tweets to get tidy_tweets (in 07-tweet-archives.Rmd), I get "Error: invalid argument type".
Any idea why?
Thanks!