dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson
http://tidytextmining.com

Broken code 9.1 #82

Closed: monica-buczynski closed this issue 3 years ago

monica-buczynski commented 3 years ago

After submitting the code as noted in 9.1, I get the error:

"Error: Problem with mutate() input ..2. x Input ..2 must be a vector, not a function. i Input ..2 is id."

I've tried playing around with the code, but haven't been able to figure out how to fix it.

juliasilge commented 3 years ago

I just ran this code again with the current versions of purrr and tidyr, and it works from the project directory. Can you make sure you are working from the project directory?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
library(readr)

training_folder <- "data/20news-bydate/20news-bydate-train/"

# Define a function to read all files from a folder into a data frame
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}

# Use unnest() and map() to apply read_folder to each subfolder
raw_text <- tibble(folder = dir(training_folder, full.names = TRUE)) %>%
  mutate(folder_out = map(folder, read_folder)) %>%
  unnest(cols = c(folder_out)) %>%
  transmute(newsgroup = basename(folder), id, text)

raw_text
#> # A tibble: 511,755 x 3
#>    newsgroup   id    text                                                                
#>    <chr>       <chr> <chr>                                                               
#>  1 alt.atheism 49960 "From: mathew <mathew@mantis.co.uk>"                                
#>  2 alt.atheism 49960 "Subject: Alt.Atheism FAQ: Atheist Resources"                       
#>  3 alt.atheism 49960 "Summary: Books, addresses, music -- anything related to atheism"   
#>  4 alt.atheism 49960 "Keywords: FAQ, atheism, books, music, fiction, addresses, contacts"
#>  5 alt.atheism 49960 "Expires: Thu, 29 Apr 1993 11:57:19 GMT"                            
#>  6 alt.atheism 49960 "Distribution: world"                                               
#>  7 alt.atheism 49960 "Organization: Mantis Consultants, Cambridge. UK."                  
#>  8 alt.atheism 49960 "Supersedes: <19930301143317@mantis.co.uk>"                         
#>  9 alt.atheism 49960 "Lines: 290"                                                        
#> 10 alt.atheism 49960 ""                                                                  
#> # … with 511,745 more rows

The error you are seeing comes from those functions not finding anything at the given path. I bet you are working from the document directory instead of the project directory.
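A minimal sketch of why a wrong path surfaces so late, using base R only. `dir()` does not complain about a folder that does not exist; it quietly returns an empty vector, so the pipeline carries on with nothing to read. One plausible reading of the error message (an assumption, not confirmed in this thread) is that, with no files found, `unnest()` yields no `id` column, so `id` in `transmute()` resolves to a function of that name instead. The names `missing_folder` and `check_folder` are hypothetical and not part of the book's code.

```r
# Hypothetical path that does not exist, standing in for a wrong working directory
missing_folder <- file.path(tempdir(), "no-such-folder")

# dir() does not raise an error for a missing path; it returns an empty vector
files_found <- dir(missing_folder, full.names = TRUE)
length(files_found)

# Hypothetical helper: fail early with a readable message before reading anything
check_folder <- function(infolder) {
  if (!dir.exists(infolder)) {
    stop("Folder not found: ", infolder,
         " (working directory is ", getwd(), ")")
  }
  invisible(infolder)
}
```

Calling `check_folder(training_folder)` before building `raw_text` would, under these assumptions, turn the confusing `mutate()` message into one that names the missing folder.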

monica-buczynski commented 3 years ago

Thanks for your response! I checked the package versions and they are up to date. R is also up to date. I also tried loading tidyverse as an alternative, since it contains dplyr, tidyr, purrr and readr. No luck: I get the same error even when copying and pasting directly from the book or your comment.

juliasilge commented 3 years ago

It's not the packages; it's your working directory. This code assumes that your working directory is at the project level. I suspect that your working directory is somewhere else, maybe the document's directory? Check where the files you are trying to open actually are.
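A quick way to check this from the R console, as a sketch using base R only. The variable names `resolved` and `folder_found` are made up for illustration, and the `setwd()` path in the final comment is a placeholder, not a real location.

```r
training_folder <- "data/20news-bydate/20news-bydate-train/"

# The directory that R resolves relative paths against
getwd()

# The full location the relative path points to from the current working directory
resolved <- file.path(getwd(), training_folder)
resolved

# TRUE when the data folder exists relative to the current working directory
folder_found <- dir.exists(training_folder)
folder_found

# If this is FALSE, switch to the project directory (placeholder path), e.g.
# setwd("path/to/tidy-text-mining")
```

If `folder_found` is `FALSE`, comparing `resolved` with where the `data/` folder actually lives should show which directory R is working from.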

monica-buczynski commented 3 years ago

Ah okay. Thank you! I understand the issue now! Thanks again!