Closed monica-buczynski closed 4 years ago
I just ran this code again with the current versions of purrr and tidyr, and it works from the project directory. Can you make sure you are working from the project directory?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
library(readr)
training_folder <- "data/20news-bydate/20news-bydate-train/"
# Define a function to read all files from a folder into a data frame
read_folder <- function(infolder) {
  tibble(file = dir(infolder, full.names = TRUE)) %>%
    mutate(text = map(file, read_lines)) %>%
    transmute(id = basename(file), text) %>%
    unnest(text)
}
# Use unnest() and map() to apply read_folder to each subfolder
raw_text <- tibble(folder = dir(training_folder, full.names = TRUE)) %>%
  mutate(folder_out = map(folder, read_folder)) %>%
  unnest(cols = c(folder_out)) %>%
  transmute(newsgroup = basename(folder), id, text)
raw_text
#> # A tibble: 511,755 x 3
#> newsgroup id text
#> <chr> <chr> <chr>
#> 1 alt.atheism 49960 "From: mathew <mathew@mantis.co.uk>"
#> 2 alt.atheism 49960 "Subject: Alt.Atheism FAQ: Atheist Resources"
#> 3 alt.atheism 49960 "Summary: Books, addresses, music -- anything related to atheism"
#> 4 alt.atheism 49960 "Keywords: FAQ, atheism, books, music, fiction, addresses, contacts"
#> 5 alt.atheism 49960 "Expires: Thu, 29 Apr 1993 11:57:19 GMT"
#> 6 alt.atheism 49960 "Distribution: world"
#> 7 alt.atheism 49960 "Organization: Mantis Consultants, Cambridge. UK."
#> 8 alt.atheism 49960 "Supersedes: <19930301143317@mantis.co.uk>"
#> 9 alt.atheism 49960 "Lines: 290"
#> 10 alt.atheism 49960 ""
#> # … with 511,745 more rows
The error you are seeing is from those functions not being able to find anything at the given path. I bet you are working from the document directory instead of the project directory.
Thanks for your response! I checked the package versions and they are up to date, and so is R. I also tried loading tidyverse instead, since it includes dplyr, tidyr, purrr, and readr. No luck: I get the same error even when copying and pasting directly from the book or your comment.
It's not the packages; it's your working directory. This file assumes that your working directory is at the project level. I suspect that your working directory is somewhere else, maybe the document's directory? Check where the files you are trying to open are located relative to it.
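One quick way to check this is sketched below. It assumes the same relative path as the code above; `path_resolves()` is a hypothetical helper written for this illustration, not something from the book or from any package.

```r
# Relative path used in the code above; it only resolves from the project root
training_folder <- "data/20news-bydate/20news-bydate-train/"

# Hypothetical helper (not from the book): report whether a relative path
# resolves from the current working directory
path_resolves <- function(path) {
  found <- dir.exists(path)
  if (!found) {
    message("'", path, "' not found from working directory: ", getwd())
  }
  found
}

# FALSE here means R is not looking where the code expects
path_resolves(training_folder)

# dir() returns an empty character vector when the folder cannot be found,
# so the tibble built from it has no rows to read
length(dir(training_folder))
```

If the check returns FALSE, pointing the working directory at the project root (for example with `setwd()`, or by opening the project itself rather than a single file) should let the relative path resolve.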
Ah okay. Thank you! I understand the issue now! Thanks again!
After submitting the code as noted in 9.1, I get the error:
Error: Problem with `mutate()` input `..2`.
x Input `..2` must be a vector, not a function.
i Input `..2` is `id`.
I've tried playing around with the code, but haven't been able to figure out how to fix it.