Closed: nicholasjhorton closed this 5 months ago.
@Casey308, might you be willing to take this on? See #2 for the location of the cleaned text (at present, there's only a stub of a single chapter in https://github.com/STAT325-S24/HistoryAmherstCollege/tree/main/data-raw-dehyphenate).
See https://github.com/STAT325-S24/HistoryAmherstCollege/commit/0bc70da6252bc59de023e36f11e4eaeb09a2958f. Since not all of the data has been dehyphenated, I used the depaginated folder, but this can easily be changed. Right now it just writes a .txt file into `data`, because `use_data()` was giving me issues (see attached pic). Any suggestions on what to do, and general comments on how this looks?
The new commit actually makes a tibble (`hac_tibble_mod`). Code here: https://github.com/STAT325-S24/HistoryAmherstCollege/commit/02ea8bd3ec0a00a873f5fa48741810ef3d276200. Is this what you were thinking?
This looks good at first glance. Can someone volunteer to review it more comprehensively?
This has been substantially replaced by the code in https://github.com/STAT325-S24/HistoryAmherstCollege/blob/main/data-raw/data.R.
I'm closing this in favor of #39.
This issue will be closed when there is a Quarto file and associated PDF which reads in the cleaned chapter files in `data-raw` and outputs the text as an `.rda` file (using `usethis::use_data(HistoryAmherstCollege, overwrite = TRUE)`) in the `data` folder. This requires completion and testing of #21 and #3, among others.
It is needed for #1.