STAT325-S24 / HistoryAmherstCollege

Text and analysis related to William S. Tyler's "History of Amherst College" (1873)
MIT License

generate cleaned data #24

Closed by nicholasjhorton 5 months ago

nicholasjhorton commented 5 months ago

This issue will be closed when there is a Quarto file and associated PDF that reads in the cleaned chapter files in data-raw and saves the text as an .rda file in the data folder (using usethis::use_data(HistoryAmherstCollege, overwrite = TRUE), which writes .rda rather than .Rds).

This requires completion and testing of #21 and #3, among others.

It is needed for #1.
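
For anyone picking this up, here is a minimal sketch of the read-and-save step, not the project's actual code. It assumes the cleaned chapters are plain-text files ending in .txt, and the helper name read_chapters() and the column names chapter and text are hypothetical. It uses base R only, so a tibble conversion would be a separate step.

```r
# Sketch only: combine cleaned chapter files into one data frame.
# Assumptions (not from this issue): chapters are .txt files; the helper
# name read_chapters() and the columns `chapter` and `text` are hypothetical.
read_chapters <- function(dir) {
  files <- sort(list.files(dir, pattern = "\\.txt$", full.names = TRUE))
  chapters <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    data.frame(
      chapter = rep(tools::file_path_sans_ext(basename(f)), length(lines)),
      text = lines,
      stringsAsFactors = FALSE
    )
  })
  # One row per line of text, labelled with its chapter file name.
  do.call(rbind, chapters)
}

# Demonstrate on a throwaway directory so the sketch runs anywhere.
demo_dir <- tempfile("chapters-demo")
dir.create(demo_dir)
writeLines(c("First line.", "Second line."), file.path(demo_dir, "chapter01.txt"))
writeLines("Only line.", file.path(demo_dir, "chapter02.txt"))

HistoryAmherstCollege <- read_chapters(demo_dir)

# Inside the package project, the object would then be saved to data/ with:
# usethis::use_data(HistoryAmherstCollege, overwrite = TRUE)
```

In the repository, the call would presumably point at the cleaned-text folder instead of demo_dir. The use_data() line is left commented out because it only works from within a package project.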

nicholasjhorton commented 5 months ago

@Casey308 might you be willing to take this on? See #2 for the location of the cleaned text (at present, there's only a stub of a single chapter in https://github.com/STAT325-S24/HistoryAmherstCollege/tree/main/data-raw-dehyphenate).

Casey308 commented 5 months ago

See https://github.com/STAT325-S24/HistoryAmherstCollege/commit/0bc70da6252bc59de023e36f11e4eaeb09a2958f. Since not all of the data has been dehyphenated, I used the depaginated folder, but this can easily be changed. Right now it is just spitting out a txt file into data, because use_data was giving me issues (see attached pic). Any suggestions on what to do, or general comments on how this looks?

image

Casey308 commented 5 months ago

PDF here: https://github.com/STAT325-S24/HistoryAmherstCollege/blob/main/read_data.pdf

Casey308 commented 5 months ago

The new commit actually makes a tibble (hac_tibble_mod). Code here: https://github.com/STAT325-S24/HistoryAmherstCollege/commit/02ea8bd3ec0a00a873f5fa48741810ef3d276200. Is this what you were thinking?

nicholasjhorton commented 5 months ago

This is looking good at first glance: can someone volunteer to review it more comprehensively?

nicholasjhorton commented 5 months ago

This has been substantially replaced by the code in https://github.com/STAT325-S24/HistoryAmherstCollege/blob/main/data-raw/data.R.

I'm closing this in favor of #39.