STAT325-S24 / HistoryAmherstCollege

Text and analysis related to William S. Tyler's "History of Amherst College" (1873)
MIT License

address page breaks #3

Closed nicholasjhorton closed 7 months ago

nicholasjhorton commented 8 months ago

This issue will be closed when page breaks are addressed in the wrangling process. This is needed before #2 since some pages may break with a hyphen!
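A minimal sketch of the hyphen concern, in Python. It assumes the text just before a page break and the first line after it are already available as two strings; the function name `join_across_break` is hypothetical and not part of the repository. It also cannot tell a word split across pages from a genuinely hyphenated compound, so those cases would still need manual review.

```python
def join_across_break(before: str, after: str) -> str:
    """Join the text on either side of a page break.

    ASSUMPTION: a trailing "-" on `before` means a word was split
    across the break. Real compounds (e.g. "corner-stone") that happen
    to break at the hyphen would be joined incorrectly.
    """
    before = before.rstrip()
    after = after.lstrip()
    if before.endswith("-"):
        # Drop the hyphen and glue the two halves of the word together.
        return before[:-1] + after
    # Otherwise the break fell between words: join with a single space.
    return before + " " + after
```

Under this sketch, `join_across_break("the col-", "lege opened")` rejoins the split word, while a break between whole words is bridged with one space.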

nicholasjhorton commented 7 months ago

This feels like an important next step. Can it happen in parallel with the footnote processing? Let's talk.

jpapagelis24 commented 7 months ago

We need to figure out what we want to keep when addressing the page breaks. We could perhaps keep the page numbers because they are referenced in other places. Perhaps keep the page number along with the first word or line. Some questions to think of:

tknightly24 commented 7 months ago

We created a table that extracts the page number, page header (the text next to the page number), and the first line of the page. Our next steps are to remove the extra spaces that make up the page break.
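The table described above could be sketched roughly as follows, in Python. This is not the code from the repository: the header pattern (an all-caps running header with the page number on either side), the function name `build_page_table`, and the sample lines are all assumptions made up for illustration.

```python
import re

# ASSUMPTION: a page header looks like "12 HISTORY OF AMHERST COLLEGE."
# or "EARLY HISTORY. 13" (all-caps header text, page number on either side).
HEADER_RE = re.compile(
    r"^\s*(?:(?P<n1>\d+)\s+(?P<h1>[A-Z][A-Z .,'-]*)"
    r"|(?P<h2>[A-Z][A-Z .,'-]*?)\s+(?P<n2>\d+))\s*$"
)


def build_page_table(lines):
    """Return one row per page header: page number, header text, first line."""
    rows = []
    for i, line in enumerate(lines):
        match = HEADER_RE.match(line)
        if match is None:
            continue
        page = int(match.group("n1") or match.group("n2"))
        header = (match.group("h1") or match.group("h2")).strip()
        # First non-blank line after the header is the first line of the page.
        first_line = next((l.strip() for l in lines[i + 1:] if l.strip()), "")
        rows.append({"page": page, "header": header, "first_line": first_line})
    return rows


# Invented sample lines, not taken from the book.
sample = [
    "the trustees met in the autumn and",
    "",
    "",
    "12 HISTORY OF AMHERST COLLEGE.",
    "",
    "resolved to begin the building.",
    "",
    "EARLY HISTORY. 13",
    "The corner-stone was laid soon after.",
]
page_table = build_page_table(sample)
```

Each row keeps the page number so that cross-references elsewhere in the text can still be resolved after the page breaks themselves are removed.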

nicholasjhorton commented 7 months ago

Thanks for your work on this front. Where can we find the table?

tknightly24 commented 7 months ago

@nicholasjhorton here is the table we were working on last class: d6ff211

jpapagelis24 commented 7 months ago

Created new folders that contain the page tables (page_tables) and the depaginated texts for each chapter (data-raw-depaginate). This issue is complete unless there are any problems with the tables or text.
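For readers following along, the depagination step might look something like the Python sketch below. It is an illustration under stated assumptions, not the workflow in the repository: the header pattern and the function name `depaginate` are hypothetical, and it treats every blank line directly around a header as page-break padding rather than a paragraph break.

```python
import re

# ASSUMPTION: hypothetical header pattern, e.g. "12 HISTORY OF AMHERST COLLEGE."
# or "EARLY HISTORY. 13" (all-caps text with the page number on either side).
HEADER_RE = re.compile(
    r"^\s*(?:\d+\s+[A-Z][A-Z .,'-]*|[A-Z][A-Z .,'-]*?\s+\d+)\s*$"
)


def depaginate(lines):
    """Drop page-header lines and the blank padding on both sides of them."""
    kept = []
    after_header = False  # True while skipping blank lines that follow a header
    for line in lines:
        if HEADER_RE.match(line):
            # Remove blank padding that was already kept before the header.
            while kept and not kept[-1].strip():
                kept.pop()
            after_header = True
            continue
        if after_header and not line.strip():
            continue
        after_header = False
        kept.append(line)
    return kept
```

Blank lines that are not adjacent to a header are left in place, so ordinary paragraph breaks survive.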

nicholasjhorton commented 7 months ago

It's great to see how this is coming together: nice work!

If the code isn't already organized in a Quarto file (with an associated pdf), can you please move it there so that we can track the workflow? (I've already fixed some things that will need to be run through one more time.)

Same thing will be needed for the code that generates the page subtitles.

nicholasjhorton commented 7 months ago

Any updates on this front? It would be great to have a pdf that lists the subtitles as a side effect of processing the workflow for this issue. (This would address open issue #11.)

nicholasjhorton commented 7 months ago

And #23!

jpapagelis24 commented 7 months ago

@nicholasjhorton

Completed this with commit: df01cc7e289a5a222c31b2fa2f87ea330834f993

See the pdf with the subtitles here: https://github.com/STAT325-S24/HistoryAmherstCollege/blob/main/data-raw/subtitles.pdf

Closing the issue.

tknightly24 commented 7 months ago

Nick and Justin will work through changing the working directory of the qmd file.

nicholasjhorton commented 7 months ago

This is looking good, but I'm only seeing subtitles.pdf, not subtitles.qmd. Any guidance welcomed.

jpapagelis24 commented 7 months ago

@nicholasjhorton I did most of the things we talked about in this commit: 79bdf042aa012d18addcd9383cbce23e754d2f83

Some notes:

nicholasjhorton commented 7 months ago

Thanks for your work on this front.

Is the appendix change needed (since I renamed it to chapter29.txt)?

nicholasjhorton commented 7 months ago

@jpapagelis24 @tknightly24 might you be willing to update this issue with the current status of this work? Thanks in advance, Nick

jpapagelis24 commented 7 months ago

@nicholasjhorton This is complete. Closing the issue.