Closed by jerielizabeth 4 months ago
Here is the decision log, which I pulled together from past meeting notes.
For reference, here is the existing preprocessing code in the cleaning.py script on the repo, which was copied and pasted from Wouter's Colab notebook.
And here is the updated (new Hathi OCR) text corpus filtered for the test set.
Literature (see group Zotero library):
Code Repositories:
Dependent on info from :
~Outcome - Answers to the following questions:~
~- [ ] do we need to revise any previous decisions about preprocessing~
~- [ ] do we need to modify / create code for preprocessing~
Actual outcomes