Open bertsky opened 3 years ago
- opening up the repository for comments and amendments by users/practitioners (perhaps in the same way that the workflow guide was mirrored to the wiki and gets synchronized back every now and then)
It's not as convenient as a Wiki (with direct preview), and not as convenient as editing Markdown files on GitHub (with direct preview), but perhaps users can just fork/edit the gt-guidelines repo?
- finally starting a software implementation (which can normalize arbitrary text input at each GT level or canonicalize to the next lower level)
Existing places to look at (just off the top of my head):
GT4HistOCR/tools/regularize.pl
(with care!)
ping @tboenig once he's back from vacation
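To make the software implementation idea above a bit more concrete, here is a minimal sketch of how "normalize at a level" and "canonicalize to the next lower level" could fit together. Everything in it is an assumption for illustration: the function names `normalize` and `canonicalize`, the numbering (3 = most faithful, 1 = most simplified), and especially the replacement tables, which are placeholders and not the actual rules from the guidelines.

```python
import unicodedata

# Hypothetical rule tables (placeholders, NOT the real guideline rules).
# Each table reduces text from one level to the next lower level.
LEVEL3_TO_2 = {
    "\ufb00": "ff",  # ligature ff -> two letters
    "\ufb01": "fi",  # ligature fi -> two letters
}
LEVEL2_TO_1 = {
    "\u017f": "s",        # long s -> round s
    "a\u0364": "\u00e4",  # a + combining small e above -> a-umlaut
}

# Maps a source level to the table that reduces it by one level.
REDUCTIONS = {3: LEVEL3_TO_2, 2: LEVEL2_TO_1}


def _apply(text, table):
    """Apply every replacement in the table, in insertion order."""
    for src, dst in table.items():
        text = text.replace(src, dst)
    return text


def canonicalize(text, level):
    """Reduce text conforming to `level` to the next lower level."""
    if level not in REDUCTIONS:
        raise ValueError(f"cannot canonicalize from level {level}")
    text = unicodedata.normalize("NFC", text)
    return unicodedata.normalize("NFC", _apply(text, REDUCTIONS[level]))


def normalize(text, level):
    """Bring arbitrary input into the form expected at `level` (1..3)."""
    if level not in (1, 2, 3):
        raise ValueError(f"unknown level {level}")
    text = unicodedata.normalize("NFC", text)
    # Walk down from the most faithful level to the requested one.
    for current in range(3, level, -1):
        text = canonicalize(text, current)
    return text


if __name__ == "__main__":
    print(normalize("Wa\u017f\u017fer", 1))
    print(canonicalize("\ufb01\u017fch", 3))
```

Keeping the rules as plain per-level tables would also let the guidelines and the code share one source of truth, so each rule can be tested against the examples already given in the guidelines.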
In their current form, the OCR-D transcription guidelines are often of little use to annotators looking for answers or guidance. They are written as top-down intellectual accounts, but they are not formal (i.e. runnable/verifiable), not searchable, and, well, quite incomplete. Although many examples are given already, they are not nearly enough for the diverse set of materials and peculiarities that annotators face (esp. those without a bibliological / humanities background).
How can we improve that?
I propose attacking this on multiple levels: