joewiz opened 4 years ago
As developed in recent commits, we now have better (hopefully unambiguous) guidelines for `<pb>` (page beginning) and `<lb>` (line beginning) elements. To ensure high fidelity between scanned pages and digital text, and to maximize the consistency and legibility of the TEI XML, these elements should be arranged as follows:

- … `@n` attribute (use square brackets if the page number is implicit, or empty square brackets if the page number is not part of a page stream), the 4-digit zero-padded page scan sequence in the `@facs` attribute, and the `@xml:id` as `pg_` plus the page number (or `pg-seq-____`, replacing the underscores with the value of the `@facs` attribute). (Note: `@break=yes|no` is under consideration.)

To do:
- … `@break="no"` attributes into `pb`/`lb` elements when we are certain they split a single word; we could then insert whitespace around the others. Further analysis is needed, but the goal is to avoid manual review if possible, since there are hundreds of thousands of these elements in the corpus.

As we work with these new schema and transformation facilities, please share comments or concerns so we can refine them and put the resulting guidance into frus.odd.
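A minimal sketch of the `<pb>` attribute convention described above, in Python with only the standard library. The function `make_pb` and its parameters are hypothetical, not part of the FRUS transformation tooling. It assumes that `pg-seq-` ids are used only when the page is not part of a page stream, and that an implicit page number keeps its brackets in `@n` but not in `@xml:id`. The TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Clark-notation name for xml:id; ElementTree serializes it with the
# reserved "xml" prefix, so no namespace declaration is emitted for it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def make_pb(scan_seq, page_number=None, implicit=False):
    """Build a <pb/> element following the convention sketched above.

    scan_seq     -- 1-based position of the page in the scan sequence (int)
    page_number  -- page number as a string ("12", "xiv"), or None if the
                    page is not part of a page stream (hypothetical API)
    implicit     -- True if the number is inferred rather than printed
    """
    facs = f"{scan_seq:04d}"  # 4-digit zero-padded scan sequence
    if page_number is None:
        n = "[]"                   # not part of a page stream
        xml_id = f"pg-seq-{facs}"  # id derived from @facs instead
    else:
        n = f"[{page_number}]" if implicit else page_number
        xml_id = f"pg_{page_number}"  # assumption: brackets never go in the id
    pb = ET.Element("pb")
    pb.set("n", n)
    pb.set("facs", facs)
    pb.set(XML_ID, xml_id)
    return pb


if __name__ == "__main__":
    for args in [(15, "12"), (3, "iii", True), (2, None)]:
        print(ET.tostring(make_pb(*args), encoding="unicode"))
```

For example, `make_pb(2)` yields `n="[]"`, `facs="0002"`, and `xml:id="pg-seq-0002"`, while `make_pb(15, "12")` yields `n="12"`, `facs="0015"`, and `xml:id="pg_12"`.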
… `@break=yes|no` attribute (see https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.breaking.html)
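The `@break="no"` to-do item could start from a heuristic like the following Python sketch. The rule it applies is an assumption, not the project's: a `<pb>` or `<lb>` with a letter directly on both sides is treated as splitting a word and gets `break="no"`; one that follows a hyphen, or has no adjacent text on one side, is left for manual review; every other break gets whitespace on both sides. `mark_breaks` and its helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

BREAK_ELEMENTS = {"pb", "lb"}
HYPHENS = "-\u00ad"  # hyphen-minus and soft hyphen


def local_name(tag):
    """Strip any namespace, so TEI-namespaced and bare tags both match."""
    return tag.rsplit("}", 1)[-1]


def mark_breaks(root):
    """Heuristically add break="no" or surrounding whitespace to pb/lb.

    Assumed rule (not the project's): a letter immediately on both sides
    means the break splits a word; a preceding hyphen or missing adjacent
    text means "not certain" and the element is left untouched; all other
    breaks get whitespace on both sides.
    """
    for parent in root.iter():
        children = list(parent)
        for i, el in enumerate(children):
            if not isinstance(el.tag, str) or local_name(el.tag) not in BREAK_ELEMENTS:
                continue
            if "break" in el.attrib:
                continue  # already decided, by hand or by an earlier pass
            # Text directly before the element: parent.text for the first
            # child, otherwise the tail of the previous sibling.
            before = (parent.text if i == 0 else children[i - 1].tail) or ""
            after = el.tail or ""
            if not before or not after or before[-1] in HYPHENS:
                continue  # not certain: leave for manual review
            if before[-1].isalpha() and after[0].isalpha():
                el.set("break", "no")  # letters abut on both sides: mid-word
                continue
            if not before[-1].isspace():
                if i == 0:
                    parent.text = before + " "
                else:
                    children[i - 1].tail = before + " "
            if not after[0].isspace():
                el.tail = " " + after
    return root
```

On `<p>inter<lb/>national affairs,<lb/>signed</p>`, the first `<lb/>` receives `break="no"` and the second gains a space on each side. Skipping breaks after a hyphen keeps the sketch conservative, in line with marking only the cases we are certain about.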