PAGE2TEI was created and is maintained by Dario Kampkaspar and is licensed under the MIT license.
Apply page2tei-0.xsl to the METS file:
java -jar saxon9he.jar -xsl:page2tei-0.xsl -s:mets.xml -o:[your tei file].xml
Additional stylesheets can be applied to the output created by the basic transformation:
combine-continued.xsl (or set parameter combine=true()): try to combine entities that are split over a line break into one element
simplify-coordinates.xsl (parameter bounding-rectangles=true() by default): convert polygons into bounding rectangles
tokenize.xsl (or set parameter tokenize=true()): perform (very basic!) whitespace tokenization

You can set the following parameters when calling page2tei-0.xsl (via command line or via an oXygen scenario; in oXygen, the parameters should be marked as “XPath”):
(default: true()): create rs type="..." for person/place/org (default) or persName etc.
tokenize (default: false()): whether to run whitespace tokenization
combine (default: false()): whether to combine entities over line breaks
(default: false()): if false(), region types that correspond to valid TEI elements will be returned as this element; types that do not correspond to a TEI element will be returned as tei:ab[@type]. If set to true(), all region types (except for paragraph, heading) will be returned as tei:ab.
(default: false()): if true(), export the (estimated) word coordinates to the facsimile section
bounding-rectangles (default: true()): whether to create bounding rectangles from polygons
(default: false()): whether to export lines without a baseline
(default: false()): whether to export regions without text lines

Some contributions to this software were created within the scope of a project funded by the German BMBF, project ID 16TOA015A.
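
As a rough sketch of how the parameters listed above might be passed on the command line, the following shell function assembles a Saxon call. It assumes that the Saxon version in use accepts the ?name=value form, which asks Saxon to evaluate the value as an XPath expression (comparable to marking a parameter as “XPath” in oXygen). The helper name page2tei_cmd and the file name output.xml are hypothetical and not part of PAGE2TEI.

```shell
#!/bin/sh
# Sketch only: assemble a Saxon call that sets page2tei-0.xsl parameters.
# Assumptions: saxon9he.jar, page2tei-0.xsl and mets.xml are in the current
# directory; output.xml is a placeholder file name.

# page2tei_cmd (hypothetical helper): print the command line for a given
# source file, output file and any number of name=value parameters.
# Each parameter gets a leading "?" so that Saxon treats the value as an
# XPath expression, i.e. true() arrives as a boolean, not as a string.
page2tei_cmd() {
  src=$1
  dest=$2
  shift 2
  cmd="java -jar saxon9he.jar -xsl:page2tei-0.xsl -s:$src -o:$dest"
  for param in "$@"; do
    cmd="$cmd '?$param'"
  done
  printf '%s\n' "$cmd"
}

# Dry run: show the command instead of executing it.
page2tei_cmd mets.xml output.xml 'tokenize=true()' 'combine=true()'
```

This only prints the command; passing the printed line to eval would execute it. The single quotes in the printed command keep the shell from interpreting the question mark and the parentheses.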