Semantic labelling of OCR ground truth data, storing the labels together with the METS metadata.
Add the namespace http://www.ocr-d.de/GT/ to the METS file. We recommend gt as the namespace prefix:
xmlns:gt="http://www.ocr-d.de/GT/"
Set the XSD schema location to OCR-D_GT_schema.xsd:
xsi:schemaLocation="file:///OCR-D_GT_schema.xsd" or URL...
See mets_example.xml.
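As a quick sanity check, the namespace declaration described above can be verified with a few lines of Python. This is only a minimal sketch using the standard library. The helper names `declared_namespaces` and `gt_prefix` are hypothetical and not part of any OCR-D tool, and the sample document is a stripped-down stand-in for a real METS file.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI given in this README.
GT_NS = "http://www.ocr-d.de/GT/"


def declared_namespaces(source):
    """Map prefix -> namespace URI for every xmlns declaration in the document.

    `source` is a file name or a binary file object. If a prefix is declared
    more than once, the last declaration wins.
    """
    return {
        prefix: uri
        for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))
    }


def gt_prefix(source):
    """Return the prefix bound to the GT namespace, or None if it is not declared."""
    for prefix, uri in declared_namespaces(source).items():
        if uri == GT_NS:
            return prefix
    return None


# Hypothetical minimal document, used only for illustration.
sample = (
    b'<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    b'xmlns:gt="http://www.ocr-d.de/GT/"/>'
)
print(gt_prefix(io.BytesIO(sample)))
```

Passing a file name such as mets_example.xml instead of the in-memory sample should work the same way. A result of `None` would indicate that the namespace declaration is still missing.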
The ontology is defined in DefaultLabelTypes_3.xml, taken from https://github.com/PRImA-Research-Lab/semantic-labelling
The XSD is generated by transforming that ontology with an XSLT stylesheet:
java -jar ../saxon9he.jar -xsl:OCR-D_GT_labelschema_maker.xsl -s:DefaultLabelTypes_3.xml
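If the schema has to be regenerated regularly, the Saxon call above could be wrapped in a small script. The following is a sketch only. The function names `saxon_command` and `regenerate_schema` are hypothetical, the default paths simply mirror the command line shown above, and no Saxon options beyond `-xsl:` and `-s:` are assumed.

```python
import subprocess
from pathlib import Path


def saxon_command(saxon_jar="../saxon9he.jar",
                  stylesheet="OCR-D_GT_labelschema_maker.xsl",
                  ontology="DefaultLabelTypes_3.xml"):
    """Build the argument list for the Saxon call shown in this README."""
    return [
        "java", "-jar", str(saxon_jar),
        f"-xsl:{stylesheet}",
        f"-s:{ontology}",
    ]


def regenerate_schema(saxon_jar="../saxon9he.jar",
                      stylesheet="OCR-D_GT_labelschema_maker.xsl",
                      ontology="DefaultLabelTypes_3.xml"):
    """Run the transformation, failing early if an input file is missing."""
    for path in (saxon_jar, stylesheet, ontology):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    # check=True raises CalledProcessError if Saxon exits with a non-zero status.
    subprocess.run(saxon_command(saxon_jar, stylesheet, ontology), check=True)


# Usage (requires Java, the Saxon jar and both input files):
# regenerate_schema()
```

Keeping the command construction separate from the execution makes it possible to inspect or log the exact call without having Java installed.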
The ontology is described in:
Clausner, C. and Antonacopoulos, A.: Ontology and framework for semantic labelling of document data and software methods. In: 13th IAPR International Workshop on Document Analysis Systems (DAS2018), 24-27 April 2018, Vienna, Austria. http://usir.salford.ac.uk/46896/
Implemented as a set of Java tools in https://github.com/PRImA-Research-Lab/semantic-labelling