Austrian Newspapers is a ground truth data set created with Transkribus from Austrian newspapers by the Library Labs of the Austrian National Library (Österreichische Nationalbibliothek). See this publication for details:
Günter Mühlberger, & Günter Hackl. (2019). NewsEye / READ OCR training dataset from Austrian Newspapers (19th C.) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3387369
The original data set was published under the Creative Commons Attribution 4.0 International license.
A revision of the data set was carried out by Mannheim University Library from November 2022 to April 2023 using Transkribus.
All transcriptions are provided as PAGE XML in the data folder.
The original separation of the data set into TrainingSet_ONB_Newseye_GT_M1+ and ValidationSet_ONB_Newseye_GT_M1+ was kept.
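As a rough illustration of how the PAGE XML transcriptions might be read, here is a minimal Python sketch that uses only the standard library. The function names `extract_lines` and `lines_per_file` are hypothetical helpers, not part of this repository. The sketch assumes the usual PAGE layout in which each `TextLine` has a direct `TextEquiv/Unicode` child, and it takes the XML namespace from the file itself rather than hard-coding a schema version. The directory layout inside the data folder is also an assumption.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_lines(xml_data):
    """Return the text of every non-empty TextLine in a PAGE XML document.

    xml_data may be str or bytes. Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_data)
    # The root tag looks like "{namespace}PcGts"; reuse the declared namespace.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name):
        return f"{{{ns}}}{name}" if ns else name

    lines = []
    for line in root.iter(q("TextLine")):
        # Only the line's own TextEquiv is read; word-level TextEquiv
        # elements nested inside Word are skipped.
        unicode_el = line.find(f"{q('TextEquiv')}/{q('Unicode')}")
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines


def lines_per_file(folder):
    """Map each *.xml file below folder to its transcribed lines.

    Assumes every *.xml file below the folder is a PAGE file; adjust the
    pattern if metadata files are stored alongside the pages.
    """
    return {
        path: extract_lines(path.read_bytes())
        for path in sorted(Path(folder).rglob("*.xml"))
    }
```

Calling `lines_per_file("data")` from the repository root would then collect the line texts for both the training and the validation set, provided the assumptions above hold.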
The revision includes:
Find more information about the revised data set in our wiki.
The transcription rules are based on the OCR-D Ground Truth Guidelines Level 2 with some exceptions (see below):
1) Special characters:
2) Additional characters transcribed true to original (contrary to OCR-D Level 2):
This revision is part of the OCR-D project and predominantly funded by the German Research Foundation (DFG).