This dataset consists of photographs, at varying resolutions, of 35,623 pages from printed books dating from the 15th to the 18th century. Experts assigned each page between one and five labels corresponding to the font groups used in the text. The label set covers ten font groups (Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura) plus two extra classes: one for non-textual content and one for fonts not in that list.
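Because a page can carry up to five labels, the task is multi-label rather than single-label classification. The sketch below shows one common way to represent such labels, as a multi-hot vector over the twelve classes. It is only an illustration: the names of the two extra classes (`not_text`, `other_font`) and the helper `encode_labels` are hypothetical and do not come from the dataset's own files.

```python
# Ten font groups named in the description above.
FONT_GROUPS = [
    "Antiqua", "Bastarda", "Fraktur", "Gotico Antiqua", "Greek",
    "Hebrew", "Italic", "Rotunda", "Schwabacher", "Textura",
]
# Assumed placeholder names for the two extra classes
# (non-textual content, fonts not in the list).
EXTRA_CLASSES = ["not_text", "other_font"]

CLASSES = FONT_GROUPS + EXTRA_CLASSES
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}


def encode_labels(labels):
    """Return a multi-hot list of 0/1 flags, one entry per class."""
    if not 1 <= len(labels) <= 5:
        raise ValueError("expected between one and five labels per page")
    vector = [0] * len(CLASSES)
    for label in labels:
        vector[CLASS_INDEX[label]] = 1  # raises KeyError for an unknown label
    return vector


# A page set in both Fraktur and Antiqua.
print(encode_labels(["Fraktur", "Antiqua"]))
# [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

A multi-hot target of this kind pairs naturally with a per-class sigmoid output, although the description does not prescribe any particular model.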
A URL for this dataset
https://zenodo.org/record/3366686
Dataset description
An image classification dataset that may also benefit downstream tasks such as optical character recognition (OCR).
A related paper
Dataset of Pages from Early Printed Books with Multiple Font Groups
Dataset modality
Image
Dataset licence
Creative Commons Attribution Non Commercial Share Alike 4.0 International
Other licence
No response
How can you access this data
As a download from a repository/website
Confirm the dataset has an open licence
Contact details for data custodian
No response