HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Adding dataset Incunabula Reichenau #150

Closed by katharinaost 2 weeks ago

katharinaost commented 3 weeks ago

Hello HTR-United team!

Please consider the following data set description for inclusion in your directory.

As the referenced data set contains pages from ~200 prints, a full description would be impractical. Please let me know if any changes are required.

Thank you for your amazing work!

Here is our dataset YAML file:

schema: https://htr-united.github.io/schema/2023-06-27/schema.json
title: Incunabula Reichenau
url: https://doi.org/10.5281/zenodo.11046061
authors:
  - name: Annika
    surname: Stello
    orcid: 0000-0002-6305-4810
    roles:
      - project-manager
  - name: Gerit
    surname: Heim
    orcid: 0000-0002-5820-7771
    roles:
      - project-manager
  - name: Katharina
    surname: Ost
    orcid: 0000-0002-6234-9721
    roles:
      - transcriber
institutions: []
description: >-
  This data set contains the training data for the following three published
  Transkribus models:

  German Incunabula (Reichenau)
  Latin Incunabula (Reichenau)
  Latin/German Bilingual Incunabula (Reichenau)

  This data set represents an excerpt of a collection of incunabula and post-incunabula
  of the former Reichenau monastery, now held at the Badische Landesbibliothek in
  Karlsruhe (see https://digital.blb-karlsruhe.de/topic/view/7530707). As, typically,
  1-20 pages were drawn from single prints, it reflects a wide range of typefaces used
  by early printers from the German language area and Northern Italy.

  The data was created as part of the project Digitalisierung und Volltexterkennung
  der ehemals Reichenauer Inkunabeln at the Badische Landesbibliothek, which was
  funded by the Stiftung Kulturgut Baden-Württemberg.
project-name: Digitalisierung und Volltexterkennung der ehemals Reichenauer Inkunabeln
language:
  - lat
  - deu
production-software: Transkribus
automatically-aligned: false
script:
  - iso: Latn
  - iso: Goth
script-type: only-typed
time:
  notBefore: '1470'
  notAfter: '1510'
hands:
  count: more-than-10
  precision: exact
license:
  name: CC-BY-SA 4.0
  url: https://creativecommons.org/licenses/by-sa/4.0/
format: Page-XML
volume:
  - metric: pages
    count: 2200
transcription-guidelines: Abbreviations are represented through special characters; please see the project repository for full documentation.
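
For readers preparing a similar submission, a few fields of an entry like the one above can be sanity-checked before opening an issue. The following is only a minimal sketch and not the official HTR-United schema validation: the `entry` dictionary is a hand-copied subset of the YAML above, and `REQUIRED_KEYS` and `check_entry` are hypothetical names chosen for illustration.

```python
# Illustrative sanity checks for a catalog entry.
# NOT the official schema validation: REQUIRED_KEYS and check_entry are
# assumptions made for this sketch, not part of any HTR-United tooling.

REQUIRED_KEYS = ["schema", "title", "url", "language", "script",
                 "time", "license", "format", "volume"]

# Subset of the entry above, written as a plain dictionary.
entry = {
    "schema": "https://htr-united.github.io/schema/2023-06-27/schema.json",
    "title": "Incunabula Reichenau",
    "url": "https://doi.org/10.5281/zenodo.11046061",
    "language": ["lat", "deu"],
    "script": [{"iso": "Latn"}, {"iso": "Goth"}],
    "time": {"notBefore": "1470", "notAfter": "1510"},
    "license": {"name": "CC-BY-SA 4.0",
                "url": "https://creativecommons.org/licenses/by-sa/4.0/"},
    "format": "Page-XML",
    "volume": [{"metric": "pages", "count": 2200}],
}


def check_entry(entry):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Every key in the (assumed) required list must be present.
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key: {key}")
    # The date range must not be inverted.
    time = entry.get("time", {})
    if "notBefore" in time and "notAfter" in time:
        if int(time["notBefore"]) > int(time["notAfter"]):
            problems.append("time.notBefore is later than time.notAfter")
    # Each volume item needs a positive integer count.
    for item in entry.get("volume", []):
        count = item.get("count")
        if not isinstance(count, int) or count <= 0:
            problems.append(f"volume count for {item.get('metric')} is not a positive integer")
    return problems


print(check_entry(entry))
```

An empty list means none of these particular checks failed; it says nothing about conformance to the schema referenced in the `schema` field.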

alix-tz commented 2 weeks ago

Hello Katharina,

This is really nice, many thanks for your contribution! As you can see, I created the corresponding pull request, so the dataset is now added to the database! :)