HTR-United / htr-united

Ground Truth Resources for the HTR of patrimonial documents
https://htr-united.github.io
Creative Commons Zero v1.0 Universal

Adding dataset Paris Bible Project #101

Closed (parisbible closed this issue 1 year ago)

parisbible commented 1 year ago

schema: https://htr-united.github.io/schema/2022-04-15/schema.json
title: Paris Bible Project (PBP)
url: https://github.com/parisbible
authors:
  - name: Estelle
    surname: Guéville
    orcid: 0000-0003-2603-1051
    roles:
      - transcriber
      - aligner
      - project-manager
      - quality-control
  - name: David
    surname: Wrisley
    orcid: 0000-0002-0355-1487
    roles:
      - transcriber
      - aligner
      - project-manager
      - quality-control
  - name: Niccolò Acram
    surname: Cappelletto
    roles:
      - transcriber
      - aligner
      - quality-control
institutions: []
description: >-
  The Paris Bible Project aims to understand the production and diffusion of
  medieval Latin Bibles in Europe. The dataset includes ground truth from Paris
  Bibles produced in the 13th and 14th centuries. We also provide the most
  recent version of our list of Paris Bible manuscripts found in the world along
  with information about them.
project-website: https://parisbible.github.io/
language:
  - lat
production-software: Transkribus
script:
  - iso: Latn
  - iso: Goth
script-type: only-manuscript
time:
  notBefore: '1200'
  notAfter: '1399'
hands:
  count: more-than-10
  precision: estimated
license:
  - name: CC-BY 4.0
    url: https://creativecommons.org/licenses/by/4.0/
format: Alto-XML
volume:
  - metric: lines
    count: 10000
transcription-guidelines: 'See: https://parisbible.github.io/guidelines/'

alix-tz commented 1 year ago

Hello! Thank you for this proposition! Do you only offer TXT files for the ground truth? (I assume the link to the ground truth files is https://github.com/parisbible/ground_truth)

If so, then it would be preferable to export XML ALTO or XML PAGE from Transkribus. Also, any chance you can publish the images corresponding to the ground truth?

Otherwise, if I didn't look at the right repository, could you indicate where to find it?

EstelleGvl commented 1 year ago

Dear Alix, apologies for the delay. We have updated everything on GitHub, here: https://github.com/parisbible/ground_truth/tree/main/PBP%201.0

You will find TXT, ALTO and XML files as well as the images when they are publicly available.

We will keep updating things as the project evolves. All the best, Estelle

PonteIneptique commented 1 year ago

Thanks, and sorry for the delay. @alix-tz and I have been kind of drowning lately, but we're back :)

PonteIneptique commented 1 year ago

Added to HTR-United through #108. It will be visible online soon :)

I recomputed some metrics and found only 1,700 lines, not 10,000. I also updated the script entries: Goth is the code for the Gothic alphabet (see https://www.compart.com/en/unicode/scripts/Goth), not for Gothic-style handwriting in Latin script.

Thanks!
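
For anyone who wants to check a `volume` line count like the one recomputed above, here is a minimal sketch in Python. It is not the script used by the HTR-United maintainers; the function names and the folder layout are assumptions made for illustration. It counts `TextLine` elements, which is how ALTO marks a transcribed line. PAGE XML also uses an element named `TextLine`, so the folder should hold only one of the two formats to avoid counting lines twice.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def count_text_lines(xml_data):
    """Count TextLine elements in one document (str or bytes), whatever its namespace."""
    root = ET.fromstring(xml_data)
    return sum(1 for el in root.iter() if local_name(el.tag) == "TextLine")


def count_lines_in_folder(folder):
    """Sum TextLine counts over every .xml file below `folder` (hypothetical layout)."""
    total = 0
    for path in Path(folder).rglob("*.xml"):
        try:
            total += count_text_lines(path.read_bytes())
        except ET.ParseError:
            # Skip files that are not well-formed XML instead of aborting the count.
            continue
    return total
```

Calling `count_lines_in_folder` on a local clone of the ground truth folder would give a figure to compare against the `count` declared under `volume`. Matching on the local element name rather than a fixed namespace keeps the sketch independent of the ALTO schema version the export happens to use.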