HTR-United / cremma-wikipedia

A collection of ground truth to train HTR models on contemporary French handwriting
Creative Commons Attribution 4.0 International
cremmawiki french ground-truth htr wikipedia work-in-progress

CREMMA - Wikipedia





Description

The CREMMA WIKIPEDIA project aims at creating a collection of ground truth to train HTR models on contemporary French handwriting.

Each image contains an excerpt from a randomly selected Wikipedia page, copied by hand by volunteers. The original text is also present on the image; we then aligned the handwritten portion with it.
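
The alignment step can be illustrated with a minimal sketch, assuming the handwritten copy has already been transcribed as plain text. It uses Python's standard `difflib.SequenceMatcher` to locate the span of the original text that the copy corresponds to and to score how closely they match. The function name `locate_copy` and the whole approach are hypothetical illustrations, not the procedure actually used for this dataset.

```python
from difflib import SequenceMatcher


def locate_copy(original: str, copy: str) -> tuple[int, int, float]:
    """Return (start, end, similarity) of the span of `original` matched by `copy`.

    Hypothetical helper: the span runs from the first to the last matching
    block found by difflib, so small copying mistakes inside it are tolerated.
    """
    matcher = SequenceMatcher(None, original, copy, autojunk=False)
    # Drop the zero-size sentinel block that get_matching_blocks() always appends.
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return (0, 0, 0.0)
    start = blocks[0].a
    end = blocks[-1].a + blocks[-1].size
    # Similarity (0.0 to 1.0) between the located span and the handwritten copy.
    ratio = SequenceMatcher(None, original[start:end], copy, autojunk=False).ratio()
    return (start, end, ratio)


if __name__ == "__main__":
    original = "Paris est la capitale de la France. Elle compte plusieurs millions d'habitants."
    copy = "la capitale de la Franse"  # a copy containing one error
    start, end, ratio = locate_copy(original, copy)
    print(original[start:end], round(ratio, 3))
```

A low similarity score could be used to flag copies that diverge strongly from their source for manual review.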

Transcription guidelines

The transcription guidelines follow CREMMA's conventions for modern documents. In short:

- The text to copy may have included phonetic transcription; non-French letters and diacritics were rendered as well.
- See characters.csv for the list of the characters used in this dataset.
- The character set can be normalized using ChocoMufin.
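
To illustrate what a character inventory such as characters.csv records, and how table-based normalization works in principle, here is a minimal sketch using only the Python standard library. The `NORMALIZATION` table and the function names are hypothetical: this is not ChocoMufin's API or table format, and not the mapping applied to this dataset.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Hypothetical substitution table for illustration only; it is neither
# ChocoMufin's table format nor the mapping used by CREMMA.
NORMALIZATION = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> ASCII apostrophe
    "\u00a0": " ",  # NO-BREAK SPACE -> regular space
}


def character_inventory(lines: Iterable[str]) -> Counter:
    """Count characters after NFC composition, so 'e' + U+0301 counts as 'é'."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(unicodedata.normalize("NFC", line))
    return counts


def describe(counts: Counter) -> list[tuple[str, str, str, int]]:
    """List (char, code point, Unicode name, count), most frequent first."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"), n)
        for ch, n in counts.most_common()
    ]


def normalize(text: str, table: dict[str, str] = NORMALIZATION) -> str:
    """Compose to NFC, then replace every character listed in the table."""
    text = unicodedata.normalize("NFC", text)
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for row in describe(character_inventory(["l\u2019e\u0301te\u0301"])):
        print(row)
    print(normalize("l\u2019\u00e9t\u00e9"))
```

Composing to NFC first matters because the same visible letter can be encoded either as one precomposed code point or as a base letter plus a combining diacritic, which would otherwise be counted as different characters.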

Related tools

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
