UB-Mannheim / AustrianNewspapers

NewsEye / READ OCR training dataset from Austrian Newspapers (1864–1911)

change 2c. to rotunda, and others found during correction #13

Closed by wollmers 4 years ago

wollmers commented 4 years ago

Many other problems remain that are risky to fix automatically:

grep -R 'sch'    AustrianNewspapers/gt/ | grep '.txt:'
grep -Ri 'Gasse' AustrianNewspapers/gt/ | grep '.txt:'
grep -R 'st'     AustrianNewspapers/gt/ | grep '.txt:'

These searches return too many results to fix by hand. An automatic replacement would also introduce long s into Antiqua strings, where the round s is correct. I first need to improve my tools so that I can work faster. With good images I could also detect the font family.
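To gauge how much manual work each search implies, the matches can be counted instead of listed. The following is only a sketch: `count_hits` is a hypothetical helper, not part of the repository or of the tools mentioned above, and it assumes a `grep` that supports `--include` (GNU and BSD grep both do). It treats the pattern as a fixed string, restricts the search to `.txt` files, and changes nothing on disk.

```shell
# Hypothetical helper: count lines in *.txt files below a directory
# that contain a fixed string. Read-only; no file is modified.
#   $1 = fixed string to search for
#   $2 = directory to search recursively
#   $3 = optional extra grep option, e.g. -i for case-insensitive matching
count_hits() {
    grep -R -F ${3:+"$3"} --include='*.txt' -e "$1" "$2" | wc -l | tr -d ' '
}
```

For example, `count_hits 'sch' AustrianNewspapers/gt/` would report the number of matching lines for the first search, and `count_hits 'Gasse' AustrianNewspapers/gt/ -i` would do the same for the case-insensitive one.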

stweil commented 4 years ago

Thank you.