UB-Mannheim / AustrianNewspapers

NewsEye / READ OCR training dataset from Austrian Newspapers (1864–1911)

fix rotundas used for etc., one dozen '2c' remaining to check visually in context #5

wollmers closed this 4 years ago

stweil commented 4 years ago

Thanks. The sample shows that we still have to expect lots of u/n confusions.

wollmers commented 4 years ago

I noticed this u/n confusion and some others while reviewing the diffs, but I want to resolve them either with a dictionary check or by optical/visual comparison. The original images also contain occasional typesetting errors, and it is questionable to overrule the typesetter. Ideally, obvious typesetting errors should be tagged, as the DTA (Deutsches Textarchiv) has done.
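As a minimal sketch of such tagging, assuming TEI-style markup of the kind the DTA uses (a `<choice>` element holding the printed reading in `<sic>` and the editorial correction in `<corr>`), the transcription can keep what the typesetter actually printed while still recording the intended word. The helper name `tag_sic` and the example word are hypothetical.

```python
from xml.sax.saxutils import escape


def tag_sic(printed: str, corrected: str) -> str:
    """Wrap a typesetting error in a TEI-style <choice> element (hypothetical helper).

    The printed reading stays in <sic>, so the transcription still matches
    the original page; the editorial correction goes into <corr>.
    """
    return (
        "<choice>"
        f"<sic>{escape(printed)}</sic>"
        f"<corr>{escape(corrected)}</corr>"
        "</choice>"
    )


# Hypothetical example: a turned 'n' (which looks like 'u') in the print.
print(tag_sic("Zeituug", "Zeitung"))
```

Because the printed form is never overwritten, a later dictionary-based check can skip tagged spans instead of "correcting" them a second time.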

stweil commented 4 years ago

I absolutely agree. A dictionary check can only give a hint, because the original contains typesetting errors which must be preserved in the transcription.
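A minimal sketch of a hint-only dictionary check, assuming a plain word list (`lexicon`) is available and considering only the lowercase u/n pair discussed here: it reports tokens that are not in the word list but become known words after swapping u and n, and it leaves the transcription itself unchanged. The function names and the sample lexicon are hypothetical.

```python
from itertools import product


def un_variants(word: str) -> set[str]:
    """All spellings obtained by swapping any subset of lowercase u/n characters."""
    # Each 'u' or 'n' may have been misread as the other; other characters stay fixed.
    options = [("u", "n") if ch in "un" else (ch,) for ch in word]
    variants = {"".join(chars) for chars in product(*options)}
    variants.discard(word)
    return variants


def flag_un_candidates(
    tokens: list[str], lexicon: set[str]
) -> list[tuple[int, str, list[str]]]:
    """Return (position, token, suggestions) for tokens that may be u/n confusions.

    Only a hint: tokens are never modified, because the original print may
    contain a genuine typesetting error that must be preserved.
    """
    hints = []
    for i, token in enumerate(tokens):
        if token.lower() in lexicon:
            continue  # known word, nothing to flag
        suggestions = sorted(v for v in un_variants(token) if v.lower() in lexicon)
        if suggestions:
            hints.append((i, token, suggestions))
    return hints


tokens = ["Die", "Zeituug", "von", "heute"]
lexicon = {"die", "zeitung", "von", "heute"}  # hypothetical word list
print(flag_un_candidates(tokens, lexicon))  # [(1, 'Zeituug', ['Zeitung'])]
```

Each flagged token still has to be compared with the page image to decide whether it is an OCR error or a typesetting error in the print.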