wollmers opened 4 years ago
I am afraid that most images whose transcription contains `###` are unusable for training, because `###` was obviously used to transcribe unreadable text. An experienced reader could supply a transcription for a small number of them.
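A minimal sketch for listing the affected lines so they can be excluded from training or re-checked. It assumes a hypothetical layout of one `*.gt.txt` transcription file per line image, and that `###` is the only marker for unreadable text. The function name `find_unreadable` is illustrative, not part of any existing tool.

```python
from pathlib import Path

# Assumption: "###" in a ground-truth line marks text the transcriber
# could not read, as described in this issue.
UNREADABLE = "###"

def find_unreadable(gt_dir):
    """Return the sorted names of *.gt.txt files whose text contains '###'.

    Hypothetical layout: one *.gt.txt transcription per line image.
    """
    hits = []
    for path in sorted(Path(gt_dir).glob("*.gt.txt")):
        if UNREADABLE in path.read_text(encoding="utf-8"):
            hits.append(path.name)
    return hits

if __name__ == "__main__":
    # List candidates to exclude from (or re-check before) training.
    for name in find_unreadable("gt/train"):
        print(name)
```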
I fixed about 150 of them as far as I could guess the text, i.e. cases that good language processing with dictionary lookup and word n-grams can solve. Some errors result from the transcribers' poor knowledge of old Viennese vocabulary, e.g. Kloth (a special cotton fabric), or of old place names in the Austrian monarchy. If numbers or personal names are unreadable, there is no chance of recovering them without additional context.
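The dictionary-plus-n-gram idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the method actually used for the fixes: it assumes `###` inside a token stands for one or more unreadable characters, that a lexicon of known words is available, and that bigram counts `(previous_word, word)` are available. The helper `candidates` and the toy data are hypothetical.

```python
import re
from collections import Counter

def candidates(token, prev_word, lexicon, bigrams: Counter):
    """Rank lexicon words that fit `token`, using bigram counts after `prev_word`.

    Assumption: each "###" in `token` replaces one or more unreadable characters.
    """
    # Escape the readable parts literally; each "###" becomes ".+".
    parts = [re.escape(p) for p in token.split("###")]
    pattern = re.compile(".+".join(parts))
    matches = [w for w in lexicon if pattern.fullmatch(w)]
    # Most frequent after prev_word first (Counter returns 0 for unseen
    # pairs); ties are broken alphabetically.
    return sorted(matches, key=lambda w: (-bigrams[(prev_word, w)], w))
```

With a toy lexicon `{"Kloth", "Knoth", "Kleid"}` and bigram counts favouring `("aus", "Kloth")`, the partial token `K###th` after `aus` ranks `Kloth` first. Unreadable numbers and personal names fall outside what such a lookup can recover.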
Hopefully I haven't overdone the fixes.
The line images in `gt/train` are sometimes too short, e.g. [example line image]. Just a reminder to explore this issue later.
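One possible starting point for exploring this: flag line images that are narrow relative to the length of their transcription. This is a heuristic sketch, assuming PNG line images stored next to `*.gt.txt` files with the same base name; the threshold `min_ratio=0.3` (minimum width per character as a fraction of line height) is a guess, not a measured value. Only the standard library is used: the width and height are read from the PNG `IHDR` header.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])

def looks_too_short(width, height, text, min_ratio=0.3):
    """Heuristic: flag images narrower than min_ratio * height per character."""
    n_chars = len(text.strip())
    return n_chars > 0 and width < min_ratio * height * n_chars

def find_short_lines(gt_dir, min_ratio=0.3):
    """Return names of PNG line images that look too short for their text.

    Hypothetical layout: NAME.gt.txt next to NAME.png.
    """
    hits = []
    for gt in sorted(Path(gt_dir).glob("*.gt.txt")):
        png = gt.with_name(gt.name[: -len(".gt.txt")] + ".png")
        if not png.exists():
            continue  # skip lines without a PNG image
        width, height = png_size(png)
        if looks_too_short(width, height, gt.read_text(encoding="utf-8"), min_ratio):
            hits.append(png.name)
    return hits
```

Calling `find_short_lines("gt/train")` would list candidates for manual inspection; the flagged images still need to be checked by eye, since narrow fonts or abbreviations can also trigger the heuristic.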