fraktur_19th_century vs github.com/qurator-spk/train-calamari-gt4histocr

The fraktur_19th_century model is based mainly on the gt4histocr ground truth (GT), supplemented by Fraktur19 data from other freely available sources (archiscribe and jze). The training pipeline was as follows: each voter started from a different out-of-domain mixed model, each trained on a different subset of gt4histocr. Each voter was then trained on all available Fraktur19 data with data augmentation, followed by a final refinement step in which the number of lines per source was limited to a maximum of 50. A padding of 3 rows of white pixels was added to the top and bottom of each line, if not already present. The effect on lines segmented without this padding has not yet been thoroughly evaluated. Training a new or additional ensemble with no padding or mixed padding might be sensible. Feedback would be greatly appreciated.
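For anyone who wants to reproduce the padding on their own line images, here is a minimal sketch of what the step could look like. It is not the code used for training. It assumes an 8-bit grayscale image given as a list of pixel rows with 255 as white, and it reads "if not already present" as "top up to 3 white rows" rather than "always add 3"; `pad_line`, `WHITE` and `PAD_ROWS` are hypothetical names chosen for illustration.

```python
from typing import List

Image = List[List[int]]  # assumed layout: rows of 8-bit grayscale pixels

WHITE = 255   # assumption: 255 encodes white
PAD_ROWS = 3  # padding height mentioned in the description above


def _leading_white_rows(rows: Image, limit: int, white: int) -> int:
    """Count all-white rows at the start of `rows`, looking at most `limit` rows deep."""
    count = 0
    for row in rows[:limit]:
        if any(px != white for px in row):
            break
        count += 1
    return count


def pad_line(img: Image, pad: int = PAD_ROWS, white: int = WHITE) -> Image:
    """Return a copy of `img` with at least `pad` white rows at top and bottom."""
    if not img:
        return []
    width = len(img[0])
    top = _leading_white_rows(img, pad, white)
    bottom = _leading_white_rows(img[::-1], pad, white)
    # Add only the rows that are missing, so already padded lines stay unchanged.
    top_rows = [[white] * width for _ in range(pad - top)]
    bottom_rows = [[white] * width for _ in range(pad - bottom)]
    return top_rows + [list(row) for row in img] + bottom_rows
```

Applying `pad_line` twice gives the same result as applying it once, so it should be safe to run over a mix of padded and unpadded lines. With NumPy arrays the same idea can be expressed with `np.pad` and `constant_values=255`.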

Calamari-OCR / calamari_models
