femifrak / test


Which traineddata to use for German Fraktur #1

Open femifrak opened 6 years ago

femifrak commented 6 years ago

@stweil There are several traineddata files that can be used to OCR German Fraktur texts. I found these:

https://github.com/Shreeshrii/tessdata_fraktur/ frk.traineddata
https://github.com/Shreeshrii/tessdata_fraktur/ frk-plus-Fraktur-52500.traineddata
https://github.com/tesseract-ocr/tessdata/ deu_frak.traineddata
https://github.com/tesseract-ocr/tessdata/ frk.traineddata
https://github.com/tesseract-ocr/tessdata_fast/ frk.traineddata
https://github.com/tesseract-ocr/tessdata_best/ frk.traineddata

However, I don't know how to find out which one is best suited. Is this just done by trial and error? What about using frk+deu? I am 97% happy with the first one, but maybe there are even better options (e.g. other wordlists, ...).

I am asking because I know that you are very experienced in OCR'ing German Fraktur.
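
One way to make the trial-and-error comparison less subjective might be to transcribe a sample page by hand and score each model's output against it. The sketch below is only an illustration, not an established evaluation tool: the function names and the model labels passed to `rank_models` are hypothetical, and it assumes the OCR text for each candidate model has already been produced and loaded as a string.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """Edit distance divided by the length of the hand-made transcription."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)


def rank_models(ground_truth: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (model label, error rate) pairs, lowest error rate first."""
    scores = {
        label: character_error_rate(ground_truth, text)
        for label, text in outputs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

A single page is a small sample, so the ranking is only a hint; scoring several pages from the same print would give a more reliable picture.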

stweil commented 6 years ago

In addition, there is script/Fraktur.traineddata (available in tessdata, tessdata_best and tessdata_fast), which supports more symbols.
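
For trying the different models from a script, a minimal sketch along these lines may help. It assumes the `tesseract` command-line tool is installed and that the chosen traineddata files have been downloaded into one local directory, with `Fraktur.traineddata` placed in a `script/` subdirectory as in the tessdata repositories. The helper names, the image file and the directory path are hypothetical.

```python
from __future__ import annotations

import subprocess


def build_command(image: str, model: str, tessdata_dir: str) -> list[str]:
    """Build a tesseract call that prints the recognized text to stdout.

    `model` is passed to -l, e.g. "frk", "frk+deu" or "script/Fraktur".
    """
    return [
        "tesseract", image, "stdout",
        "--tessdata-dir", tessdata_dir,
        "-l", model,
    ]


def run_ocr(image: str, model: str, tessdata_dir: str) -> str:
    """Run tesseract and return its text output (raises if tesseract fails)."""
    result = subprocess.run(
        build_command(image, model, tessdata_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths; adjust to the local setup.
    for model in ("frk", "frk+deu", "script/Fraktur"):
        print(model, len(run_ocr("page.png", model, "/path/to/tessdata")))
```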