After #166, the documentation on model requirements is no longer correct (if it ever was). Because the default path has changed, the installation steps have changed as well. We should also state more clearly that segmentation will not work without eng.traineddata and deskewing will not work without osd.traineddata. Unfortunately, the corresponding error message is not helpful at all:
  File "/build/ocrd_tesserocr/ocrd_tesserocr/deskew.py", line 68, in process
    psm=PSM.AUTO_OSD
  File "tesserocr.pyx", line 1210, in tesserocr.PyTessBaseAPI.__cinit__
    self._init_api(cpath, clang, oem, NULL, 0, NULL, NULL, False, psm)
  File "tesserocr.pyx", line 1223, in tesserocr.PyTessBaseAPI._init_api
    raise RuntimeError('Failed to init API, possibly an invalid tessdata path: {}'.format(path))
But that probably needs to be fixed somewhere between Tesseract and tesserocr.
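Until that is fixed upstream, a pre-flight check on our side could at least name the missing model. The following is only a rough sketch, not existing ocrd_tesserocr code: the helper names `missing_models` and `check_models` are hypothetical, and it assumes the models are plain `<name>.traineddata` files directly inside the tessdata directory passed in.

```python
import os

# Models this issue says are needed: eng for segmentation, osd for deskewing.
REQUIRED_MODELS = ("eng", "osd")


def missing_models(tessdata_dir, required=REQUIRED_MODELS):
    """Return the names of required models not found in tessdata_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(tessdata_dir, name + ".traineddata"))
    ]


def check_models(tessdata_dir, required=REQUIRED_MODELS):
    """Raise a descriptive error if any required model file is missing."""
    missing = missing_models(tessdata_dir, required)
    if missing:
        files = ", ".join(name + ".traineddata" for name in missing)
        raise RuntimeError(
            "Missing Tesseract model(s) in {}: {}".format(tessdata_dir, files)
        )
```

Calling `check_models(path)` before constructing the API would turn the generic "Failed to init API" error into one that says which file is absent; where `path` comes from is left open here, since it depends on the installation.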