Closed: GrazingScientist closed this issue 4 years ago.
I am afraid your case – ocrd_all via Docker image – is not covered by ocrd.de documentation yet. So that's actually a documentation issue (should be moved to ocrd-website). Or you could say it's an ocrd_all issue, but not core.
The situation in the Docker image is different, because Tesseract has been installed from source there (not via apt), so models reside in a custom path (/usr/local/share/tessdata), but you cannot use the ocrd_all make rules to fetch additional models.
> Also, by the way, I had to set TESSDATA_PREFIX in the Dockerfile as
> ENV TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata
> because otherwise the models are not found
This makes your ocrd_tesserocr use only models you can install via apt. The default path for the Docker version is /usr/local/share/tessdata (with only the minimal osd/eng/equ installed). It's not exported to the shell, though.
I agree we should find a better solution (either use the apt default path, or at least export TESSDATA_PREFIX correctly).
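To check which models a given installation can actually see, a small Python sketch like the following may help. It assumes the simplified lookup described in this thread: TESSDATA_PREFIX, if set, names the tessdata directory itself; otherwise the source-built default /usr/local/share/tessdata is used. Tesseract's real search logic has more cases, the function names are hypothetical, and only *.traineddata files at the top level of that directory are listed.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Default tessdata path of the source-built Tesseract in the ocrd/all image (per this thread).
DOCKER_DEFAULT_TESSDATA = "/usr/local/share/tessdata"


def resolve_tessdata_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: use TESSDATA_PREFIX if set and non-empty, else the Docker default."""
    env = os.environ if env is None else env
    return Path(env.get("TESSDATA_PREFIX") or DOCKER_DEFAULT_TESSDATA)


def installed_models(tessdata_dir: Path) -> List[str]:
    """Hypothetical helper: model names (file stems) of top-level *.traineddata files."""
    if not tessdata_dir.is_dir():
        return []
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata") if p.is_file())


if __name__ == "__main__":
    tessdata = resolve_tessdata_dir()
    print(f"tessdata directory: {tessdata}")
    print("installed models:", ", ".join(installed_models(tessdata)) or "(none)")
```

Running this once with and once without TESSDATA_PREFIX set makes it easy to see whether the processor would be looking at the apt models or at the minimal source-built set.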
Also, note that frk and Fraktur are actually different kinds of fraktur models:
- frk / tesseract-ocr-frk: unlike the apt description, this is not Frankish, but a modern (LSTM-based) Fraktur model for German
- Fraktur / tesseract-ocr-script-frak: pure (without LM/dict) LSTM-based Fraktur model for all languages
You can even combine them: frk+Fraktur.
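Since a combined model spec such as deu+frk or frk+Fraktur only works if every component is installed, it can help to check the parts before calling the processor. The sketch below is hypothetical: missing_components and recognize_command are not part of OCR-D or tesserocr, and OCR-D-OCR-TESS_FRK_FRAKTUR is an invented output group name. It splits the spec on + and builds the command line with the -p JSON parameter in the form used in this thread.

```python
import json
import shlex
from typing import Iterable, List


def missing_components(model_spec: str, installed: Iterable[str]) -> List[str]:
    """Return the '+'-separated parts of model_spec that are not in installed."""
    available = set(installed)
    return [name for name in model_spec.split("+") if name not in available]


def recognize_command(model_spec: str, input_grp: str, output_grp: str) -> str:
    """Build an ocrd-tesserocr-recognize call passing the model as a JSON parameter."""
    params = json.dumps({"model": model_spec})
    args = ["ocrd-tesserocr-recognize", "-I", input_grp, "-O", output_grp, "-p", params]
    # shlex.quote only adds single quotes where the shell needs them (here: the JSON).
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    installed = ["eng", "equ", "osd", "frk", "Fraktur"]  # example list, e.g. from a tessdata listing
    spec = "frk+Fraktur"
    missing = missing_components(spec, installed)
    if missing:
        print("missing models:", ", ".join(missing))
    else:
        print(recognize_command(spec, "OCR-D-SEG-LINE", "OCR-D-OCR-TESS_FRK_FRAKTUR"))
```

With the example list above, this prints `ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS_FRK_FRAKTUR -p '{"model": "frk+Fraktur"}'`; with only the minimal osd/eng/equ set it would instead report both frk and Fraktur as missing.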
> I am afraid your case – ocrd_all via Docker image – is not covered by ocrd.de documentation yet. So that's actually a documentation issue (should be moved to ocrd-website). Or you could say it's an ocrd_all issue, but not core.
Thank you for pointing this out. I have to admit that I posted this in core out of habit. Sorry about that! Is it possible to move the ticket, or shall I open a new one in the OCR-D website repository?
> I agree we should find a better solution (either use the apt default path, or at least export TESSDATA_PREFIX correctly).
That would be awesome, since I was very confused (but fortunately I had run into this situation before).
> Also, note that frk and Fraktur are actually different kinds of fraktur models:
> - frk / tesseract-ocr-frk: unlike the apt description, this is not Frankish, but a modern (LSTM-based) Fraktur model for German
> - Fraktur / tesseract-ocr-script-frak: pure (without LM/dict) LSTM-based Fraktur model for all languages
> You can even combine them: frk+Fraktur.
This differentiation should definitely be covered in the documentation!
Edit: And thanks for all the insights! :)
Used Docker image
docker.io/ocrd/all maximum 7bfeac60c4cb 5 days ago 12.4 GB
Problem Description
Using the OCR-D Docker image, the call to the Tesseract Fraktur model given in the documentation fails. When I install the Fraktur model via apt install tesseract-ocr-script-frak, the call to
ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS_DEU_FRAK -p '{"model": "deu+frk"}'
as given in the documentation fails. The correct call would be:
Also, by the way, I had to set TESSDATA_PREFIX in the Dockerfile as
ENV TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata
because otherwise the models are not found. Can you reproduce this?