OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

new locale assertions in Tesseract are incompatible with Click #23

Closed: bertsky closed this issue 6 years ago

bertsky commented 6 years ago

Since Tesseract 4 introduced an assertion that the locale be plain POSIX (C), to guarantee that certain legacy assumptions in its code always hold, we have to override the current locale before initializing the tesserocr API, too. The user cannot do this before calling an ocrd_tesserocr CLI, because we depend on the Click library, which (in Python 3) is itself incompatible with that locale: it requires at least C.UTF-8. So we have a deadlock.

We could perhaps reset the locale after Click has run and before tesserocr is initialized, though.
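
A minimal sketch of that idea, using only the standard `locale` module. The names `force_posix_locale` and `run_processor` are hypothetical and not part of ocrd_tesserocr, and `init_ocr` stands in for whatever actually constructs the tesserocr API; whether resetting `LC_ALL` is exactly what a given Tesseract version checks is an assumption here.

```python
import locale


def force_posix_locale():
    """Switch the process-wide locale to plain POSIX ("C").

    Returns the locale string reported by setlocale after the change.
    """
    return locale.setlocale(locale.LC_ALL, "C")


def run_processor(init_ocr):
    """Hypothetical body of a CLI command.

    By the time this runs, Click has already parsed the command line
    under the user's (UTF-8) locale, so the locale can be reset safely.
    """
    force_posix_locale()
    # Only now create the OCR engine, which expects the "C" locale.
    return init_ocr()
```

The point of the ordering is that the user's locale stays untouched while Click decodes arguments, and the override happens as late as possible, right before the engine is created.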

kba commented 6 years ago

So the tests work because they use the ocrd_tesserocr API directly, right? I ran into that encoding issue on different CI platforms. Yes, we need to override LC_* after loading Click. We should add tests that use the CLI as well. I'll look into it.

kba commented 6 years ago

Oh, you fixed it already. Self-fixing issues are the best issues :+1:

bertsky commented 6 years ago

No, the tests (presumably) still work, because they use an older Tesseract version without the assertion in place. True, we should add a CLI test (as in core/test/test_cli.py)...
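
A rough sketch of what such a CLI-level check could look like, assuming the test launches the command in a child process with a UTF-8 locale in its environment. The child script below is a stand-in, not the real ocrd_tesserocr entry point, and `locale_seen_by_child` is a hypothetical helper; a real test would invoke the actual CLI instead.

```python
import os
import subprocess
import sys

# Stand-in for the CLI: reset the locale the way the command would,
# then report which locale the OCR engine would see.
CHILD_SCRIPT = """
import locale
locale.setlocale(locale.LC_ALL, "C")
print(locale.setlocale(locale.LC_ALL))
"""


def locale_seen_by_child(env_locale):
    """Run the stand-in script with LC_ALL set to env_locale.

    Returns the locale that is active in the child after its reset.
    """
    env = dict(os.environ, LC_ALL=env_locale)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Running the command in a separate process matters because the locale is process-wide state: a test that calls the API in-process inherits whatever locale the test runner already set.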

kba commented 6 years ago

> No, the tests (presumably) still work, because they use an older Tesseract version without the assertion in place.

They did fail before, but we set the locale to C before running the tests:

https://github.com/OCR-D/ocrd_tesserocr/blob/master/.travis.yml#L32

bertsky commented 6 years ago

Oh, I see. I was distracted by my (non-container) test results.