Closed joschrew closed 7 months ago
I wonder whether there are still reasons for building the `tesseract` binary. Using the package from a recent Linux distribution is simpler and would save significant build time.
Another possible approach would also work for `tesserocr` and some more parts of OCR-D: OCR-D could use its own package repositories for all parts with simple dependencies.
> I wonder whether there are still reasons for building the `tesseract` binary. Using the package from a recent Linux distribution is simpler and would save significant build time.
Because most of the time, we cannot use Tesseract from a Linux distribution: our base distro is usually older than the current one, and we have no control over Tesseract features that we actually need. The same goes for PPA.
We had good reasons to pin to a specific Tesseract version via source build in subrepo. No reason to give that up now.
> Another possible approach would also work for `tesserocr` and some more parts of OCR-D: OCR-D could use its own package repositories for all parts with simple dependencies.
Much simpler: conda
@kba: Your changes resolved all my errors with my test workspace. I added a resmgr call to the Docker image to add the eng traineddata; without it, I get an error when trying to process.
Edit: Maybe equ.traineddata and osd.traineddata should be added as well, but I am not sure.
Adapting CircleCI config should follow.
In fact, since it already seems broken on master – unfortunately CircleCI does not keep the logs long enough, but I guess it's about the TESSDATA_PREFIX / resmgr location – we should fix this here.
So I suggest (after rewriting `deps-ubuntu` as proposed above) to update the CircleCI config to do `make install-tesseract install-tesserocr` before `make install`.
> In fact, since it already seems broken on master – unfortunately CircleCI does not keep the logs long enough, but I guess it's about the TESSDATA_PREFIX / resmgr location – we should fix this here.
> So I suggest (after rewriting `deps-ubuntu` as proposed above) to update the CircleCI config to do `make install-tesseract install-tesserocr` before `make install`.
Now the CI config definitely needs `make install-tesseract install-tesserocr`. Also, we must drop the `chmod` workaround (for which there is no need anymore).
> Now the CI config definitely needs `make install-tesseract install-tesserocr`. Also, we must drop the `chmod` workaround (for which there is no need anymore).
@joschrew do you want me to make that change (on your fork's writable branch)?
Oh, maybe we should also migrate `make install tesseract-training` here? (Once we remove these rules from ocrd_all, there would be no more way to compile `lstmtraining`, `combine_tessdata` etc.)
This PR is part of a series to offer single ocrd modules as Docker containers (ocrd slim containers) to be used with ocr-d network.
This Dockerfile currently doesn't work in all cases and it still needs updates. I created the PR anyway because I use/need it for my tests.
EDIT: now works. (This basically migrates all the `install-tesseract` rules from ocrd_all's makefile here, where they actually belong.)
My idea was to maybe create the tesseract container with ocrd_all: