kba opened 4 years ago
Quick-and-dirty (Q&D) ocrd AppImage recipe, to be built with pkg2appimage:
```yaml
# Based on https://github.com/AppImage/pkg2appimage/blob/9249a99e653272416c8ee8f42cecdde12573ba3e/recipes/ProcDump.yml
app: ocrd

ingredients:
  dist: bionic
  sources:
    # one suite per deb line, followed by its components
    - deb http://us.archive.ubuntu.com/ubuntu/ bionic main universe
    - deb http://us.archive.ubuntu.com/ubuntu/ bionic-updates main universe
    - deb http://us.archive.ubuntu.com/ubuntu/ bionic-security main universe
  packages:
    - python3.6-venv

script:
  - virtualenv --python=python3 usr
  - ./usr/bin/pip3 install ocrd
  - ./usr/bin/pip3 freeze | grep "^ocrd==" | cut -d "=" -f 3 > ../VERSION
  # XXX at least pkg2appimage needs a desktop file and an icon, might want to use something
  # else to build, but this is a POC, so...
  - mkdir -p usr/share/applications/
  - cat > usr/share/applications/ocrd.desktop <<\EOF
  - [Desktop Entry]
  - Name=ocrd
  - Exec=ocrd
  - Icon=ocrd
  - Comment=OCR-D core
  - Categories=Office;
  - Type=Application
  - Terminal=true
  - EOF
  # touch fails if the icon directory does not exist yet
  - mkdir -p usr/share/icons/hicolor/512x512/apps/
  - touch usr/share/icons/hicolor/512x512/apps/ocrd.png # FIXME
  - cp usr/share/icons/hicolor/512x512/apps/ocrd.png .
  - cp usr/share/applications/ocrd.desktop .
```
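The `VERSION` line in the recipe works because `pip freeze` prints pinned requirements as `name==version`: splitting `ocrd==2.12.2` on `=` yields `ocrd`, an empty field and `2.12.2`, hence `-f 3`. A minimal sketch that wraps the same pipeline in a function (the name `ocrd_version_from_freeze` is made up for illustration) and feeds it canned freeze output instead of a real virtualenv; the versions shown are illustrative only:

```shell
# Hypothetical helper: extract the installed ocrd version from `pip freeze`
# output read on stdin (same pipeline as the recipe's VERSION line).
ocrd_version_from_freeze() {
    # "ocrd==2.12.2" split on "=" gives the fields "ocrd", "" and "2.12.2".
    grep "^ocrd==" | cut -d "=" -f 3
}

# Canned freeze output (illustrative versions) instead of a real virtualenv.
printf '%s\n' 'bagit==1.7.0' 'ocrd==2.12.2' 'ocrd-modelfactory==2.12.2' |
    ocrd_version_from_freeze
```

Anchoring on `^ocrd==` keeps packages such as `ocrd-modelfactory` from matching. If ocrd were installed in editable mode, `pip freeze` would typically print a `-e ...` line instead, and the pipeline would write an empty `VERSION`.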
This has some quirks (the .desktop file, the icon, and the handling of the working directory), but it was pleasingly easy to build, and it runs:

```
% ~/devel/app-image-ocrd/out/ocrd-2.12.2.glibc2.3.3-x86_64.AppImage workspace -d /tmp/actevedef_718448162 get-id
http://resolver.staatsbibliothek-berlin.de/SBB00008F1000000000
```

(ugly bagit.py error message removed)
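On the working-directory quirk: as far as I know, the AppImage runtime exports the directory the AppImage was started from as `OWD`, while the generic `AppRun` that pkg2appimage places in the AppDir changes into `$APPDIR/usr` before starting the program, so relative paths passed to `ocrd` resolve against the wrong directory. A sketch of a custom `AppRun` that changes back first, written as a shell function so it could be called from the recipe's `script:` section. The function name is hypothetical, and whether pkg2appimage keeps a custom `AppRun` instead of fetching its own is an assumption to verify:

```shell
# Hypothetical helper: write a custom AppRun into the AppDir given as $1.
write_apprun() {
    cat > "$1/AppRun" <<\EOF
#!/bin/sh
# Locate the AppDir from this script's own path.
HERE="$(dirname "$(readlink -f "$0")")"
# Assumption: the AppImage runtime exports OWD (the directory the AppImage
# was started from); fall back to $PWD when run from an unpacked AppDir.
cd "${OWD:-$PWD}" || exit 1
# Start ocrd from the bundled virtualenv with the original arguments.
exec "$HERE/usr/bin/ocrd" "$@"
EOF
    chmod +x "$1/AppRun"
}
```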
My opinion(!) on this:
If OCR-D has everything either
then, with a little experience, it is easy to build and maintain dependency-isolated AppImages or Docker containers. I would aim for this situation.
This way it's possible to:
Packaging everything into classical Ubuntu packages will produce the same Gordian knot of dependency problems as the original ocrd_all concept. (I call it a Gordian knot because I am currently upgrading ocrd_calamari to TensorFlow 2 and now need TF 2.3 to solve some issues, and I am sure some other processor will have problems with that.)
(There are some quirks with AppImage we should have a look at, but it looks really good.)
(My fat container approach https://travis-ci.org/github/mikegerber/my_ocrd_workflow has the same Gordian knot; I just include fewer processors.)
And you can then still stick an AppImage into an Ubuntu package. It's a bit perverse but easy to do.
(Needs a bit more work if you have e.g. a classical ocrd_olena package and then another one that includes everything as an AppImage.)
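Sticking the AppImage into an Ubuntu package mostly means staging a directory tree with a `DEBIAN/control` file and letting `dpkg-deb` build it. A sketch under these assumptions: the package name `ocrd-appimage`, the `/opt/ocrd` location, the maintainer, and the `libfuse2` dependency (type 2 AppImages normally need FUSE 2 to mount themselves) are illustrative choices, not an agreed layout:

```shell
# Hypothetical helper: stage_appimage_deb APPIMAGE VERSION ROOT
# Builds the directory tree for an "ocrd-appimage" .deb under ROOT.
stage_appimage_deb() {
    appimage=$1 version=$2 root=$3
    mkdir -p "$root/DEBIAN" "$root/opt/ocrd" "$root/usr/bin"
    # The AppImage itself, executable, in a self-contained location.
    install -m 0755 "$appimage" "$root/opt/ocrd/ocrd.AppImage"
    # Put `ocrd` on the PATH via a symlink to the AppImage.
    ln -sf /opt/ocrd/ocrd.AppImage "$root/usr/bin/ocrd"
    # Minimal control file; name, maintainer and libfuse2 dependency are assumptions.
    cat > "$root/DEBIAN/control" <<EOF
Package: ocrd-appimage
Version: $version
Architecture: amd64
Depends: libfuse2
Maintainer: Jane Doe <jane@example.org>
Description: OCR-D core bundled as a self-contained AppImage
 The AppImage ships its own Python environment, so it does not
 conflict with Python packages installed on the system.
EOF
}
```

After staging, `dpkg-deb --build pkgroot ocrd-appimage_2.12.2_amd64.deb` would produce the package from a `pkgroot` directory.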
Now that a solution to the conflicting dependency problem is imminent, we should discuss how we can reduce build times and simplify management of OCR models by supporting OS package management.
I see three areas where package management can improve ocrd_all:
Ad 1.: The only way this can work without creating system-wide dependency conflicts would basically be a repackaging of the `maximum` Docker image. This is also of interest, and AppImage is probably a good solution.
Ad 2.: Since the scope is limited (tesseract and olena), @mikegerber has already built Debian/Ubuntu packages for olena, and @AlexanderP builds tesseract for Launchpad's PPA, this would be relatively straightforward.
Ad 3.: For tesseract models we can take the official `tesseract-ocr-*` model packages as a blueprint. ocropy and kraken models can also be packaged relatively easily. For calamari models, we should probably agree on a convention for where and how models should be stored (ping @maxnth @andbue @chreul if you already have ideas/plans in that regard). The model packaging in particular would also be of benefit outside the OCR-D "ecosphere".
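Following the `tesseract-ocr-*` blueprint, a model package is little more than one data file at the path the official language packs use, plus a dependency on `tesseract-ocr`. A sketch that stages such a package; the `tesseract-ocr-model-NAME` naming scheme and the maintainer are hypothetical, and the default `TESSDATA_DIR` of `/usr/share/tesseract-ocr/4.00/tessdata` is what Ubuntu 18.04 and 20.04 use, so it should be checked for other releases:

```shell
# Assumption: tessdata location of the official Ubuntu language packs.
TESSDATA_DIR=${TESSDATA_DIR:-/usr/share/tesseract-ocr/4.00/tessdata}

# Hypothetical helper: stage_tessdata_deb NAME VERSION TRAINEDDATA ROOT
# Stages a "tesseract-ocr-model-NAME" package installing NAME.traineddata.
stage_tessdata_deb() {
    name=$1 version=$2 model=$3 root=$4
    mkdir -p "$root/DEBIAN" "$root$TESSDATA_DIR"
    # Data files are not executable, hence mode 0644.
    install -m 0644 "$model" "$root$TESSDATA_DIR/$name.traineddata"
    cat > "$root/DEBIAN/control" <<EOF
Package: tesseract-ocr-model-$name
Version: $version
Architecture: all
Depends: tesseract-ocr
Maintainer: Jane Doe <jane@example.org>
Description: Tesseract model $name (hypothetical OCR-D model package)
 Installs $name.traineddata into $TESSDATA_DIR.
EOF
}
```

`Architecture: all` fits because a model file is identical on every architecture; ocropy, kraken or calamari models could follow the same pattern once a storage location is agreed on.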
My questions for the ocrd_all users/developers:
Feedback and pointers to solutions are very welcome.