OCR-D / spec

Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
https://ocr-d.de/en/spec/

Make OS package dependencies part of ocrd-tool.json #131

Open kba opened 4 years ago

kba commented 4 years ago

If ocrd-tool.json contained a list of required system packages, we'd have a uniform way to express system dependencies, the make deps-ubuntu target would become redundant, and we could generate documentation from it.

E.g. for ocrd_tesserocr:

{
  "version": "0.6.0",
  "git_url": "https://github.com/OCR-D/ocrd_tesserocr",
  "dockerhub": "ocrd/tesserocr",
  "os_deps": {
    "ubuntu_18_04": ["git", "python3", "python3-pip", "libtesseract-dev", "libleptonica-dev", "tesseract-ocr-eng", "tesseract-ocr"]
  },
  # [..]

Can be queried, e.g. with jq:

sudo apt-get install -y $(jq -r '.os_deps.ubuntu_18_04[]' ~monorepo/ocrd_tesserocr/ocrd-tool.json)

We could get a list of all system packages by analyzing all the ocrd-tool.json files in our monorepo or stweil/ocrd_all to install required packages in one go.
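As a rough illustration of that aggregation step, here is a minimal Python sketch. It assumes the proposed `os_deps` layout shown above (a mapping from an OS key to a list of package names); the function name `collect_os_deps` and the directory layout are hypothetical, not part of any existing OCR-D tooling.

```python
import json
from pathlib import Path


def collect_os_deps(root, os_key="ubuntu_18_04"):
    """Return the sorted union of packages listed under os_deps[os_key]
    in every ocrd-tool.json found below root."""
    packages = set()
    for tool_json in sorted(Path(root).rglob("ocrd-tool.json")):
        with open(tool_json, encoding="utf-8") as f:
            tool = json.load(f)
        # "os_deps" is the key proposed in this issue (an assumption, not an
        # existing schema field); modules that do not declare it are skipped.
        packages.update(tool.get("os_deps", {}).get(os_key, []))
    return sorted(packages)
```

Calling `collect_os_deps("path/to/checkout")` would yield one deduplicated package list that could be handed to a single `apt-get install` call.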

What do you think?

bertsky commented 4 years ago

Good plan IMO. You have to get every MP on board though (or we need to support both ways).

VolkerHartmann commented 4 years ago

The idea sounds very good. I see some difficulties:

  1. It does not solve the problem with conflicting configurations.
  2. All developers have to take part which means additional effort for them and sometimes there are quite nondescript things missing (e.g.: aclocal at ocrd-olena).
  3. It also depends on the OS. (Maybe some packages are not available in all OS versions available in the correct version.)
  4. If so, you might need a list of alternative repositories depending on the OS.

In the end, dependency management becomes very complex. If there is a working Dockerfile, all dependencies should be available there without having to worry about conflicting configurations.

bertsky commented 4 years ago

> 1. It does not solve the problem with conflicting configurations.

I don't see that with system dependencies (yet). We did have (and will have) conflicting requirements for Python packages, but these can always be dealt with by encapsulation in virtual environments. If system packages are needed in a specific version on a certain OS, they could be specified in a way compatible with apt CLI syntax (e.g. automake=1.15 or automake-1.15, whatever the respective repository offers).

> 2. All developers have to take part which means additional effort for them

They already have to describe system dependencies somehow. So far, the deps-ubuntu makefile target has been our convention for formalizing these. We need not stop supporting it. The proposal just adds another (more expressive/flexible) way. (We could deprecate the old solution and phase it out once all modules have adopted the new one.)

> and sometimes there are quite nondescript things missing (e.g.: aclocal at ocrd-olena).

Like packages not available via (standard) OS repos? I agree, but this (potential) problem already existed. I'd say provide a mechanism (download/clone and build from source) in the makefile along with deps.

> 3. It also depends on the OS.

This is already covered by the proposal. (More OSs can also be added by maintainers later on.)

> (Maybe some packages are not available in all OS versions available in the correct version.)

I agree this might be a problem: some OSs would rely on make deps and some on os_deps in ocrd-tool.json. So either the makefile would have to check the OS (and version) via autoconf etc., or you pass that information in via an environment variable (make deps OS=ubuntu_18_04).
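The two options (explicit override versus detection) could be combined roughly as follows. This is only a sketch: it assumes keys of the form `<ID>_<VERSION_ID>` with dots replaced by underscores, built from the `ID` and `VERSION_ID` fields of `/etc/os-release`, and the helper names are hypothetical.

```python
import os


def os_key_from_release(os_release_text):
    """Build a key like 'ubuntu_18_04' from the contents of /etc/os-release."""
    fields = {}
    for line in os_release_text.splitlines():
        if "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            # values may be quoted, e.g. VERSION_ID="18.04"
            fields[name.strip()] = value.strip().strip('"\'')
    # key naming scheme is an assumption modelled on "ubuntu_18_04"
    return "{}_{}".format(fields["ID"], fields["VERSION_ID"].replace(".", "_"))


def resolve_os_key(environ=None, os_release_path="/etc/os-release"):
    """Prefer an explicit OS variable (as in `make deps OS=ubuntu_18_04`),
    otherwise derive the key from the os-release file."""
    environ = os.environ if environ is None else environ
    if environ.get("OS"):
        return environ["OS"]
    with open(os_release_path, encoding="utf-8") as f:
        return os_key_from_release(f.read())
```

With an explicit `OS` value the detection step is skipped entirely, so maintainers could still force a particular package list on systems where detection gives an unexpected key.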

> 4. If so, you might need a list of alternative repositories depending on the OS.

Yes, that's another option, which could help avoid installing from source via the makefile.

We'd have to extend the above syntax, though, e.g.:

{
  "version": "0.6.0",
  "git_url": "https://github.com/OCR-D/ocrd_tesserocr",
  "dockerhub": "ocrd/tesserocr",
  "os_deps": {
    "ubuntu_18_04": {
      "repos": ["ppa:alex-p/tesseract-ocr"],
      "packages": ["git", "python3", "python3-pip", "libtesseract-dev", "libleptonica-dev", "tesseract-ocr-eng", "tesseract-ocr"]
    }
  },
  # [..]
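
A consumer of this field would then have to handle both shapes: the plain package list from the first proposal and the extended object with `repos` and `packages`. The following Python sketch shows one way that might look. It assumes an apt-based system (`add-apt-repository`, `apt-get`), and `install_commands` is a hypothetical helper, not an existing OCR-D function. It only builds the commands and does not run them.

```python
def install_commands(os_deps, os_key):
    """Return commands (as argument lists) that would install the
    system dependencies declared for os_key."""
    entry = os_deps.get(os_key)
    if entry is None:
        # nothing declared for this OS; a caller could fall back to
        # the existing `make deps-ubuntu` convention here
        return []
    if isinstance(entry, list):
        # simple form: a plain list of package names
        repos, packages = [], entry
    else:
        # extended form: {"repos": [...], "packages": [...]}
        repos, packages = entry.get("repos", []), entry.get("packages", [])
    commands = [["add-apt-repository", "-y", repo] for repo in repos]
    if repos:
        # refresh package lists after adding repositories
        commands.append(["apt-get", "update"])
    if packages:
        commands.append(["apt-get", "install", "-y", *packages])
    return commands
```

Returning argument lists rather than executing anything keeps the sketch side-effect free; each list could be passed to `subprocess.run` or printed for documentation.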