Decentralize the resource list

kba commented 2 years ago

As an OCR-D user, I want to organize my resources easily with the resource manager command line tool.

Current situation

We keep a list of all known resources in OCR-D/core to allow users to easily download models, configurations and other (mostly binary) data with the ocrd resmgr tool.

To contribute new resources, or to update existing ones when a new version is published, users have to make the OCR-D/core developers aware of them. Using those resources then requires a new release of OCR-D/core to be installed, in all sub-venvs too.

How it should be

The least surprising place to find resources for a processor is the ocrd-tool.json of that processor. Besides describing the processors of a project, it should also describe all the known resources and be maintained by the same developers that maintain the software.

There are two ways conceivable for producing the resource database for ocrd resmgr:

At build time, e.g. in ocrd_all, combine all the resource information from all the processors into a static resource_list.yml.
Dynamically generate the resource_list.yml at runtime by $PATH introspection.
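
The build-time option could look roughly like the sketch below. It assumes that each ocrd-tool.json keeps its processors under a `tools` key and that each processor carries a `resources` list; that layout, the function name `combine_resources` and the way the files are located are assumptions for illustration, not the actual ocrd_all or OCR-D/core implementation.

```python
import json
from pathlib import Path


def combine_resources(tool_json_paths):
    """Collect the resources of every processor into one mapping.

    Returns {executable name: [resource dict, ...]}.
    """
    combined = {}
    for path in tool_json_paths:
        ocrd_tool = json.loads(Path(path).read_text(encoding="utf-8"))
        # Assumed layout: {"tools": {"<executable>": {"resources": [...]}}}
        for executable, tool in ocrd_tool.get("tools", {}).items():
            resources = tool.get("resources", [])
            if resources:
                # Several files may describe the same executable; keep all entries.
                combined.setdefault(executable, []).extend(resources)
    return combined
```

The resulting mapping could then be serialized to a static resource_list.yml with any YAML library. Processors without resources are simply left out of the result.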

Testing

It should be possible to add a new resource to the database by updating the ocrd-tool.json of a processor, without an OCR-D/core release.

bertsky commented 2 years ago

There are two ways conceivable for producing the resource database for ocrd resmgr:

At build time, e.g. in ocrd_all, combine all the resource information from all the processors into a static resource_list.yml.

Dynamically generate the resource_list.yml at runtime by ~~$PATH introspection~~ ocrd-tool.json parsing for the current tool (and mixing with the preregistered/central resources)

I am much in favour of that second option. (We already look into the tool JSON anyway, and let's not rely too much on how exactly deployment happened; think bashlib processors.)
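
A minimal sketch of that second option, mixing the resources from the current tool's JSON with the preregistered ones. It assumes the tool description has already been parsed into a dict with a `resources` list and that each resource has a unique `name`; the function name `merge_resources` and the rule that the tool's own entries win on a name clash are assumptions, not something decided in this thread.

```python
def merge_resources(tool_description, central_resources):
    """Mix a tool's own resources with the preregistered/central ones.

    Both inputs hold resource dicts with at least a "name" key.
    Assumption: on a name clash, the entry from the tool JSON replaces
    the central entry.
    """
    # Start from the central list, keyed by resource name.
    merged = {res["name"]: res for res in central_resources}
    # Entries from the tool JSON override or extend the central ones.
    for res in tool_description.get("resources", []):
        merged[res["name"]] = res
    return list(merged.values())


# Hypothetical data; the names and example.org URLs are placeholders.
central = [
    {"name": "model-a", "url": "https://example.org/central/model-a"},
    {"name": "model-b", "url": "https://example.org/central/model-b"},
]
tool = {
    "resources": [
        {"name": "model-b", "url": "https://example.org/tool/model-b"},
        {"name": "model-c", "url": "https://example.org/tool/model-c"},
    ]
}
merged = merge_resources(tool, central)
```

Because the function only takes the parsed tool description as input, it does not depend on how the processor was deployed.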

kba commented 2 years ago

Released in https://github.com/OCR-D/core/releases/tag/v2.38.0
