Closed by bertsky 1 year ago
What do you mean by "does not work"? It does list the resources in XDG_DATA_HOME/ocrd-resources/ocrd-tesserocr-recognize for ocrd-tesserocr-recognize -L. Do you mean that it does not correctly handle TESSDATA_PREFIX?
Yes, it does not respect TESSDATA_PREFIX and still shows files under /usr/local/share/ocrd-resources and CWD, which will in fact not be available. Delegating to .config.get_tessdata_path would fix that. (Probably applies to --show-resource, too.)
With https://github.com/OCR-D/spec/pull/181 merged and implemented in core, the restriction on location can be expressed as
tools:
  ocrd-tesserocr-recognize:
    resource_locations: ['data']
list_all_resources can then be extended to take a list of locations to look in from the ocrd-tool.json and only list those.
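A rough sketch of what such a location-restricted listing could look like. This is not the actual ocrd_utils implementation: the function names candidate_dirs and list_resources_in are hypothetical, and so are the location names 'system' and 'cwd' and the assumption that the system directory has a per-executable subdirectory (only 'data' and the paths themselves appear in this thread).

```python
import os
from pathlib import Path


def candidate_dirs(executable, locations, environ=None, cwd=None):
    """Map location names to directories (hypothetical helper, not ocrd_utils API)."""
    env = os.environ if environ is None else environ
    # XDG default when XDG_DATA_HOME is unset
    data_home = env.get('XDG_DATA_HOME') or os.path.join(
        os.path.expanduser('~'), '.local', 'share')
    known = {
        # 'data' is the location name used in the ocrd-tool.json snippet above
        'data': Path(data_home) / 'ocrd-resources' / executable,
        # 'system' and 'cwd' are assumed names for the other two places
        'system': Path('/usr/local/share/ocrd-resources') / executable,
        'cwd': Path(cwd) if cwd is not None else Path.cwd(),
    }
    unknown = [loc for loc in locations if loc not in known]
    if unknown:
        raise ValueError('unknown resource locations: %s' % unknown)
    return [known[loc] for loc in locations]


def list_resources_in(executable, locations, environ=None, cwd=None):
    """List files only in the directories named by `locations`."""
    found = []
    for directory in candidate_dirs(executable, locations, environ, cwd):
        if directory.is_dir():
            found.extend(sorted(str(p) for p in directory.iterdir() if p.is_file()))
    return found
```

With resource_locations: ['data'], a call like list_resources_in('ocrd-tesserocr-recognize', ['data']) would skip /usr/local/share/ocrd-resources and CWD entirely. It would still ignore TESSDATA_PREFIX.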
We'll still need custom code in here to handle TESSDATA_PREFIX, though, so I am not sure whether it's worth it, since ocrd_tesserocr is the only processor which would have a differing resource_locations :/
After we resmgrized ocrd_tesserocr in #166, running any of the CLIs with -L|--list-resources is supposed to show the exact list of models available. However, we could not and did not adopt the scheme with multiple resource locations. Instead, we use only a single directory (OcrdResourceManager's default, which is XDG_DATA_HOME/ocrd-resources/EXECUTABLE) and allow overriding it via a shell variable for compatibility reasons (TESSDATA_PREFIX). Therefore, the default implementation in ocrd_utils.list_all_resources does not work here.
Thus, we should extend the constructor of all of ocrd_tesserocr's processors to deal with list_resources=True in its own way (by using .config.get_tessdata_path()).
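To illustrate the idea, here is a minimal sketch of the single-directory lookup. It is not the real .config.get_tessdata_path(); the functions resolve_tessdata_dir and list_models are hypothetical stand-ins. The sketch also assumes that models are files ending in .traineddata and that a set TESSDATA_PREFIX replaces the default directory completely.

```python
import os
from pathlib import Path


def resolve_tessdata_dir(executable, environ=None):
    """Hypothetical stand-in for get_tessdata_path(): pick the one model directory."""
    env = os.environ if environ is None else environ
    prefix = env.get('TESSDATA_PREFIX')
    if prefix:
        # compatibility override: the shell variable wins
        return Path(prefix)
    # otherwise fall back to XDG_DATA_HOME/ocrd-resources/EXECUTABLE
    data_home = env.get('XDG_DATA_HOME') or os.path.join(
        os.path.expanduser('~'), '.local', 'share')
    return Path(data_home) / 'ocrd-resources' / executable


def list_models(tessdata_dir):
    """Return model names found in exactly that directory (assumed *.traineddata)."""
    tessdata_dir = Path(tessdata_dir)
    if not tessdata_dir.is_dir():
        return []
    return sorted(p.stem for p in tessdata_dir.glob('*.traineddata'))
```

A processor constructor that receives list_resources=True could then print list_models(resolve_tessdata_dir('ocrd-tesserocr-recognize')) and exit, rather than falling through to the generic multi-location listing.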