OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

`ocrd resmgr download '*'` weird behavior #1044

Open kba opened 1 year ago

kba commented 1 year ago

When running `ocrd resmgr download '*'` in the latest ocrd_all Docker image, only some models are installed:

find / |grep ocrd-resources
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-kraken-segment
/usr/local/share/ocrd-resources/ocrd-kraken-segment/blla.mlmodel
/usr/local/share/ocrd-resources/ocrd-calamari-recognize
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default/model_strukturerkennung.h5
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default/model_textline_new.h5
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default/model_page_mixed_best.h5
/usr/local/share/ocrd-resources/ocrd-kraken-recognize
/usr/local/share/ocrd-resources/ocrd-kraken-recognize/en_best.mlmodel
/usr/local/share/ocrd-resources/ocrd-sbb-binarize
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin3.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin4.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin1.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin2.h5

E.g. the `ocrd-tesserocr-recognize` models are missing entirely, whereas `ocrd resmgr download ocrd-tesserocr-recognize '*'` works as expected.

So something is wrong with iterating over the processors in the wildcard case.

bertsky commented 1 year ago

What does the resmgr log say?

kba commented 1 year ago

> What does the resmgr log say?

Nothing interesting: it only logs what it is downloading, not what it is supposed to be downloading or how it decided which processors to include. I'll add such a log statement when debugging.

MehmedGIT commented 1 year ago

Here is a snippet from my sbatch script that downloads all models:

singularity exec --bind "${SCRATCH_OCRD_MODELS_BASE}:/usr/local/share" "${SIF_PATH}" ocrd resmgr download '*'
singularity exec --bind "${SCRATCH_OCRD_MODELS_BASE}:/usr/local/share" "${SIF_PATH}" ocrd resmgr download ocrd-tesserocr-recognize '*'
In the scratch storage of the HPC environment `${SCRATCH_OCRD_MODELS_BASE} = /scratch1/users/mmustaf/ocrd_models`

```bash
gwdu101:127 16:11:22 /scratch1/users/mmustaf/ocrd_models > du -ha
512	./tessdata/configs/digits
512	./tessdata/configs/box.train
512	./tessdata/configs/unlv
512	./tessdata/configs/hocr
512	./tessdata/configs/pdf
512	./tessdata/configs/ambigs.train
512	./tessdata/configs/kannada
512	./tessdata/configs/get.images
512	./tessdata/configs/makebox
512	./tessdata/configs/alto
512	./tessdata/configs/linebox
512	./tessdata/configs/api_config
512	./tessdata/configs/bigram
512	./tessdata/configs/bazaar
512	./tessdata/configs/txt
512	./tessdata/configs/lstmbox
512	./tessdata/configs/tsv
512	./tessdata/configs/logfile
512	./tessdata/configs/box.train.stderr
512	./tessdata/configs/quiet
512	./tessdata/configs/wordstrbox
512	./tessdata/configs/lstm.train
512	./tessdata/configs/rebox
512	./tessdata/configs/Makefile.am
512	./tessdata/configs/inter
512	./tessdata/configs/strokewidth
512	./tessdata/configs/lstmdebug
14K	./tessdata/configs
2,2M	./tessdata/equ.traineddata
1,1M	./tessdata/Fraktur_GT4HistOCR.traineddata
11M	./tessdata/Fraktur.traineddata
4,2M	./tessdata/ONB.traineddata
4,0M	./tessdata/eng.traineddata
11M	./tessdata/osd.traineddata
6,2M	./tessdata/frk.traineddata
1,5M	./tessdata/deu.traineddata
3,3M	./tessdata/frak2021.traineddata
86M	./tessdata/Latin.traineddata
128M	./tessdata
80M	./ocrd-resources/ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
17M	./ocrd-resources/ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
42M	./ocrd-resources/ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
2,9M	./ocrd-resources/ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
141M	./ocrd-resources/ocrd-cis-ocropy-recognize
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
89M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
92M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
92M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0
272M	./ocrd-resources/ocrd-calamari-recognize
147M	./ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
147M	./ocrd-resources/ocrd-sbb-binarize/default-2021-03-09
147M	./ocrd-resources/ocrd-sbb-binarize
560M	./ocrd-resources
687M	.
```

For comparison, here are the models downloaded with an older version of ocrd/all:maximum (not sure which one; the latest one in January), when the ocrd-tesserocr-recognize models were still located under the ocrd-resources folder:

docker run --rm -v "/home/cloud/ocrd_models/:/usr/local/share/ocrd-resources" -- ocrd/all:maximum ocrd resmgr download '*'
In the Operandi live VM:

```bash
cloud@operandi-live:~/ocrd_models$ du -ha
2,8M	./ocrd-kraken-recognize/en_best.mlmodel
2,9M	./ocrd-kraken-recognize
438M	./ocrd-sbb-textline-detector/default/model_page_mixed_best.h5
438M	./ocrd-sbb-textline-detector/default/model_textline_new.h5
438M	./ocrd-sbb-textline-detector/default/model_strukturerkennung.h5
1,3G	./ocrd-sbb-textline-detector/default
1,3G	./ocrd-sbb-textline-detector
4,0K	./ocrd-anybaseocr-dewarp/latest_net_G.pth
8,0K	./ocrd-anybaseocr-dewarp
1,5M	./ocrd-tesserocr-recognize/deu.traineddata
2,2M	./ocrd-tesserocr-recognize/equ.traineddata
11M	./ocrd-tesserocr-recognize/Fraktur.traineddata
4,0M	./ocrd-tesserocr-recognize/eng.traineddata
6,2M	./ocrd-tesserocr-recognize/frk.traineddata
11M	./ocrd-tesserocr-recognize/osd.traineddata
3,3M	./ocrd-tesserocr-recognize/frak2021.traineddata
1,1M	./ocrd-tesserocr-recognize/Fraktur_GT4HistOCR.traineddata
4,2M	./ocrd-tesserocr-recognize/ONB.traineddata
4,0K	./ocrd-tesserocr-recognize/configs/get.images
4,0K	./ocrd-tesserocr-recognize/configs/lstmdebug
4,0K	./ocrd-tesserocr-recognize/configs/box.train
4,0K	./ocrd-tesserocr-recognize/configs/Makefile.am
4,0K	./ocrd-tesserocr-recognize/configs/lstmbox
4,0K	./ocrd-tesserocr-recognize/configs/api_config
4,0K	./ocrd-tesserocr-recognize/configs/kannada
4,0K	./ocrd-tesserocr-recognize/configs/wordstrbox
4,0K	./ocrd-tesserocr-recognize/configs/bazaar
4,0K	./ocrd-tesserocr-recognize/configs/box.train.stderr
4,0K	./ocrd-tesserocr-recognize/configs/strokewidth
4,0K	./ocrd-tesserocr-recognize/configs/txt
4,0K	./ocrd-tesserocr-recognize/configs/linebox
4,0K	./ocrd-tesserocr-recognize/configs/unlv
4,0K	./ocrd-tesserocr-recognize/configs/lstm.train
4,0K	./ocrd-tesserocr-recognize/configs/hocr
4,0K	./ocrd-tesserocr-recognize/configs/digits
4,0K	./ocrd-tesserocr-recognize/configs/logfile
4,0K	./ocrd-tesserocr-recognize/configs/inter
4,0K	./ocrd-tesserocr-recognize/configs/pdf
4,0K	./ocrd-tesserocr-recognize/configs/bigram
4,0K	./ocrd-tesserocr-recognize/configs/quiet
4,0K	./ocrd-tesserocr-recognize/configs/alto
4,0K	./ocrd-tesserocr-recognize/configs/tsv
4,0K	./ocrd-tesserocr-recognize/configs/makebox
4,0K	./ocrd-tesserocr-recognize/configs/rebox
4,0K	./ocrd-tesserocr-recognize/configs/ambigs.train
112K	./ocrd-tesserocr-recognize/configs
86M	./ocrd-tesserocr-recognize/Latin.traineddata
128M	./ocrd-tesserocr-recognize
4,9M	./ocrd-kraken-segment/blla.mlmodel
4,9M	./ocrd-kraken-segment
438M	./ocrd-sbb-binarize/default/model_bin3.h5
438M	./ocrd-sbb-binarize/default/model_bin2.h5
438M	./ocrd-sbb-binarize/default/model_bin1.h5
438M	./ocrd-sbb-binarize/default/model_bin4.h5
1,8G	./ocrd-sbb-binarize/default
147M	./ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
147M	./ocrd-sbb-binarize/default-2021-03-09
1,9G	./ocrd-sbb-binarize
4,0K	./ocrd-anybaseocr-tiseg/seg_model/assets
4,1M	./ocrd-anybaseocr-tiseg/seg_model/saved_model.pb
63M	./ocrd-anybaseocr-tiseg/seg_model/variables/variables.data-00001-of-00002
100K	./ocrd-anybaseocr-tiseg/seg_model/variables/variables.data-00000-of-00002
20K	./ocrd-anybaseocr-tiseg/seg_model/variables/variables.index
63M	./ocrd-anybaseocr-tiseg/seg_model/variables
67M	./ocrd-anybaseocr-tiseg/seg_model
67M	./ocrd-anybaseocr-tiseg
2,9M	./ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
17M	./ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
42M	./ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
80M	./ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
141M	./ocrd-cis-ocropy-recognize
18M	./ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
18M	./ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
18M	./ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
32K	./ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
18M	./ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
18M	./ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
89M	./ocrd-calamari-recognize/zpd-fraktur19
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
92M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
92M	./ocrd-calamari-recognize/zpd-latin-script-hist-3
272M	./ocrd-calamari-recognize
4,0K	./ocrd-anybaseocr-block-segmentation/block_segmentation_weights.h5
8,0K	./ocrd-anybaseocr-block-segmentation
28M	./ocrd-typegroups-classifier/densenet121.tgc
28M	./ocrd-typegroups-classifier
147M	./ocrd-eynollah-segment/default/model_tables_ens_mixed_new_2.h5
147M	./ocrd-eynollah-segment/default/model_textline_newspapers.h5
147M	./ocrd-eynollah-segment/default/model_main_covid19_lr5-5_scale_1_1_great.h5
147M	./ocrd-eynollah-segment/default/model_page_mixed_best.h5
127M	./ocrd-eynollah-segment/default/model_enhancement.h5
147M	./ocrd-eynollah-segment/default/model_bin_sbb_ens.h5
147M	./ocrd-eynollah-segment/default/model_3up_new_good_no_augmentation.h5
99M	./ocrd-eynollah-segment/default/model_scale_classifier.h5
147M	./ocrd-eynollah-segment/default/model_no_patches_class0_30eopch.h5
147M	./ocrd-eynollah-segment/default/model_main_home_corona3_rot.h5
147M	./ocrd-eynollah-segment/default/model_ensemble_s.h5
1,6G	./ocrd-eynollah-segment/default
1,6G	./ocrd-eynollah-segment
4,0K	./ocrd-anybaseocr-layout-analysis/structure_analysis/assets
14M	./ocrd-anybaseocr-layout-analysis/structure_analysis/saved_model.pb
29M	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.data-00001-of-00002
248K	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.data-00000-of-00002
44K	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.index
30M	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables
43M	./ocrd-anybaseocr-layout-analysis/structure_analysis
4,0K	./ocrd-anybaseocr-layout-analysis/mapping_densenet.pickle
43M	./ocrd-anybaseocr-layout-analysis
5,4G	.
```

Far fewer models are downloaded than before: the total size is now just 687 MB, whereas it used to be around 5.4 GB. Some processors' models are now missing entirely.
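To make such a comparison less error-prone than reading two long `du -ha` listings by eye, the processor directories can be extracted and diffed. This is only a rough sketch: the function name `processors_in_du_output` is made up for illustration, and it assumes that paths contain no whitespace and that every processor shows up as a path component starting with `ocrd-`.

```python
from pathlib import PurePosixPath


def processors_in_du_output(du_output):
    """Collect processor directory names (ocrd-*) from `du -ha` style output.

    Each line is expected to look like "<size> <path>". The container
    directory "ocrd-resources" itself is not counted as a processor.
    """
    processors = set()
    for line in du_output.splitlines():
        parts = line.split(None, 1)  # split size from path
        if len(parts) != 2:
            continue
        for component in PurePosixPath(parts[1].strip()).parts:
            if component.startswith("ocrd-") and component != "ocrd-resources":
                processors.add(component)
                break  # only the first match is the processor directory
    return processors


def missing_processors(old_du_output, new_du_output):
    """Processors present in the old listing but absent from the new one."""
    old = processors_in_du_output(old_du_output)
    new = processors_in_du_output(new_du_output)
    return sorted(old - new)
```

Note that a purely name-based diff like this would also report `ocrd-tesserocr-recognize` as missing from the first listing, although its models are there under `./tessdata`, so the result still needs a manual look.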

bertsky commented 1 year ago

It's clear the reason for this is that ResourceManager.list_available only returns database results – it does not look up all ocrd- executables in PATH. (For comparison, ResourceManager.list_installed returns database results and all resource location paths with ocrd- prefix, which is somewhat better, but still misses out on processors' module locations, as in ocrd_tesserocr.) The database then is simply the distributed resource_list.yml plus any user resources.yml. At no time do we guarantee that the latter gets filled from PATH dynamically!
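To illustrate what "looking up all ocrd- executables in PATH" could mean in practice, here is a minimal stdlib-only sketch. It is not the code in core: `find_ocrd_executables` and `missing_from_database` are hypothetical helper names, and the "database" is reduced to a plain list of executable names.

```python
import os
from pathlib import Path


def find_ocrd_executables(path_env=None, prefix="ocrd-"):
    """Return the sorted names of executable files on PATH starting with prefix."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = set()
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        dir_path = Path(directory)
        if not dir_path.is_dir():
            continue
        try:
            entries = list(dir_path.iterdir())
        except OSError:
            continue  # unreadable PATH entry, skip it
        for entry in entries:
            if (entry.name.startswith(prefix)
                    and entry.is_file()
                    and os.access(entry, os.X_OK)):
                found.add(entry.name)
    return sorted(found)


def missing_from_database(database_executables, path_env=None):
    """Executables found on PATH that the (static) database does not know about."""
    return sorted(set(find_ocrd_executables(path_env)) - set(database_executables))
```

Anything returned by `missing_from_database` would be a processor that a wildcard download based on the database alone would silently skip.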

I cannot find when exactly this broke, but this change looks somewhat fishy.

Since we never know when the user installs (additional) processor modules, and the database files can be out of date (as is currently the case with the distributed resource_list.yml which still contains sbb-textline-detector), IMO the correct behaviour would be:

bertsky commented 1 year ago

Speaking of short-circuiting with ocrd-all-tool.json: we do not have a dedicated issue for that, but since it's probably tied to the solution here, anyway: The idea would be to have a lookup mechanism like for ocrd_logging.conf (i.e. system location, XDG-based user location, CWD) as an opt-in for ocrd-all-tool.json. If that file can be found, then replace all dynamic lookups with queries into the list of all tools and their resources. (Of course, relying on that file creates new problems like keeping ocrd-all-tool.json up to date if you install more tools, but let's first concentrate on the substantial performance gains that this will yield.)
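A rough sketch of such an opt-in lookup, to make the idea concrete. The concrete directories (`/etc/ocrd`, `$XDG_CONFIG_HOME/ocrd`) and the precedence order (CWD first, then user, then system) are assumptions for illustration, not what core does for ocrd_logging.conf; the function names are hypothetical as well.

```python
import json
import os
from pathlib import Path


def candidate_locations(filename="ocrd-all-tool.json", cwd=None, env=None):
    """List places to look for the file, most specific first (assumed order)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    xdg_config = Path(env.get("XDG_CONFIG_HOME") or Path.home() / ".config")
    return [
        cwd / filename,                  # current working directory
        xdg_config / "ocrd" / filename,  # XDG-based user location (assumed subdirectory)
        Path("/etc/ocrd") / filename,    # system location (assumed path)
    ]


def load_all_tools(locations):
    """Return the parsed JSON of the first existing file, or None if there is none.

    None means the opt-in file is absent, so the caller would fall back
    to the dynamic lookups.
    """
    for location in locations:
        if location.is_file():
            with open(location, encoding="utf-8") as f:
                return json.load(f)
    return None
```

A caller would then do something like `tools = load_all_tools(candidate_locations())` and only query the processors dynamically when `tools` is `None`.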

kba commented 1 year ago

I've opened a separate issue for the ocrd-all-tool.json aspect in https://github.com/OCR-D/core/issues/1059