bertsky / workflow-configuration

a makefilization for OCR-D workflows, with configuration examples
Apache License 2.0
9 stars 4 forks source link

use pyrnn.gz models instead of pyrnn #14

Open kba opened 4 years ago

kba commented 4 years ago

Companion to https://github.com/cisocrgroup/ocrd_cis/pull/49

pyrnn models are generally distributed gzipped. If loading gzipped models is inefficient, I can also adapt OCR-D/ocrd_all#103 to gunzip models after download.

bertsky commented 4 years ago

Is this really necessary? In recognize, there's get_model (which tries to append .gz), and then in load_object there's ocropus_find_file (which also tries to append .gz). In short, whenever the file can be found with ocrolib, it can also be found without the .gz suffix.

Now, I understand you pushed for https://github.com/cisocrgroup/ocrd_cis/pull/49, which in turn requires the change here. But as I mentioned in my review there, I don't think that was warranted itself.

It does not hurt to use the full file names, though. Maybe this is better regardless of the change in ocrd_cis. With the full file name, the makefile could also have rules to check / download the model files.

kba commented 4 years ago

It should not be necessary but I failed to get ocropy to pick up the models in OCROPUS_DATA, so I tried on different levels to fix it. I need to properly debug this later, the easiest "fix" was to gunzip the models or use explicit .gz extensions. I should have used the Draft PR feature or an issue.

bertsky commented 4 years ago

It should not be necessary but I failed to get ocropy to pick up the models in OCROPUS_DATA, so I tried on different levels to fix it. I need to properly debug this later, the easiest "fix" was to gunzip the models or use explicit .gz extensions.

Here's a summary of exactly where ocrd_cis is currently looking. Hope that helps.

I am still inclined to switch to the .gz filenames in the makefile regardless.