Calamari-OCR / calamari_models

Pretrained mixed models to be used with Calamari.
MIT License

forever "Upgrading from version ..." with fraktur_19th_century (via OCR-D) #4

Closed jbarth-ubhd closed 4 years ago

jbarth-ubhd commented 4 years ago

Thanks! Just trying it... but it seems to take forever:

12:38:15.117 INFO ocrd.task_sequence.run_tasks - Start processing task 'calamari-recognize -I OCR-D-N11 -O OCR-D-OCR -p {"checkpoint":"/usr/local/ocrd_models/calamari/calamari_models/fraktur_19th_century/*.ckpt.json"}'
12:38:16.414 INFO ocrd.workspace_validator - input_file_grp=['OCR-D-N11'] output_file_grp=['OCR-D-OCR']
Upgrading from version 2
Upgrading from version 3
Upgrading from version 4
Upgrading from version 5
Upgrading from version 6
Upgrading from version 7
Upgrading from version 8
...
Upgrading from version 1738637
Upgrading from version 1738638
Upgrading from version 1738639
...

Aborting now... what have I done wrong?

PS: The same command works with https://qurator-data.de/calamari-models/GT4HistOCR/model.tar.xz

ChWick commented 4 years ago

This is a known bug in older Calamari versions that occurs when opening a model created with a newer Calamari version. Try upgrading Calamari first; then you need to fix your model: open the JSON file in any editor, search for "version", and change it to 2. Please let me know if that worked for you.
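
For a model directory with several checkpoint files, the manual edit described above could be scripted. The following is a minimal sketch, not part of Calamari: the function names are hypothetical, and it assumes that "version" is a top-level key in each *.ckpt.json file, which should be verified against the actual files before use. A backup copy is written next to every file that gets changed.

```python
import json
import shutil
from pathlib import Path


def set_model_version(checkpoint_json: Path, version: int = 2) -> bool:
    """Set the "version" entry of one checkpoint JSON file.

    Assumption: "version" is a top-level key of the JSON object.
    Returns True if the file was changed, False otherwise.
    """
    data = json.loads(checkpoint_json.read_text(encoding="utf-8"))
    if not isinstance(data, dict) or "version" not in data:
        return False  # unexpected layout: leave the file untouched
    if data["version"] == version:
        return False  # nothing to do
    # Keep a backup (e.g. 0.ckpt.json.bak) next to the original file.
    backup = checkpoint_json.with_name(checkpoint_json.name + ".bak")
    shutil.copy2(checkpoint_json, backup)
    data["version"] = version
    checkpoint_json.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return True


def fix_model_dir(model_dir: Path, version: int = 2) -> list:
    """Apply set_model_version to every *.ckpt.json file in model_dir.

    Returns the names of the files that were changed.
    """
    return [
        path.name
        for path in sorted(model_dir.glob("*.ckpt.json"))
        if set_model_version(path, version)
    ]
```

With the path from the log above, this would be called as fix_model_dir(Path("/usr/local/ocrd_models/calamari/calamari_models/fraktur_19th_century")). The target value 2 is the one suggested in this comment; the .bak files allow the edit to be undone.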

maxnth commented 4 years ago

It looks like you're using the OCR-D wrapper for Calamari, so you will probably have to wait until the wrapper is updated to support Calamari 1.x or later (it currently uses 0.3.5, and the GT4HistOCR model was also trained with this version). Alternatively, install the newest Calamari version directly if you want to use the newest versions of the models.

If you want to use the fraktur_19th_century models with the current version of the OCR-D wrapper, you could clone this repository and reset it to commit f76b1d3ece5ff46a59c217efd202eb5d8e729cb4. The models at this commit still support Calamari <=0.3.5 and should therefore work with the OCR-D wrapper.

wrznr commented 4 years ago

Does this mean that there exists an engine-model-version dependency in Calamari?

ChWick commented 4 years ago

Yes. However, older models are upgraded automatically (usually there is just a new entry in the model.json due to a new feature). Unfortunately, there was a major break with the TensorFlow 1 -> 2 upgrade (since Calamari 1.0). In TF 2, the recurrent networks (e.g. LSTM in combination with cuDNN) were fundamentally reworked and unified, which is why the old model parameters could not be converted. As TF 2 is now "clean", we do not expect further complications with model upgrades when updating Calamari.

As this issue shows, using an older Calamari version with newer models is not supported. The check for this case was missing in older Calamari versions.
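
To illustrate why a missing check can produce the endless "Upgrading from version ..." output, here is a small sketch. It is not taken from Calamari's source: SUPPORTED_VERSION, the function names, and the loop structure are all assumptions chosen to show one plausible way an upgrade loop runs away when the model is newer than the engine, and how a guard avoids it.

```python
# Hypothetical: the newest model version this engine knows how to load.
SUPPORTED_VERSION = 2


class ModelTooNewError(RuntimeError):
    """Raised when a model was written by a newer engine version."""


def upgrade_unchecked(model_version: int, max_steps: int = 10) -> int:
    """Upgrade loop without a guard (assumed shape of the bug).

    If model_version starts above SUPPORTED_VERSION, the loop condition
    never becomes false. max_steps exists only to keep this demo finite.
    """
    steps = 0
    while model_version != SUPPORTED_VERSION and steps < max_steps:
        # A real upgrade step would patch the model description here.
        model_version += 1
        steps += 1
    return model_version


def upgrade_checked(model_version: int) -> int:
    """Upgrade loop that refuses models newer than the engine."""
    if model_version > SUPPORTED_VERSION:
        raise ModelTooNewError(
            f"model version {model_version} is newer than supported "
            f"version {SUPPORTED_VERSION}; upgrade the engine instead"
        )
    while model_version < SUPPORTED_VERSION:
        model_version += 1  # one upgrade step per version
    return model_version
```

In this sketch, upgrade_checked(0) steps up to the supported version, while upgrade_checked(3) fails immediately with a clear message. The unchecked variant, given a version of 3, keeps counting upwards until the artificial max_steps limit stops it.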