Calamari-OCR / calamari

Line based ATR Engine based on OCRopy
GNU General Public License v3.0

cannot migrate v1 model #362

Open bertsky opened 1 month ago

bertsky commented 1 month ago

I am trying to make the GT4HistOCR model from the Qurator team (trained on Calamari 1) work with Calamari 2.

First problem: I need to downgrade to TF 2.4. This in turn makes it hard to find matching versions of keras and tensorflow_addons. I finally settled on Keras 2.3 and tfa 0.14, which seem to work.
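
To make the environment reproducible, the combination that seems to work here (TF 2.4, Keras 2.3, tfa 0.14) can be checked from Python before trying to load the model. This is a minimal sketch using only the standard library's importlib.metadata (Python 3.8+). It assumes the distributions are installed under the names tensorflow, keras and tensorflow_addons, and the helper names are hypothetical. It is not an official compatibility matrix.

```python
from importlib.metadata import PackageNotFoundError, version

# Version prefixes reported in this issue as working together (assumption,
# not an officially supported combination).
PINS = {"tensorflow": "2.4", "keras": "2.3", "tensorflow_addons": "0.14"}


def matches(installed: str, prefix: str) -> bool:
    """True if `installed` equals `prefix` or starts with `prefix` plus a dot.

    "2.4.1" matches "2.4", but "2.40.0" does not.
    """
    return installed == prefix or installed.startswith(prefix + ".")


def check_pins(pins: dict) -> dict:
    """Return {distribution: installed_version_or_None} for every unmet pin."""
    mismatches = {}
    for dist, prefix in pins.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            mismatches[dist] = None  # distribution not installed at all
            continue
        if not matches(installed, prefix):
            mismatches[dist] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for dist, installed in problems.items():
        print(f"{dist}: expected {PINS[dist]}.x, found {installed or 'nothing'}")
    if not problems:
        print("environment matches the pinned versions")
```

Prefix matching keeps patch releases (e.g. 2.4.x) acceptable while still flagging a different minor version.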

But then I end up with this error:

```
CRITICAL 2024-09-18 16:48:04,542             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/ocr-d/calamari/calamari_ocr/ocr/predict/predictor.py", line 52, in from_paths
    checkpoints = [SavedCalamariModel(ckpt, auto_update=auto_update_checkpoints) for ckpt in checkpoints]
  File "/local/ocr-d/calamari/calamari_ocr/ocr/predict/predictor.py", line 52, in <listcomp>
    checkpoints = [SavedCalamariModel(ckpt, auto_update=auto_update_checkpoints) for ckpt in checkpoints]
  File "/local/ocr-d/calamari/calamari_ocr/ocr/savedmodel/saved_model.py", line 32, in __init__
    self.update_checkpoint()
  File "/local/ocr-d/calamari/calamari_ocr/ocr/savedmodel/saved_model.py", line 57, in update_checkpoint
    self._single_upgrade()
  File "/local/ocr-d/calamari/calamari_ocr/ocr/savedmodel/saved_model.py", line 87, in _single_upgrade
    update_model(self.dict, self.ckpt_path)
  File "/local/ocr-d/calamari/calamari_ocr/ocr/savedmodel/migrations/version2to5.py", line 31, in update_model
    load_weights_from_hdf5_group(f, [l for l in graph.layer_instances if len(l.weights) > 0] + [graph.logits])
AttributeError: 'dict' object has no attribute 'layer_instances'
```

So apparently update_model in the 2-to-5 migration (version2to5.py) cannot run, because the built keras.model.Model does not contain a calamari_ocr.ocr.model.graph.Graph (at least not at the expected path); the object it uses as graph is a plain dict instead.
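
To make the failure mode concrete: the failing line in version2to5.py only works if graph is an object with layer_instances and logits attributes, and here it is a dict. The self-contained sketch below reproduces the same selection expression with stand-in objects (SimpleNamespace instead of real Keras layers). It turns the bare AttributeError into a message that says what was actually found. The function name collect_weighted_layers and the dict keys are hypothetical and are not part of Calamari.

```python
from types import SimpleNamespace


def collect_weighted_layers(graph):
    """Same selection as the failing line in version2to5.update_model:
    all layers that own weights, followed by the logits layer.

    Raises a descriptive TypeError instead of AttributeError when `graph`
    is not Graph-like (e.g. a plain dict, as in this issue).
    """
    if not (hasattr(graph, "layer_instances") and hasattr(graph, "logits")):
        if isinstance(graph, dict):
            found = f"dict with keys {sorted(graph)}"
        else:
            found = type(graph).__name__
        raise TypeError(f"expected a Graph with layer_instances/logits, got {found}")
    return [l for l in graph.layer_instances if len(l.weights) > 0] + [graph.logits]


# Stand-ins for layers: only the `weights` attribute matters here.
conv = SimpleNamespace(name="conv", weights=["kernel", "bias"])
pool = SimpleNamespace(name="pool", weights=[])
logits = SimpleNamespace(name="logits", weights=["kernel", "bias"])

graph_like = SimpleNamespace(layer_instances=[conv, pool], logits=logits)
print([l.name for l in collect_weighted_layers(graph_like)])  # ['conv', 'logits']

try:
    # Hypothetical dict, mimicking what update_model received in this issue.
    collect_weighted_layers({"graph": graph_like})
except TypeError as err:
    print(err)  # expected a Graph with layer_instances/logits, got dict with keys ['graph']
```

Printing the dict's keys at that point in the real migration would show where the Graph object actually ended up.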

Looking at the commit history, I stumbled on this change. It seems like there is a dependency on the tfaip version, too.

So I also tried downgrading tfaip to 1.1.1, but to no avail; now I get:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/local/ocr-d/calamari/calamari_ocr/ocr/predict/predictor.py", line 11, in <module>
    from calamari_ocr.ocr.scenario import CalamariScenario
  File "/local/ocr-d/calamari/calamari_ocr/ocr/scenario.py", line 12, in <module>
    from calamari_ocr.ocr.scenario_params import (
  File "/local/ocr-d/calamari/calamari_ocr/ocr/scenario_params.py", line 8, in <module>
    from calamari_ocr.ocr.model.ensemblemodel import EnsembleModelParams
  File "/local/ocr-d/calamari/calamari_ocr/ocr/model/ensemblemodel.py", line 13, in <module>
    from calamari_ocr.ocr.model.params import ModelParams
  File "/local/ocr-d/calamari/calamari_ocr/ocr/model/params.py", line 48, in <module>
    class ModelParams(ModelBaseParams):
  File "/local/ocr-d/calamari/calamari_ocr/ocr/model/params.py", line 51, in ModelParams
    metadata=pai_meta(choices=all_layers(), help="Layers of the graph. See the docs for more information."),
  File "/local/ocr-d/calamari/calamari_ocr/ocr/model/params.py", line 29, in all_layers
    from calamari_ocr.ocr.model.layers.conv2d import Conv2DLayerParams
  File "/local/ocr-d/calamari/calamari_ocr/ocr/model/layers/conv2d.py", line 13, in <module>
    class Conv2DLayerParams(LayerParams):
  File "/local/ocr-d/calamari/calamari_ocr/ocr/model/layers/conv2d.py", line 32, in Conv2DLayerParams
    kernel_size: IntVec2D = field(default_factory=lambda: IntVec2D(3, 3), metadata=pai_meta(tuple_like=True))
TypeError: pai_meta() got an unexpected keyword argument 'tuple_like'
```
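
The TypeError above looks like version skew. The checked-out Calamari code passes a keyword (tuple_like) that the installed pai_meta does not accept, so the installed tfaip (or whichever package provides pai_meta; possibly paiargparse, which is an assumption to check against the imports in conv2d.py) is older than this Calamari checkout expects. This library-agnostic sketch uses inspect.signature to test whether a callable accepts a given keyword argument. The two pai_meta stand-ins are hypothetical and are not the real signatures.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer pai_meta (NOT the real signatures).
def pai_meta_old(choices=None, mode=None):
    return {"choices": choices, "mode": mode}


def pai_meta_new(choices=None, mode=None, tuple_like=False):
    return {"choices": choices, "mode": mode, "tuple_like": tuple_like}


print(accepts_kwarg(pai_meta_old, "tuple_like"))  # False
print(accepts_kwarg(pai_meta_new, "tuple_like"))  # True
```

Against a real installation, the same check would be applied to the pai_meta object that conv2d.py actually imports. That pins down which package version introduced tuple_like without reading changelogs.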

What is the right environment (i.e. which versions of TensorFlow, Keras, tensorflow_addons and tfaip) for upgrading this kind of model?