Calamari-OCR / calamari_models

Pretrained mixed models to be used with Calamari.
MIT License

python version of models: "bad marshal data" #12

Status: Closed (andbue closed this 1 month ago)

andbue commented 3 years ago

When loading Keras models, the Python version must be the same on the system the model was trained on and on the system loading the file (cf. https://github.com/keras-team/keras/issues/7440). I stumbled upon this when transferring models for inference to another machine running Python 3.8 instead of 3.7. Wouldn't it be helpful to include this version in the JSON and provide a more useful error message based on that information? Is there a way to load and re-save the models so that they work with another Python version?
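The suggestion above can be sketched with the standard library alone. This is a minimal sketch that assumes a JSON file stored next to the model. The `python_version` key and both helper names are hypothetical and are not part of Calamari's checkpoint format. The sketch records the saving interpreter's major.minor version and, on load, raises a descriptive error before Keras ever reaches `marshal.loads`.

```python
import json
import sys
from pathlib import Path

# Hypothetical key: Calamari's checkpoint JSON is not known to contain it.
PY_VERSION_KEY = "python_version"


def add_version_metadata(json_path):
    """Merge the saving interpreter's major.minor version into the JSON file."""
    path = Path(json_path)
    data = json.loads(path.read_text()) if path.exists() else {}
    data[PY_VERSION_KEY] = list(sys.version_info[:2])
    path.write_text(json.dumps(data, indent=2))


def check_version_metadata(json_path):
    """Fail early with a clear message instead of 'bad marshal data'.

    Returns the stored (major, minor) tuple, or () if no version was recorded.
    """
    data = json.loads(Path(json_path).read_text())
    saved = tuple(data.get(PY_VERSION_KEY, ()))
    current = tuple(sys.version_info[:2])
    if saved and saved != current:
        raise RuntimeError(
            f"Model was saved with Python {saved[0]}.{saved[1]} but is being "
            f"loaded with Python {current[0]}.{current[1]}; Lambda layers stored "
            "as marshaled bytecode will likely fail to load."
        )
    return saved
```

Checking only major.minor is deliberate: bytecode generally stays stable across patch releases of the same minor version, but changes between minor versions.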

ChWick commented 3 years ago

This is very annoying... As far as I know, this is a problem with the HDF5 file, which is not backwards-compatible, so opening a Python 3.7 model in Python 3.8 is possible, but not vice versa. I don't think that there is a solution.

Probably we (@andbue, @chreul) should take care to create the shared models in Python 3.7.

andbue commented 3 years ago

I created the model in 3.7 and was not able to load it in 3.8, unfortunately...

poke1024 commented 2 years ago

Unfortunately, I'm running into a similar problem. Using Calamari 2.2.2 and Python 3.9, I seem to be unable to load the pretrained models. In both CPU environments (a clean install from scratch via "pip install calamari_ocr==2.2.2") and GPU environments (using TensorFlow 2.6), I get the following error when trying to load the idiotikon model:
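For reports like this one, the interpreter version matters as much as the package versions. The following is a small, hypothetical stdlib helper (not part of Calamari) that collects both via `importlib.metadata`. It assumes the pip distribution names `calamari_ocr`, `tfaip`, `tensorflow` and `keras`, which match the packages in the traceback below.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("calamari_ocr", "tfaip", "tensorflow", "keras")):
    """Collect the Python version and installed package versions."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```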

```
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/calamari_ocr/ocr/predict/predictor.py", line 53, in from_paths
    multi_predictor = super(MultiPredictor, cls).from_paths(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/tfaip/predict/multimodelpredictor.py", line 107, in from_paths
    models = [
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/tfaip/predict/multimodelpredictor.py", line 108, in <listcomp>
    keras.models.load_model(model, compile=False, custom_objects=scenario.model_cls().all_custom_objects())
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/saving/save.py", line 200, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/saving/hdf5_format.py", line 180, in load_model_from_hdf5
    model = model_config_lib.model_from_config(model_config,
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/saving/model_config.py", line 52, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/layers/serialization.py", line 208, in deserialize
    return generic_utils.deserialize_keras_object(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/utils/generic_utils.py", line 674, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/engine/functional.py", line 662, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/engine/functional.py", line 1273, in reconstruct_from_config
    process_layer(layer_data)
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/engine/functional.py", line 1255, in process_layer
    layer = deserialize_layer(layer_data, custom_objects=custom_objects)
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/layers/serialization.py", line 208, in deserialize
    return generic_utils.deserialize_keras_object(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/utils/generic_utils.py", line 674, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/layers/core.py", line 1005, in from_config
    function = cls._parse_function_from_config(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/layers/core.py", line 1057, in _parse_function_from_config
    function = generic_utils.func_load(
  File "/u/liebl/miniconda3/envs/origami_cpu/lib/python3.9/site-packages/keras/utils/generic_utils.py", line 789, in func_load
    code = marshal.loads(raw_code)
ValueError: bad marshal data (unknown type code)
```

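The last frames show where loading breaks. For a Lambda layer, Keras stores the Python function as marshaled bytecode and rebuilds it in `generic_utils.func_load` via `marshal.loads`. The Python documentation states that the marshal format may change between Python versions, and bytecode itself differs between minor versions. The sketch below is a simplified, stdlib-only round trip that roughly mirrors what Keras does (it uses base64 plus `marshal` on `__code__`, but it is not Keras's actual code). It works within one interpreter, but the encoded string cannot be relied on in a different Python minor version.

```python
import base64
import marshal
import types


def dump_function(func):
    """Serialize a function's code object, roughly like a Keras Lambda layer."""
    raw_code = marshal.dumps(func.__code__)  # interpreter-specific bytecode
    return base64.b64encode(raw_code).decode("ascii")


def load_function(encoded, globs=None):
    """Rebuild the function; only reliable on the same Python minor version."""
    raw_code = base64.b64decode(encoded)
    code = marshal.loads(raw_code)  # the call that fails in the traceback above
    return types.FunctionType(code, globs if globs is not None else {}, code.co_name)


encoded = dump_function(lambda x: x * 2)
doubled = load_function(encoded)
```

This is why re-saving in a format that does not depend on marshaled bytecode is a more robust fix than pinning everyone to one Python version.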
andbue commented 2 years ago

Hi Bernhard! With https://github.com/Calamari-OCR/calamari/commit/2fa93d880fd306bdb2171bf4ed5e4538cc0dc79f I implemented the SavedModel format instead of the HDF5 (.h5) files. A SavedModel is a folder of several files and takes up more disk space, but I hope that it will solve the compatibility problems between different Python versions. It's only in the tempscale branch at the moment and not really tested, but if you have the time, you could give it a try: load the models with Python 3.7, wait for them to be converted to version 6, then use them with Python 3.9. If it works, I could merge it into master and later update the models in this repo.
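Because the two formats look different on disk, a quick check can tell which kind of checkpoint a given path holds before it is handed to Keras. The following is a minimal sketch with a hypothetical helper name. It relies on two facts: TensorFlow writes a `saved_model.pb` file at the top level of a SavedModel directory, and HDF5 files begin with the 8-byte signature `\x89HDF\r\n\x1a\n`. HDF5 also allows the signature at offsets 512, 1024, and so on, but this sketch only checks offset 0, where Keras-written files normally have it.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def checkpoint_format(path):
    """Return 'savedmodel', 'hdf5' or 'unknown' for a checkpoint path."""
    p = Path(path)
    if p.is_dir():
        # SavedModel directories contain saved_model.pb at their top level.
        return "savedmodel" if (p / "saved_model.pb").is_file() else "unknown"
    if p.is_file():
        with p.open("rb") as fh:
            if fh.read(8) == HDF5_SIGNATURE:
                return "hdf5"
    return "unknown"
```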

bertsky commented 1 month ago

Note: I have converted all models here and in calamari_models_experimental to v6 (SavedModel format) and created releases with tarballs as assets. Closing here – but mind that for the time being you'll have to install Calamari from git instead of PyPI because we have not released 2.3 with the new feature yet.
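After downloading one of the release tarballs, the converted models can be unpacked and located with the standard library. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes each model in the tarball is a SavedModel directory that can be identified by its top-level `saved_model.pb`. Where available, it uses `tarfile`'s `data` extraction filter (Python 3.12, backported to recent security releases) to reject unsafe archive paths.

```python
import tarfile
from pathlib import Path


def extract_models(tarball, dest):
    """Unpack a release tarball and return the SavedModel directories in it."""
    dest = Path(dest)
    with tarfile.open(tarball) as tar:  # compression is auto-detected
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # refuse absolute/escaping paths
        else:
            tar.extractall(dest)
    # Each SavedModel directory holds saved_model.pb at its top level.
    return sorted(p.parent for p in dest.rglob("saved_model.pb"))
```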