NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

annif docker container failed to run #784

Closed: s0rin closed this issue 2 months ago

s0rin commented 3 months ago

Using annif_app as in https://github.com/NatLibFi/Annif/blob/main/docker-compose.yml, the annif docker container had been running for a while until I recently pulled the image quay.io/natlibfi/annif:latest; since then it fails with the following error:

# docker logs --follow docker_annif
[2024-04-05 12:02:34 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2024-04-05 12:02:34 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2024-04-05 12:02:34 +0000] [1] [INFO] Using worker: sync
[2024-04-05 12:02:34 +0000] [7] [INFO] Booting worker with pid: 7
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator TfidfTransformer from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator TfidfVectorizer from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator TfidfTransformer from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator TfidfVectorizer from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
WARNING:annif:Couldn't initialize backend 'fasttext': model data/projects/hogwarts/fasttext-model not found
WARNING:annif:Couldn't initialize backend 'tfidf': vectorizer file 'data/projects/tfidf-fi/vectorizer' not found
WARNING:annif:Couldn't initialize backend 'tfidf': vectorizer file 'data/projects/tfidf-sv/vectorizer' not found
WARNING:annif:Couldn't initialize backend 'tfidf': vectorizer file 'data/projects/tfidf-en/vectorizer' not found
WARNING:annif:Couldn't initialize backend 'fasttext': model data/projects/fasttext-fi/fasttext-model not found
WARNING:annif:Couldn't initialize backend 'fasttext': model data/projects/fasttext-sv/fasttext-model not found
WARNING:annif:Couldn't initialize backend 'fasttext': model data/projects/fasttext-en/fasttext-model not found
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator CountVectorizer from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.10/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator BaggingClassifier from version 1.3.2 when using version 1.4.1.post1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
2024-04-05 12:02:44.113014: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-05 12:02:52.370582: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 28799200 exceeds 10% of free system memory.
2024-04-05 12:02:52.470843: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 28799200 exceeds 10% of free system memory.
2024-04-05 12:02:52.497902: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 28799200 exceeds 10% of free system memory.
2024-04-05 12:02:52.603658: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 14399600 exceeds 10% of free system memory.
2024-04-05 12:02:52.642776: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 28799200 exceeds 10% of free system memory.
[2024-04-05 12:02:54 +0000] [7] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.10/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.10/site-packages/gunicorn/util.py", line 424, in import_app
    app = app(*args, **kwargs)
  File "/Annif/annif/__init__.py", line 59, in create_app
    annif.registry.initialize_projects(cxapp.app)
  File "/Annif/annif/registry.py", line 114, in initialize_projects
    app.annif_registry = AnnifRegistry(projects_config_path, datadir, init_projects)
  File "/Annif/annif/registry.py", line 41, in __init__
    project.initialize()
  File "/Annif/annif/project.py", line 132, in initialize
    self._initialize_backend(parallel)
  File "/Annif/annif/project.py", line 116, in _initialize_backend
    self.backend.initialize(parallel)
  File "/Annif/annif/backend/nn_ensemble.py", line 132, in initialize
    self._model = load_model(
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 254, in load_model
    return saving_lib.load_model(
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_lib.py", line 281, in load_model
    raise e
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_lib.py", line 269, in load_model
    _load_state(
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_lib.py", line 466, in _load_state
    _load_container_state(
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_lib.py", line 534, in _load_container_state
    _load_state(
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_lib.py", line 435, in _load_state
    trackable.load_own_variables(weights_store.get(inner_path))
  File "/usr/local/lib/python3.10/site-packages/keras/src/engine/base_layer.py", line 3531, in load_own_variables
    raise ValueError(
ValueError: Layer 'dense' expected 2 variables, but received 0 variables during loading. Expected: ['dense/kernel:0', 'dense/bias:0']
[2024-04-05 12:02:54 +0000] [7] [INFO] Worker exiting (pid: 7)
[2024-04-05 12:02:57 +0000] [1] [ERROR] Worker (pid:7) exited with code 3
[2024-04-05 12:02:57 +0000] [1] [ERROR] Shutting down: Master
[2024-04-05 12:02:57 +0000] [1] [ERROR] Reason: Worker failed to boot.
osma commented 3 months ago

Do you have an existing data directory with models trained using a previous Annif version?

We recently upgraded many of our dependencies - see PR #771. For example, TensorFlow was upgraded. Sadly, this means that old models can no longer be loaded and the projects need to be retrained.

Thanks for reporting this issue. I think this kind of problem should be handled better instead of throwing an ugly exception.
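
For illustration, here is a minimal sketch of how the load_model call seen in the traceback (annif/backend/nn_ensemble.py) could be wrapped to fail with a clearer message; the helper name, message wording and placement are hypothetical, not the actual Annif code:

from keras.models import load_model

# Hypothetical wrapper around the Keras model load in the NN ensemble backend:
# turn the low-level ValueError raised on a Keras/TensorFlow version mismatch
# into an actionable message instead of an unhandled worker crash.
def load_nn_model(model_filename, custom_objects=None):
    try:
        return load_model(model_filename, custom_objects=custom_objects)
    except ValueError as err:
        raise RuntimeError(
            f"Could not load NN ensemble model '{model_filename}'. "
            "It was probably trained with an older Annif/TensorFlow version; "
            "please retrain the project."
        ) from err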

s0rin commented 3 months ago

Indeed, the annif docker container could run again after removing vocabs/ and projects/ from the annif-projects/data/ directory.

juhoinkinen commented 3 months ago

The TensorFlow/Keras model file has some metadata, which contains keras_version:

head -n1 data-fintoai/projects/yso-fi/nn-model.keras 
metadata.json{"keras_version": "2.13.1", "date_saved": "2023-08-22@14:41:52"}PK!��33
                                                                                    config.json{"module": "keras.src.engine.functional", "class_name": "Functional", "config": {"name": "model", "trainable": true, "layers": [{"module": "keras.layers", "class_name": "InputLayer", "config": {"batch_input_shape": [null, 38586, 3], "dtype": "float32", "sparse": false, "ragged": false, "name": "input_1"}, "registered_name": null, "name": "input_1", "inbound_nodes": []}, {"module": "keras.layers", "class_name": "Flatten", "config": {"name": "flatten", "trainable": true, "dtype": "float32", "data_format": "channels_last"}, "registered_name": null, "build_config": {"input_shape": [null, 38586, 3]}, "name": "flatten", "inbound_nodes": [[["input_1", 0, 0, {}]]]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout", "trainable": true, "dtype": "float32", "rate": 0.2, "noise_shape": null, "seed": null}, "registered_name": null, "build_config": {"input_shape": [null, 115758]}, "name": "dropout", "inbound_nodes": [[["flatten", 0, 0, {}]]]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense", "trainable": true, "dtype": "float32", "units": 100, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 115758]}, "name": "dense", "inbound_nodes": [[["dropout", 0, 0, {}]]]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_1", "trainable": true, "dtype": "float32", "rate": 0.2, "noise_shape": null, "seed": null}, "registered_name": null, "build_config": {"input_shape": [null, 100]}, "name": "dropout_1", "inbound_nodes": [[["dense", 0, 0, {}]]]}, {"module": "annif.backend.nn_ensemble", "class_name": "MeanLayer", "config": {"name": "mean_layer", "trainable": true, "dtype": "float32"}, "registered_name": "MeanLayer", "build_config": {"input_shape": [null, 38586, 3]}, "name": "mean_layer", "inbound_nodes": [[["input_1", 0, 0, {}]]]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_1", "trainable": true, "dtype": "float32", "units": 38586, "activation": "linear", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 100]}, "name": "dense_1", "inbound_nodes": [[["dropout_1", 0, 0, {}]]]}, {"module": "keras.layers", "class_name": "Add", "config": {"name": "add", "trainable": true, "dtype": "float32"}, "registered_name": null, "build_config": {"input_shape": [[null, 38586], [null, 38586]]}, "name": "add", "inbound_nodes": [[["mean_layer", 0, 0, {}], ["dense_1", 0, 0, {}]]]}], "input_layers": [["input_1", 0, 0]], "output_layers": [["add", 0, 0]]}, "registered_name": "Functional", "build_config": {"input_shape": [null, 38586, 3]}, "compile_config": {"optimizer": "adam", "loss": 
"binary_crossentropy", "metrics": ["top_k_categorical_a�=W��cy"], "loss_weights": null, "weighted_metrics": null, "run_eagerly": null, "steps_per_execution": null, "jit_compile": null}}PK:uW

The file also contains the model weights (in binary form). The metadata and the version number could surely be read somehow, but I did not find a way to do it with TensorFlow.

It would be nice to show the version of TF/Keras that created the model when crashes like this occur.

juhoinkinen commented 3 months ago

Heh, googling did not turn up anything usable, but CurreChat gave a working solution after a few tries:

My apologies for the confusion. It looks like the information provided is from an actual .keras file which is a zip-like file format used by newer versions of Keras/TensorFlow for saving models.


from zipfile import ZipFile
import json

# Replace this with the path to your .keras file
model_file_path = 'path_to_your_model.keras'

# Opening the .keras file as a zip archive
with ZipFile(model_file_path, 'r') as zf:
    # Reading the metadata.json file inside the .keras zip
    with zf.open('metadata.json') as metadata_file:
        # Decoding the bytes to a string
        metadata_str = metadata_file.read().decode('utf-8')
        # Converting the string to a JSON (dictionary) object
        metadata = json.loads(metadata_str)
        # Now you can use or print the metadata
        print(metadata)
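
Running this against the yso-fi model shown above should print something like {'keras_version': '2.13.1', 'date_saved': '2023-08-22@14:41:52'}. As a rough sketch of the version display wished for earlier, the saved version could then be compared against the installed one before attempting the load; where exactly such a check would live inside Annif is just an assumption here:

import keras

# Hypothetical pre-load check: warn if the model was saved with a different
# Keras version than the one currently installed.
saved_version = metadata["keras_version"]  # from the snippet above
if saved_version != keras.__version__:
    print(f"Warning: model was saved with Keras {saved_version}, "
          f"but Keras {keras.__version__} is installed; loading may fail "
          f"and the project may need retraining.")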