NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Online learning with an NN-ensemble seems to fail in the backend #505

Closed. Hurttaj closed this issue 2 years ago

Hurttaj commented 2 years ago

I have an nn-ensemble model that uses fastText, Omikuji, and MLLM as backends. The model works as expected for suggestions, including through the REST API. When I tested the online learning API, the backend consistently returned HTTP error 500. Digging into the Apache error logs, I found the following traceback:

[Fri Jul 09 11:02:50.743052 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP] Traceback (most recent call last):
[Fri Jul 09 11:02:50.743058 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2051, in wsgi_app
[Fri Jul 09 11:02:50.743063 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     response = self.full_dispatch_request()
[Fri Jul 09 11:02:50.743068 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1501, in full_dispatch_request
[Fri Jul 09 11:02:50.743072 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     rv = self.handle_user_exception(e)
[Fri Jul 09 11:02:50.743077 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/.local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
[Fri Jul 09 11:02:50.743082 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     return cors_after_request(app.make_response(f(*args, **kwargs)))
[Fri Jul 09 11:02:50.743086 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1499, in full_dispatch_request
[Fri Jul 09 11:02:50.743091 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     rv = self.dispatch_request()
[Fri Jul 09 11:02:50.743095 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1485, in dispatch_request
[Fri Jul 09 11:02:50.743100 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
[Fri Jul 09 11:02:50.743104 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/connexion/decorators/decorator.py", line 48, in wrapper
[Fri Jul 09 11:02:50.743108 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     response = function(request)
[Fri Jul 09 11:02:50.743113 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/connexion/decorators/uri_parsing.py", line 144, in wrapper
[Fri Jul 09 11:02:50.743117 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     response = function(request)
[Fri Jul 09 11:02:50.743121 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/connexion/decorators/validation.py", line 184, in wrapper
[Fri Jul 09 11:02:50.743126 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     response = function(request)
[Fri Jul 09 11:02:50.743130 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/connexion/decorators/validation.py", line 384, in wrapper
[Fri Jul 09 11:02:50.743149 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     return function(request)
[Fri Jul 09 11:02:50.743154 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/usr/local/lib/python3.8/dist-packages/connexion/decorators/parameter.py", line 121, in wrapper
[Fri Jul 09 11:02:50.743158 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     return function(**kwargs)
[Fri Jul 09 11:02:50.743162 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/annif/rest.py", line 91, in learn
[Fri Jul 09 11:02:50.743166 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     project.learn(corpus)
[Fri Jul 09 11:02:50.743170 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/annif/project.py", line 202, in learn
[Fri Jul 09 11:02:50.743174 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     self.backend.learn(corpus, beparams)
[Fri Jul 09 11:02:50.743178 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/annif/backend/backend.py", line 130, in learn
[Fri Jul 09 11:02:50.743182 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     return self._learn(corpus, params=beparams)
[Fri Jul 09 11:02:50.743186 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/annif/backend/nn_ensemble.py", line 209, in _learn
[Fri Jul 09 11:02:50.743190 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     self._fit_model(corpus, int(params['learn-epochs']))
[Fri Jul 09 11:02:50.743194 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/annif/backend/nn_ensemble.py", line 200, in _fit_model
[Fri Jul 09 11:02:50.743198 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     self._model.fit(seq, verbose=True, epochs=epochs)
[Fri Jul 09 11:02:50.743202 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
[Fri Jul 09 11:02:50.743206 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     return method(self, *args, **kwargs)
[Fri Jul 09 11:02:50.743210 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]   File "/srv/Annif/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1104, in fit
[Fri Jul 09 11:02:50.743214 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP]     epoch_logs = copy.copy(logs)
[Fri Jul 09 11:02:50.743217 2021] [wsgi:error] [pid 530005:tid 139905605035776] [remote IP] UnboundLocalError: local variable 'logs' referenced before assignment
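
For readers unfamiliar with this exception: Python raises UnboundLocalError when a function reads a local name before any assignment to it has run. One common way to hit it is a variable that is assigned only inside a loop body, where the loop ends up running zero times. The sketch below is a generic illustration with hypothetical names (`run_epoch`, `try_epoch`, `batches`). It is not the Keras source, and nothing in this thread establishes that an empty training sequence is the cause here.

```python
def run_epoch(batches):
    """Hypothetical training loop: `logs` is only assigned inside the loop."""
    for batch in batches:
        logs = {"loss": sum(batch) / len(batch)}
    # If `batches` was empty, `logs` was never bound and this line raises.
    return dict(logs)


def try_epoch(batches):
    """Return the logs, or the exception's class name if the loop never ran."""
    try:
        return run_epoch(batches)
    except UnboundLocalError as exc:
        return type(exc).__name__


print(try_epoch([[1.0, 3.0]]))  # one batch: `logs` is bound
print(try_epoch([]))            # no batches: `logs` is never bound
```

If something similar is happening in a real training call, checking how many batches the input sequence yields before calling `fit` would be a cheap first diagnostic.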

This installation was set up about 2 months ago, following the instructions here. I've been able to train and retrain the model successfully from the command line, but online learning through the REST API fails.
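
For anyone trying to reproduce this, here is a minimal sketch of the kind of request involved. It assumes the learn endpoint lives at `/v1/projects/<project_id>/learn` and accepts a JSON array of documents, each with `text` and a list of `subjects` (`uri` and `label`); check this against the API description served by your own Annif instance. The server URL, project id, and subject URI are hypothetical placeholders, and the helper name `build_learn_request` is made up for this example.

```python
import json
import urllib.request


def build_learn_request(base_url, project_id, documents):
    """Build (but do not send) a POST request for the assumed learn endpoint."""
    url = f"{base_url.rstrip('/')}/projects/{project_id}/learn"
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values; replace with a real server, project and vocabulary.
documents = [
    {
        "text": "A short example document about library cataloguing.",
        "subjects": [
            {"uri": "http://example.org/subject/1", "label": "cataloguing"}
        ],
    }
]
request = build_learn_request(
    "http://localhost:5000/v1", "my-nn-ensemble", documents
)
print(request.get_method(), request.full_url)

# To actually send it (a 5xx response raises urllib.error.HTTPError):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The request is built separately from sending so the URL and payload can be inspected first; the commented-out lines show how it would be sent.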

TommiRTVA commented 2 years ago

This seems quite similar to the issue I had:

https://github.com/NatLibFi/Annif/issues/504

juhoinkinen commented 2 years ago

Thanks for reporting. Please see the other issue, which covers the same bug.