NatLibFi/Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Memory leak in NN ensemble backend #674

Closed: juhoinkinen closed this issue 1 year ago

juhoinkinen commented 1 year ago

The Annif pod in the OpenShift environment has occasionally been killed, roughly once every two weeks. The apparent cause is that memory consumption reaches the pod's limit (30 GB).

I monitored the memory consumption (RssAnon from /proc/$PID/status) of a locally run Annif while sending suggest requests with curl to an NN ensemble project and its base projects, using fulltext documents from the JYU test set; memory consumption was recorded after every 10 documents. Only the NN ensemble showed a strong increase in memory consumption: see below for a run with the yso-fi model of Finto AI.

[Figure: RssAnon memory consumption of the NN ensemble project (yso-fi model of Finto AI) growing steadily over successive suggest requests.]
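Incidentally, sampling RssAnon takes only a few lines of Python; a minimal sketch, assuming Linux (the helper name is illustrative, not part of Annif):

```python
def rss_anon_kib(pid: int) -> int:
    """Read RssAnon (anonymous resident memory, in kB) from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("RssAnon:"):
                return int(line.split()[1])  # line looks like "RssAnon:  123456 kB"
    raise RuntimeError(f"RssAnon not found for PID {pid}")
```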

I confirmed that the issue can be fixed by following the advice from one relevant discussion, i.e. calling the model directly via __call__() instead of predict():

https://github.com/NatLibFi/Annif/blob/73d4f2e4ad2d702709af27e92319d4e945a8d019/annif/backend/nn_ensemble.py#L141
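As a minimal sketch of the change (the model construction and variable names here are illustrative, not Annif's actual code):

```python
import numpy as np
import tensorflow as tf

# illustrative stand-ins for the NN ensemble model and its input
model = tf.keras.Sequential([tf.keras.layers.Flatten(), tf.keras.layers.Dense(100)])
score_vector = np.random.rand(1, 100, 2).astype(np.float32)

# leaks memory when called repeatedly in a long-running process:
# results = model.predict(score_vector)

# calling the model directly avoids the per-request accumulation:
results = model(score_vector, training=False).numpy()
```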

The other fix mentioned there, applying tf.convert_to_tensor(), did not fix the memory leak. Running (also) gc.collect() after each prediction did fix it, but the predictions became very slow: 10 requests took ~110 s, compared to only ~30 s without gc.
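For illustration, the gc variant, continuing the sketch above, looks roughly like this:

```python
import gc

results = model.predict(score_vector)
# forcing a full collection frees the leaked objects, but in the
# measurements above it made requests roughly 3-4x slower
gc.collect()
```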

However, the NN ensemble could be modified to allow batch processing of documents, and for that use the Keras documentation seems to recommend predict(), so I'm not sure the fix above is the best way to go; see the sketch below.
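The trade-off, roughly as the Keras documentation frames it (again continuing the illustrative sketch above):

```python
# __call__ suits small inputs that fit in one batch,
# e.g. a single suggest request:
single = model(score_vector, training=False)

# predict() iterates over the input in batches and suits
# processing many documents at once:
batch = np.repeat(score_vector, 32, axis=0)  # illustrative batch of 32 documents
batched = model.predict(batch, batch_size=32)
```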