NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Caching analysis results with memcached #241

Open osma opened 5 years ago

osma commented 5 years ago

When testing different settings and training ensembles, often the same documents are analyzed over and over with the same backend. This is needlessly slow. It would help a lot if the analysis results could be cached.

Ideally the cache should be persistent across separate invocations of Annif, sharable by multiple Annif processes working in parallel, and automatically expire cached results after some TTL. memcached seems ideal for this purpose.
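As a rough illustration of the read-through pattern this implies, here is a minimal sketch. It assumes a client object exposing `get(key)` and `set(key, value, expire=...)`, which is the shape offered by common Python memcached clients such as pymemcache. The `InMemoryClient` stand-in, the `cached_analyze` helper and the JSON result format are all hypothetical and not part of Annif; the stand-in only exists so the example runs without a memcached server.

```python
import json
import time


class InMemoryClient:
    """Hypothetical stand-in with a memcached-like get/set interface (demo only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, expire=0):
        # expire=0 means "never expire", mirroring memcached semantics
        deadline = time.monotonic() + expire if expire else None
        self._store[key] = (value, deadline)
        return True

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, deadline = item
        if deadline is not None and time.monotonic() >= deadline:
            # Entry has outlived its TTL: drop it and report a miss
            del self._store[key]
            return None
        return value


def cached_analyze(client, key, analyze, text, ttl=3600):
    """Return the cached result for key, or run analyze(text) and cache it."""
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    result = analyze(text)
    # Values are stored as bytes so any process sharing the cache can decode them
    client.set(key, json.dumps(result).encode("utf-8"), expire=ttl)
    return result
```

With a real memcached server, the stand-in would be swapped for an actual client instance; persistence across invocations and sharing between parallel processes then come from the server rather than from this code.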

The cache keys have to be carefully chosen based on e.g. project configuration and timestamps on trained model files, to avoid stale entries being retrieved from the cache.
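One possible way to build such keys is to hash everything that can change the result. The sketch below is only an assumption about how this could look: `make_cache_key`, its parameters and the `annif` prefix are hypothetical names, and the project configuration is assumed to be a JSON-serializable dict. Hashing also keeps the key within memcached's limits (at most 250 bytes, no whitespace).

```python
import hashlib
import json
import os


def make_cache_key(project_config, model_paths, text, prefix="annif"):
    """Derive a key from the project config, model file timestamps and document text."""
    h = hashlib.sha256()
    # sort_keys makes the digest independent of dict ordering
    h.update(json.dumps(project_config, sort_keys=True).encode("utf-8"))
    for path in sorted(model_paths):
        st = os.stat(path)
        # Retraining changes the mtime (and usually the size), which changes the key
        h.update(f"\0{path}\0{st.st_mtime_ns}\0{st.st_size}".encode("utf-8"))
    h.update(b"\0")
    h.update(hashlib.sha256(text.encode("utf-8")).digest())
    return f"{prefix}:{h.hexdigest()}"
```

With keys built this way, entries for an old model or configuration are never looked up again after retraining or reconfiguring, so they simply age out via the TTL instead of needing explicit invalidation.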

kinow commented 5 years ago

It might be easier if we used containers and/or VMs for testing and developing this, I think. The docs are already great for development; I could set up Annif quite quickly. But nothing beats being able to run one command (and also having quick access to all the commands used for the installation).