NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

optimization: load a vocabulary only once even if used in different languages #736

Closed osma closed 9 months ago

osma commented 10 months ago

While looking at ways to implement #735, I discovered an opportunity for optimization in the registry code that handles loading of vocabularies. For some reason (probably my mistake) the registry loads vocabularies multiple times, once per language. This wastes both time and memory.

This PR adjusts the code slightly so that vocabularies are always loaded just once. This was always the intention since the introduction of multilingual vocabularies (#559, PR #600 etc.) and especially PR #610 which implemented vocabularies that are shared between projects.
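The idea of loading each vocabulary only once can be illustrated with a minimal sketch. This is not the actual Annif registry code: the `Vocabulary` and `VocabularyRegistry` classes, the `get_vocab` method and the `load_count` counter are hypothetical names used only for illustration. The point is that the cache is keyed by the vocabulary ID alone, so requests for different languages share one loaded object.

```python
class Vocabulary:
    """Hypothetical stand-in for a multilingual vocabulary (not Annif's class)."""

    load_count = 0  # how many times a vocabulary has been constructed ("loaded")

    def __init__(self, vocab_id):
        self.vocab_id = vocab_id
        Vocabulary.load_count += 1


class VocabularyRegistry:
    """Hypothetical registry that keeps one Vocabulary object per vocabulary ID."""

    def __init__(self):
        self._vocabs = {}

    def get_vocab(self, vocab_id, language):
        # The cache key is the vocabulary ID only. Keying on
        # (vocab_id, language) instead would load the same
        # vocabulary once per language.
        if vocab_id not in self._vocabs:
            self._vocabs[vocab_id] = Vocabulary(vocab_id)
        return self._vocabs[vocab_id], language


registry = VocabularyRegistry()
# Three projects using the same vocabulary in different languages
results = [registry.get_vocab("yso", lang) for lang in ("fi", "sv", "en")]
```

In this sketch all three calls return the same `Vocabulary` instance, paired with the requested language.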

I benchmarked this with an installation where I have three Finto AI MLLM projects (languages fi, sv, en) that all use the YSO vocabulary, but in different languages. I ran the command

ANNIF_CONFIG=annif.default_config.ProductionConfig /usr/bin/time -v annif list-projects

The idea here is to use ProductionConfig, which causes all projects to be loaded on startup instead of on demand. This means that the vocabulary is loaded as well.

Before

(showing selected stats)

    User time (seconds): 13.04
    System time (seconds): 6.13
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.27
    Maximum resident set size (kbytes): 539600

After

    User time (seconds): 12.82
    System time (seconds): 7.26
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:11.66
    Maximum resident set size (kbytes): 428940

So there's a slight speedup (about 0.6 seconds of wall-clock time), and the maximum resident set size drops by about 110 MB. Not bad for a patch that also reduces the amount of code by 3 lines.
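The differences can be checked from the figures reported above. The following is just arithmetic on the quoted `/usr/bin/time -v` output; the dictionary keys and variable names are made up for this example.

```python
# Figures copied from the "Before" and "After" measurements above
before = {"user_s": 13.04, "system_s": 6.13, "elapsed_s": 12.27, "max_rss_kb": 539600}
after = {"user_s": 12.82, "system_s": 7.26, "elapsed_s": 11.66, "max_rss_kb": 428940}

# Wall-clock time saved, in seconds and as a share of the original run
elapsed_saved_s = before["elapsed_s"] - after["elapsed_s"]
elapsed_saved_pct = 100 * elapsed_saved_s / before["elapsed_s"]

# Peak memory saved, in the kbytes unit reported by the tool
rss_saved_kb = before["max_rss_kb"] - after["max_rss_kb"]
rss_saved_pct = 100 * rss_saved_kb / before["max_rss_kb"]

print(f"elapsed: -{elapsed_saved_s:.2f} s ({elapsed_saved_pct:.1f}%)")
print(f"max RSS: -{rss_saved_kb} kbytes ({rss_saved_pct:.1f}%)")
```

Note that user time goes down slightly while system time goes up, so the wall-clock figure is the more meaningful timing comparison here.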

sonarcloud[bot] commented 10 months ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication