NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Batch suggest in Omikuji backend #669

Closed osma closed 1 year ago

osma commented 1 year ago

This PR clarifies that a backend only has to implement one of `_suggest` and `_suggest_batch`, and then implements batched suggest in the Omikuji backend. In practice, only the text vectorization is performed on the whole batch at once; the Omikuji implementation only supports a `predict` method for a single document at a time, so prediction still has to be done in a for loop.
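The pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Annif or Omikuji code: the class names `SketchBackend` and `ToyBatchBackend`, the helpers `_vectorize` and `_predict_one`, and the toy term-counting "model" are all hypothetical stand-ins. Only the method names `_suggest` and `_suggest_batch` come from the PR description.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]
Hits = List[Tuple[str, float]]


class SketchBackend:
    """Hypothetical base class: a subclass overrides _suggest OR _suggest_batch.

    Each default delegates to the other method, so overriding one is enough.
    (Overriding neither would recurse forever; a real base class would guard
    against that.)
    """

    def _suggest(self, text: str, limit: int) -> Hits:
        # Default single-document path: run a batch of one.
        return self._suggest_batch([text], limit)[0]

    def _suggest_batch(self, texts: List[str], limit: int) -> List[Hits]:
        # Default batch path: fall back to one call per document.
        return [self._suggest(text, limit) for text in texts]


class ToyBatchBackend(SketchBackend):
    """Hypothetical backend whose model can only predict one document at a time."""

    def __init__(self, label_terms: Dict[str, List[str]]) -> None:
        self.label_terms = label_terms

    def _vectorize(self, texts: List[str]) -> List[Vector]:
        # Stand-in for a vectorizer: a single call handles the whole batch.
        vectors = []
        for text in texts:
            counts: Vector = {}
            for token in text.lower().split():
                counts[token] = counts.get(token, 0.0) + 1.0
            vectors.append(counts)
        return vectors

    def _predict_one(self, vector: Vector, limit: int) -> Hits:
        # Stand-in for a single-document predict call.
        scored = [
            (label, sum(vector.get(term, 0.0) for term in terms))
            for label, terms in self.label_terms.items()
        ]
        hits = [(label, score) for label, score in scored if score > 0]
        hits.sort(key=lambda pair: (-pair[1], pair[0]))
        return hits[:limit]

    def _suggest_batch(self, texts: List[str], limit: int) -> List[Hits]:
        vectors = self._vectorize(texts)  # vectorize the batch at once
        # Prediction still happens per document, inside a loop.
        return [self._predict_one(vector, limit) for vector in vectors]


backend = ToyBatchBackend({"cats": ["cat", "kitten"], "dogs": ["dog"]})
batch_hits = backend._suggest_batch(["cat cat dog", "nothing here"], limit=5)
single_hits = backend._suggest("dog", limit=5)
```

In this sketch the single-document `_suggest` call still works even though only `_suggest_batch` is overridden, because the base class routes it through a batch of one.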

There seems to be a small performance benefit. I tested this using `annif eval` on the Finto AI yso-parabel-fi project/model with the kirjaesittelyt2021/fin/test corpus. The evaluation results were unchanged; only the time spent differed slightly. Memory usage remained essentially the same.

With 1 job

| | user time | wall time | max rss |
|---|---|---|---|
| before (master) | 86.69 | 1:29.94 | 6322624 |
| after (PR) | 78.66 | 1:22.79 | 6336584 |

With 4 jobs

| | user time | wall time | max rss |
|---|---|---|---|
| before (master) | 121.33 | 1:22.33 | 6293804 |
| after (PR) | 96.55 | 1:18.78 | 6293640 |
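To put the measurements in relative terms, the reductions can be worked out with a short script. This is only a sketch of the arithmetic: the numbers are copied from the measurements above, the helper names `to_seconds` and `reduction_pct` are made up for illustration, and the wall times are assumed to be in `minutes:seconds` form.

```python
def to_seconds(wall: str) -> float:
    """Convert an 'M:SS.ss' wall-clock string to seconds (assumed format)."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100.0


# (user time in seconds, wall time) per run, taken from the measurements above
runs = {
    "1 job": {"before": (86.69, "1:29.94"), "after": (78.66, "1:22.79")},
    "4 jobs": {"before": (121.33, "1:22.33"), "after": (96.55, "1:18.78")},
}

summary = {}
for name, run in runs.items():
    user_before, wall_before = run["before"]
    user_after, wall_after = run["after"]
    summary[name] = {
        "user": reduction_pct(user_before, user_after),
        "wall": reduction_pct(to_seconds(wall_before), to_seconds(wall_after)),
    }

for name, values in summary.items():
    print(f"{name}: user time -{values['user']:.1f}%, wall time -{values['wall']:.1f}%")
```

These are single runs, so the percentages are indicative rather than a rigorous benchmark.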

Fixes #665

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

- 0 Bugs (rating A)
- 0 Vulnerabilities (rating A)
- 0 Security Hotspots (rating A)
- 0 Code Smells (rating A)

No Coverage information
0.0% Duplication

codecov[bot] commented 1 year ago

Codecov Report

Base: 99.56% // Head: 99.56% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (f4b55cd) compared to base (a7e3b4b). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #669   +/-   ##
=======================================
  Coverage   99.56%   99.56%
=======================================
  Files          87       87
  Lines        6143     6145    +2
=======================================
+ Hits         6116     6118    +2
  Misses         27       27
```

| [Impacted Files](https://codecov.io/gh/NatLibFi/Annif/pull/669?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | |
|---|---|---|
| [annif/backend/backend.py](https://codecov.io/gh/NatLibFi/Annif/pull/669?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvYmFja2VuZC9iYWNrZW5kLnB5) | `100.00% <ø> (ø)` | |
| [annif/backend/omikuji.py](https://codecov.io/gh/NatLibFi/Annif/pull/669?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvYmFja2VuZC9vbWlrdWppLnB5) | `97.53% <100.00%> (+0.09%)` | :arrow_up: |
