NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Batch suggest in ensemble backends #677

Closed juhoinkinen closed 1 year ago

juhoinkinen commented 1 year ago

This adds the _suggest_batch method to ensemble backends.

In all ensemble backends (simple, PAV, NN), the suggestions from the source projects are now fetched with batched suggest calls. The batching happens at the project level, so whether a given source backend actually processes the batch in one go depends on that backend.
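A minimal sketch of the idea, not the actual Annif code: each source project is asked once for the whole batch of texts, and the per-document scores are then merged with a weighted average. The class `SourceProject`, the function `ensemble_suggest_batch` and the keyword-based scoring are hypothetical stand-ins made up for illustration.

```python
from typing import Dict, List, Tuple


class SourceProject:
    """Hypothetical toy source project (not an Annif class).

    It scores each text against a fixed list of concepts by keyword matching.
    """

    def __init__(self, name: str, keyword_scores: Dict[str, List[float]]):
        self.name = name
        self.keyword_scores = keyword_scores
        self.batch_calls = 0  # counts how often the project was called

    def suggest_batch(self, texts: List[str]) -> List[List[float]]:
        """Return one score vector per text, handling the batch in one call."""
        self.batch_calls += 1
        n_concepts = len(next(iter(self.keyword_scores.values())))
        results = []
        for text in texts:
            scores = [0.0] * n_concepts
            for keyword, kw_scores in self.keyword_scores.items():
                if keyword in text:
                    scores = [max(s, k) for s, k in zip(scores, kw_scores)]
            results.append(scores)
        return results


def ensemble_suggest_batch(
    sources: List[Tuple[SourceProject, float]], texts: List[str]
) -> List[List[float]]:
    """Weighted average of source scores, one merged vector per document."""
    # One batched call per source project instead of one call per document.
    per_source = [(project.suggest_batch(texts), weight) for project, weight in sources]
    total_weight = sum(weight for _, weight in per_source)
    merged = []
    for doc_idx in range(len(texts)):
        n_concepts = len(per_source[0][0][doc_idx])
        merged.append(
            [
                sum(batch[doc_idx][c] * weight for batch, weight in per_source)
                / total_weight
                for c in range(n_concepts)
            ]
        )
    return merged


project_a = SourceProject("a", {"cat": [1.0, 0.0], "dog": [0.0, 1.0]})
project_b = SourceProject("b", {"cat": [0.5, 0.5]})
texts = ["cat", "dog", "cat and dog"]
merged = ensemble_suggest_batch([(project_a, 1.0), (project_b, 1.0)], texts)
```

With three texts and two sources, each source is called once rather than three times; whether that saves time in practice depends on how the source backend handles a batch.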

In the NN ensemble, the prediction (now done via _merge_hit_sets_from_sources) is performed on the whole batch of base suggestions in a single call of the NN model. The prediction uses the model's __call__() method instead of predict(), as this is the recommended way for "small numbers of inputs that fit in one batch" and should offer better performance; this should also fix #674.
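The difference between predicting per document and predicting once per batch can be sketched as follows. `StandInModel` is a hypothetical placeholder that just averages the source scores; it is not a Keras model, and `stack_sources` is an assumed helper, not a function from Annif. With a real Keras model, the single batched call would be `model(x)` rather than `model.predict(x)`, as described above.

```python
from typing import List

# One document: rows are concepts, columns are source projects.
Matrix = List[List[float]]


class StandInModel:
    """Hypothetical stand-in for a trained NN ensemble model (not Keras).

    Calling it on a batch returns one merged score per concept for each
    document; here the merged score is simply the mean over the sources.
    """

    def __init__(self) -> None:
        self.calls = 0  # counts model invocations

    def __call__(self, batch: List[Matrix]) -> List[List[float]]:
        self.calls += 1
        return [[sum(row) / len(row) for row in doc] for doc in batch]


def stack_sources(per_source: List[List[List[float]]]) -> List[Matrix]:
    """Turn [source][doc][concept] scores into a [doc][concept][source] batch."""
    n_docs = len(per_source[0])
    n_concepts = len(per_source[0][0])
    return [
        [[src[d][c] for src in per_source] for c in range(n_concepts)]
        for d in range(n_docs)
    ]


per_source = [
    [[1.0, 0.0], [0.0, 1.0]],  # source 1: two documents, two concepts
    [[0.5, 0.5], [0.0, 0.0]],  # source 2
]
batch = stack_sources(per_source)

# Before: one model call per document.
one_by_one_model = StandInModel()
one_by_one = [one_by_one_model([doc])[0] for doc in batch]

# After: a single model call for the whole batch.
batched_model = StandInModel()
batched = batched_model(batch)
```

Both approaches give the same merged scores here; the batched variant only reduces the number of model invocations.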

The EnsembleOptimizer is also changed to use batched suggest. A quick test of the hyperopt command on an ensemble project gives very similar weights and best NDCG scores before and after this PR.
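To illustrate what such a weight search evaluates, here is a rough sketch that scores candidate source weights by mean NDCG over a batch of documents. It uses a plain comparison of three fixed candidates instead of the hyperopt library, and the functions `ndcg` and `mean_ndcg` as well as the toy scores are assumptions for illustration, not code from Annif.

```python
import math
from typing import List, Sequence, Set


def ndcg(scores: Sequence[float], relevant: Set[int], limit: int = 10) -> float:
    """NDCG with binary relevance for a single document."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:limit]
    dcg = sum(
        1.0 / math.log2(rank + 2) for rank, idx in enumerate(ranked) if idx in relevant
    )
    ideal_hits = min(len(relevant), limit)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg(
    weights: Sequence[float],
    per_source: List[List[List[float]]],
    gold: List[Set[int]],
) -> float:
    """Merge [source][doc][concept] scores with the weights and average NDCG."""
    total = 0.0
    for d, relevant in enumerate(gold):
        n_concepts = len(per_source[0][d])
        merged = [
            sum(w * src[d][c] for w, src in zip(weights, per_source))
            for c in range(n_concepts)
        ]
        total += ndcg(merged, relevant)
    return total / len(gold)


# Toy batched suggestions: source 1 ranks the gold concepts first, source 2 does not.
per_source = [
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2]],
    [[0.0, 0.2, 0.9], [0.9, 0.0, 0.1]],
]
gold = [{0}, {1}]  # relevant concept indices per document

candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
best = max(candidates, key=lambda w: mean_ndcg(w, per_source, gold))
```

Because the source suggestions are computed once for the whole batch, trying a new set of weights only repeats the cheap merging and scoring step.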

This PR somewhat improves the performance of NN ensemble suggest, while the results remain identical or nearly so (I think there were small differences in suggestion scores on my laptop). I think most of the improvement comes from using the model's __call__ instead of predict.

The results below are from runs on kj-kk using the current Finto AI YSO NN ensemble model (which has MLLM, fastText and Omikuji base projects).

suggest

Targeting tests/corpora/archaeology/fulltext/*.txt 6 times:

| | user time | wall time | max rss |
|---|---|---|---|
| before (master) | 152.59 | 2:35.10 | 13755484 |
| after (PR) | 135.19 | 2:17.27 | 13757912 |

eval

Targeting 200 documents from kirjaesittelyt2021/yso/fin/test:

With 1 job

| | user time | wall time | max rss | F1@5 |
|---|---|---|---|---|
| before (master) | 174.79 | 2:57.70 | 13816688 | 0.4431 |
| after (PR) | 156.12 | 2:36.77 | 13796932 | 0.4431 |

With 4 jobs

| | user time | wall time | max rss | F1@5 |
|---|---|---|---|---|
| before (master) | 188.21 | 1:45.61 | 13639148 | 0.4431 |
| after (PR) | 172.5 | 1:43.78 | 13606508 | 0.4431 |

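The wall times reported above can be turned into relative savings with a few lines of arithmetic. The helper names `wall_seconds` and `reduction_pct` are made up for this sketch; the timings are copied from the results above.

```python
def wall_seconds(stamp: str) -> float:
    """Convert an 'm:ss.ss' wall-clock string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def reduction_pct(before: str, after: str) -> float:
    """Percentage of wall time saved going from 'before' to 'after'."""
    b, a = wall_seconds(before), wall_seconds(after)
    return 100.0 * (b - a) / b


# (before, after) wall times from the suggest and eval runs.
timings = {
    "suggest": ("2:35.10", "2:17.27"),
    "eval, 1 job": ("2:57.70", "2:36.77"),
    "eval, 4 jobs": ("1:45.61", "1:43.78"),
}
reductions = {name: reduction_pct(b, a) for name, (b, a) in timings.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% less wall time")
```

This works out to roughly 11 to 12 percent less wall time for suggest and for eval with 1 job, and under 2 percent for eval with 4 jobs.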
codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 :tada:

Comparison is base (3e8f42f) 99.56% compared to head (f280342) 99.57%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #677      +/-   ##
==========================================
+ Coverage   99.56%   99.57%   +0.01%
==========================================
  Files          87       87
  Lines        6158     6157       -1
==========================================
  Hits         6131     6131
+ Misses         27       26       -1
```

| [Impacted Files](https://codecov.io/gh/NatLibFi/Annif/pull/677?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | |
|---|---|---|
| [annif/backend/ensemble.py](https://codecov.io/gh/NatLibFi/Annif/pull/677?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvYmFja2VuZC9lbnNlbWJsZS5weQ==) | `100.00% <100.00%> (ø)` | |
| [annif/backend/nn\_ensemble.py](https://codecov.io/gh/NatLibFi/Annif/pull/677?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvYmFja2VuZC9ubl9lbnNlbWJsZS5weQ==) | `100.00% <100.00%> (+0.70%)` | :arrow_up: |
| [annif/suggestion.py](https://codecov.io/gh/NatLibFi/Annif/pull/677?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvc3VnZ2VzdGlvbi5weQ==) | `100.00% <100.00%> (ø)` | |
| [annif/util.py](https://codecov.io/gh/NatLibFi/Annif/pull/677?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvdXRpbC5weQ==) | `98.57% <100.00%> (ø)` | |


sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

- 0 Bugs (rating A)
- 0 Vulnerabilities (rating A)
- 0 Security Hotspots (rating A)
- 0 Code Smells (rating A)
- No coverage information
- 0.0% duplication

juhoinkinen commented 1 year ago

I reran the NN ensemble suggest and eval timings and updated the results tables after the last change, which seems to have improved the timings by a few more seconds.