NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Batch processing in training of NN ensemble - base project suggest calls #676

Open juhoinkinen opened 1 year ago

juhoinkinen commented 1 year ago

This PR experiments with batched suggest calls to the base projects in the NN ensemble backend.

Unfortunately, there is no notable performance gain in real use, at least with MLLM, fastText, and Omikuji as base projects (as in the YSO projects of Finto AI); in fact, there is a performance regression. A gain is seen only when Omikuji is the sole base project. Omikuji is the only backend among the Finto AI YSO base models that implements the batch suggest method.
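To illustrate why only some base projects can benefit, here is a minimal sketch of batched calls with a per-document fallback. It is not Annif's actual code: the class names, the `suggest_batch` method name, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Sequence


class SingleOnlyBackend:
    """Hypothetical backend that can only score one document at a time."""

    def suggest(self, text: str) -> List[float]:
        # Toy "scores": character count and word count of the text.
        return [float(len(text)), float(len(text.split()))]


class BatchBackend(SingleOnlyBackend):
    """Hypothetical backend that also accepts many documents in one call."""

    def suggest_batch(self, texts: Sequence[str]) -> List[List[float]]:
        # A real backend would vectorize this work; the toy version loops.
        return [self.suggest(text) for text in texts]


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def collect_scores(backends, texts: Sequence[str], batch_size: int = 2):
    """Return, for each document, one score vector per backend."""
    results: List[List[List[float]]] = [[] for _ in texts]
    offset = 0
    for batch in batches(texts, batch_size):
        for backend in backends:
            batch_fn = getattr(backend, "suggest_batch", None)
            if batch_fn is not None:
                scores = batch_fn(batch)  # one call for the whole batch
            else:
                # Fallback: no batch method, so call once per document.
                scores = [backend.suggest(text) for text in batch]
            for i, vector in enumerate(scores):
                results[offset + i].append(vector)
        offset += len(batch)
    return results
```

In this sketch, a backend without a batch method still does one call per document, so batching the outer loop changes nothing for it.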

The results below are from runs on kj-kk using 16 jobs, training on corpora/fulltext-train/fi/*/.

MLLM, fastText, and Omikuji base projects

1000 docs, 1 epoch

| | user time | wall time | max rss |
|---|---|---|---|
| before (master) | 1268.63 | 2:25.07 | 14863072 |
| after (PR) | 1260.37 | 2:24.00 | 14791464 |

2000 docs, 10 epochs

| | user time | wall time | max rss |
|---|---|---|---|
| before (master) | 4205.64 | 4:41.01 | 15718876 |
| after (PR) | 4152.09 | 4:58.89 | 15712064 |

Omikuji base project only

2000 docs, 1 epoch

| | user time | wall time | max rss |
|---|---|---|---|
| before (master) | 511.38 | 1:23.13 | 7827672 |
| after (PR) | 336.72 | 1:13.68 | 7612384 |

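
For easier comparison, the relative changes can be computed from the figures reported above. This is a small sketch that assumes the wall times are in minutes:seconds format; the function names and the `RUNS` dictionary are introduced here only for illustration.

```python
def wall_to_seconds(wall: str) -> float:
    """Convert an 'M:SS.ss' wall-clock string to seconds (format assumed)."""
    minutes, seconds = wall.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the PR run was faster."""
    return (after - before) / before * 100.0


# ((user, wall) on master, (user, wall) on the PR), copied from the results above.
RUNS = {
    "three base projects, 1000 docs, 1 epoch": (
        ("1268.63", "2:25.07"), ("1260.37", "2:24.00")),
    "three base projects, 2000 docs, 10 epochs": (
        ("4205.64", "4:41.01"), ("4152.09", "4:58.89")),
    "Omikuji only, 2000 docs, 1 epoch": (
        ("511.38", "1:23.13"), ("336.72", "1:13.68")),
}


def summarize(runs):
    """Map each run label to (user time change %, wall time change %)."""
    summary = {}
    for label, ((user_b, wall_b), (user_a, wall_a)) in runs.items():
        summary[label] = (
            pct_change(float(user_b), float(user_a)),
            pct_change(wall_to_seconds(wall_b), wall_to_seconds(wall_a)),
        )
    return summary


if __name__ == "__main__":
    for label, (user_pct, wall_pct) in summarize(RUNS).items():
        print(f"{label}: user {user_pct:+.1f}%, wall {wall_pct:+.1f}%")
```

Under that assumption, the wall time of the 2000-document, 10-epoch run grows by roughly 6%, while the Omikuji-only run is roughly 11% faster in wall time and roughly 34% lower in user time.
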
codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change

Comparison is base (f280342) 99.57% compared to head (38c6784) 99.57%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #676   +/-   ##
=======================================
  Coverage   99.57%   99.57%
=======================================
  Files          87       87
  Lines        6157     6164    +7
=======================================
+ Hits         6131     6138    +7
  Misses         26       26
```

| [Impacted Files](https://codecov.io/gh/NatLibFi/Annif/pull/676?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | |
|---|---|---|
| [annif/parallel.py](https://codecov.io/gh/NatLibFi/Annif/pull/676?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvcGFyYWxsZWwucHk=) | `100.00% <ø> (ø)` | |
| [annif/backend/nn\_ensemble.py](https://codecov.io/gh/NatLibFi/Annif/pull/676?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvYmFja2VuZC9ubl9lbnNlbWJsZS5weQ==) | `100.00% <100.00%> (ø)` | |
| [tests/test\_backend\_nn\_ensemble.py](https://codecov.io/gh/NatLibFi/Annif/pull/676?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-dGVzdHMvdGVzdF9iYWNrZW5kX25uX2Vuc2VtYmxlLnB5) | `100.00% <100.00%> (ø)` | |


sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% Duplication