NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Better support for suggestion batches in NN ensemble #687

Open osma opened 1 year ago

osma commented 1 year ago

As noted in PR #681 ("Potential future work"), the way the NN ensemble handles batches could be improved:

I'm not quite happy with how the NN ensemble handles suggestion results from other projects, both during training and suggest operations. For example, the training samples are stored in LMDB one document at a time, but now it would be easier to store them as whole batches instead, which could be more efficient. But I decided that this PR is already much too big and it would make sense to try to improve batching in the NN ensemble in a separate follow-up PR. There is already an attempt to do part of this in PR https://github.com/NatLibFi/Annif/pull/676; that could be a possible starting point.
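The following minimal Python sketch contrasts the two storage layouts mentioned above: one key per document versus one key per suggestion batch. It assumes a plain dict standing in for the LMDB database and pickled `(inputs, targets)` pairs standing in for the stored training samples. All names (`store_per_document`, `store_per_batch`, `iter_batched`) are hypothetical and are not Annif's actual API.

```python
import pickle
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical sample: (inputs, targets) as plain float lists, standing in
# for the score vectors that source projects produce for one document.
Sample = Tuple[List[float], List[float]]


def store_per_document(db: Dict[bytes, bytes], batch: Sequence[Sample]) -> int:
    """Write each document's sample under its own key; return the number of writes."""
    for sample in batch:
        key = f"{len(db):08d}".encode()  # zero-padded so keys sort in insertion order
        db[key] = pickle.dumps(sample)
    return len(batch)


def store_per_batch(db: Dict[bytes, bytes], batch: Sequence[Sample]) -> int:
    """Write the whole suggestion batch under a single key; return the number of writes."""
    key = f"{len(db):08d}".encode()
    db[key] = pickle.dumps(list(batch))
    return 1


def iter_batched(db: Dict[bytes, bytes]) -> Iterator[Sample]:
    """Yield individual samples back from a per-batch store, in insertion order."""
    for key in sorted(db):
        yield from pickle.loads(db[key])


if __name__ == "__main__":
    batch = [([0.1, 0.9], [0.0, 1.0]), ([0.7, 0.2], [1.0, 0.0])]
    per_doc: Dict[bytes, bytes] = {}
    per_batch: Dict[bytes, bytes] = {}
    print(store_per_document(per_doc, batch), store_per_batch(per_batch, batch))  # 2 1
    print(list(iter_batched(per_batch)) == batch)  # True
```

One design question this surfaces: when whole batches are stored, their sizes follow the suggestion batch size, so a training loop that wants a different batch size would have to re-chunk the samples it reads back.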

In particular:

Of course the changes need to be properly benchmarked.
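A benchmark could compare the current and batched variants on identical inputs. Here is a minimal timing harness built on the standard-library `timeit` module. The two variants in `make_variants` are hypothetical stand-ins (pickled samples written into a dict, not real LMDB transactions or NN training), so meaningful numbers would require plugging the actual Annif training and suggest code paths into the same harness.

```python
import pickle
import timeit
from typing import Callable, Dict, List, Tuple


def benchmark(variants: Dict[str, Callable[[], None]],
              repeat: int = 5, number: int = 10) -> Dict[str, float]:
    """Return the best observed time per call, in seconds, for each variant."""
    results: Dict[str, float] = {}
    for name, func in variants.items():
        # timeit.repeat accepts a zero-argument callable as the statement.
        timings = timeit.repeat(func, repeat=repeat, number=number)
        # The minimum is the run least affected by other system activity.
        results[name] = min(timings) / number
    return results


def make_variants(n_docs: int = 1000,
                  n_subjects: int = 100) -> Dict[str, Callable[[], None]]:
    """Build hypothetical per-document and per-batch write variants."""
    samples: List[Tuple[List[float], List[float]]] = [
        ([0.0] * n_subjects, [0.0] * n_subjects) for _ in range(n_docs)
    ]

    def per_document() -> None:
        db: Dict[int, bytes] = {}
        for i, sample in enumerate(samples):
            db[i] = pickle.dumps(sample)  # one write per document

    def per_batch() -> None:
        db: Dict[int, bytes] = {}
        db[0] = pickle.dumps(samples)  # one write for the whole batch

    return {"per_document": per_document, "per_batch": per_batch}


if __name__ == "__main__":
    for name, seconds in benchmark(make_variants()).items():
        print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats follows the advice in the `timeit` documentation. A real comparison would also need to measure memory use and end-to-end training time, not just write throughput.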