NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Parallel processing of training docs in MLLM #483

Closed osma closed 2 years ago

osma commented 3 years ago

Training the MLLM backend can be a bit slow. Most of the time is spent generating candidates from the training documents, which could probably be sped up with parallel processing. It's noted as a TODO item in the code: https://github.com/NatLibFi/Annif/blob/master/annif/backend/mllm.py#L23

osma commented 2 years ago

Actually, the TODO item mentioned above is only relevant to the hyperparameter optimization functionality of the MLLM backend.

The training docs are processed one at a time in a loop, and that loop could probably be parallelized.
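
To illustrate the idea, here is a minimal sketch of how a per-document candidate generation loop might be spread over worker processes with `multiprocessing.Pool`. This is not the actual Annif code: `VOCAB`, `generate_candidates` and `candidates_for_corpus` are hypothetical names made up for this example, and the toy token lookup only stands in for the real (much heavier) candidate generation.

```python
from multiprocessing import Pool

# Hypothetical toy vocabulary mapping terms to concept IDs.
# (Assumption for illustration only; the real backend uses a subject vocabulary.)
VOCAB = {"cat": "c1", "dog": "c2", "fish": "c3"}


def generate_candidates(text):
    """Stand-in for per-document candidate generation.

    Returns the sorted concept IDs whose term occurs in the text.
    Each call is independent of the others, which is what makes
    the loop over documents easy to parallelize.
    """
    tokens = text.lower().split()
    return sorted({VOCAB[token] for token in tokens if token in VOCAB})


def candidates_for_corpus(texts, n_jobs=1, chunksize=16):
    """Generate candidates for every document, optionally in parallel.

    With n_jobs == 1 this is the plain sequential loop. Otherwise the
    documents are distributed over n_jobs worker processes. Pool.map
    returns results in the same order as the input documents.
    """
    if n_jobs == 1:
        return [generate_candidates(text) for text in texts]
    with Pool(processes=n_jobs) as pool:
        return pool.map(generate_candidates, texts, chunksize=chunksize)
```

Calling `candidates_for_corpus(docs, n_jobs=4)` from a script should be done under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. Whether this actually speeds things up depends on how expensive each document is compared to the cost of sending the text and the results between processes.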