NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Use multiprocessing for hyperopt operation #688

Open osma opened 1 year ago

osma commented 1 year ago

As noted in the "Potential future work" section of PR #681:

The threading performance of annif hyperopt was already bad, and now it got worse. The solution could be to switch to process-based multiprocessing. This has been difficult to do with Optuna (needs an external relational database), but the Optuna FAQ now states that it could also be done with JournalFileStorage, which sounds more promising.

So we should investigate whether it would be possible to use multiprocessing in hyperopt operations, because the current multithreading approach doesn't actually work very well.

osma commented 1 year ago

Here is a blog post with more details about JournalFileStorage.

Using a journal file could perhaps make it possible to extend hyperparameter optimization runs, as suggested in #633.