NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Incremental / online learning based on user feedback #225

osma closed 5 years ago

osma commented 5 years ago

Right now all models are built once and then remain static. But at least for the tfidf and pav backends, it would be possible to improve the model over time based on user feedback, i.e. documents that were analyzed and whose results a human then reviewed, rejecting some suggested subjects (false positives) and/or adding new ones (false negatives).
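As a minimal sketch of what such feedback contains, the two kinds of corrections can be derived by comparing the suggested subjects with the ones the reviewer accepted. The function name and the subject identifiers below are hypothetical and not part of Annif:

```python
def feedback_diff(suggested, accepted):
    """Compare suggested subjects with the subjects a human reviewer kept.

    Both arguments are iterables of subject identifiers (hypothetical strings).
    Returns (false_positives, false_negatives) as sorted lists.
    """
    suggested, accepted = set(suggested), set(accepted)
    false_positives = sorted(suggested - accepted)  # suggested but rejected
    false_negatives = sorted(accepted - suggested)  # added by the reviewer
    return false_positives, false_negatives


fp, fn = feedback_diff(
    suggested=["cats", "dogs", "pets"],
    accepted=["cats", "pets", "veterinary medicine"],
)
print(fp)  # ['dogs']
print(fn)  # ['veterinary medicine']
```

Either list may be empty; a document whose suggestions were accepted unchanged still carries information, since it confirms the current model.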

This issue covers the infrastructure for such functionality, including the CLI commands and REST API methods. I will open separate issues for adding online learning support for each backend.

The CLI command could be

annif learn <projectid> <corpus>

(I was still considering other command names, such as increment[al] or retrain, but learn is fine; it is used by e.g. Vowpal Wabbit.)

The REST API method could be (for a single command)

POST /projects/<projectid>/learn

with parameters corresponding to the CLI command.
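To make the idea concrete, here is a rough sketch of how a client might build such a request using only the Python standard library. The request body format (a JSON list of documents, each with a text and its correct subjects), the base URL and the project ID are all assumptions made for illustration; the proposal above does not specify them.

```python
import json
import urllib.request


def build_learn_request(base_url, project_id, documents):
    """Build (but do not send) a POST request to the proposed learn method.

    The JSON body format is an assumption, not a specified API.
    """
    url = f"{base_url.rstrip('/')}/projects/{project_id}/learn"
    body = json.dumps(documents).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical feedback: one reviewed document and its accepted subjects.
documents = [{"text": "A short text about cats", "subjects": ["cats", "pets"]}]
request = build_learn_request("http://localhost:5000", "my-project", documents)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to urllib.request.urlopen once a server implementing the method exists.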

Ensemble-style backends (currently ensemble and pav) should first propagate the learn operation to their source projects, then reanalyze the document with the updated sources, and only then update their own model.
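The ordering could look roughly like the following sketch. The classes are toy stand-ins written for this illustration, not Annif's actual backend classes, and the "models" are just lists of what was learned:

```python
class LeafProject:
    """Stand-in for a project with a simple backend (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.learned = []  # (document, subjects) pairs seen through learn()

    def learn(self, document, subjects, log):
        self.learned.append((document, frozenset(subjects)))
        log.append(f"{self.name}: learn")

    def suggest(self, document, log):
        log.append(f"{self.name}: suggest")
        # Toy scoring: a subject scores 1.0 if it was learned for this text.
        scores = {}
        for learned_doc, subjects in self.learned:
            if learned_doc == document:
                for subject in subjects:
                    scores[subject] = 1.0
        return scores


class EnsembleProject:
    """Stand-in for an ensemble-style project built on source projects."""

    def __init__(self, name, sources):
        self.name = name
        self.sources = sources
        self.training_pairs = []  # (source scores, correct subjects)

    def learn(self, document, subjects, log):
        # 1. Propagate the learn operation to every source project.
        for source in self.sources:
            source.learn(document, subjects, log)
        # 2. Reanalyze the document with the freshly updated sources.
        source_scores = [s.suggest(document, log) for s in self.sources]
        # 3. Only now update the ensemble's own model.
        self.training_pairs.append((source_scores, frozenset(subjects)))
        log.append(f"{self.name}: update")


log = []
ensemble = EnsembleProject("ensemble", [LeafProject("a"), LeafProject("b")])
ensemble.learn("a text about cats", {"cats"}, log)
print(log)
# ['a: learn', 'b: learn', 'a: suggest', 'b: suggest', 'ensemble: update']
```

Reanalyzing after the sources have learned means the ensemble trains on the scores the sources would produce from now on, not on scores from their outdated state.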

osma commented 5 years ago

Implemented in #257