Right now all models are built once and then remain static. But at least for the tfidf and pav backends it would be possible to improve the model over time based on user feedback, i.e. documents that were analyzed and then reviewed by a human who rejected some of the suggested subjects (false positives) and/or added new ones (false negatives).
This issue is about providing the infrastructure for such functionality and the CLI commands and REST API methods. I will open separate issues for adding online learning support for each backend.
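To make the feedback described above concrete, here is a minimal sketch of how a reviewed document could be turned into false positives and false negatives. The function name `feedback_diff` and the use of plain strings as subject identifiers are assumptions for illustration only, not part of any existing code.

```python
def feedback_diff(suggested, accepted):
    """Compare suggested subjects with the subjects a reviewer accepted.

    Hypothetical helper: subjects are plain strings here.
    """
    suggested, accepted = set(suggested), set(accepted)
    return {
        # suggested by the model but rejected by the reviewer
        "false_positives": sorted(suggested - accepted),
        # added by the reviewer but missed by the model
        "false_negatives": sorted(accepted - suggested),
    }


result = feedback_diff(["cats", "dogs"], ["dogs", "birds"])
```

A learn operation would only need the document text and the accepted subjects; the difference to the earlier suggestions can always be recomputed as shown.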
The CLI command could be
annif learn <projectid> <corpus>
(Still thinking about what would be the best command name, maybe also increment[al] or retrain.) Update: learn is fine, it is used by e.g. Vowpal Wabbit.
The REST API method could be (for a single command)
POST /projects/<projectid>/learn
with parameters corresponding to the CLI command.
Ensemble-style backends (currently ensemble and pav) should first propagate the learn operation to their source projects, then reanalyze the document, and only then update their own model.
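The ordering for ensemble-style backends could be sketched roughly as follows. All class and method names here (`Project`, `EnsembleProject`, `learn`, `analyze`, `_update_model`) are hypothetical stand-ins chosen for this sketch; the real project and backend classes may look quite different.

```python
class Project:
    """Hypothetical stand-in for a project backed by a single model."""

    def __init__(self, name):
        self.name = name
        self.log = []  # records the calls made, for illustration only

    def learn(self, documents):
        # A real backend would update its model incrementally here.
        self.log.append(("learn", len(documents)))

    def analyze(self, text):
        # A real backend would return subject suggestions with scores.
        self.log.append(("analyze", text))
        return {}


class EnsembleProject(Project):
    """Hypothetical ensemble-style project that combines source projects."""

    def __init__(self, name, sources):
        super().__init__(name)
        self.sources = sources

    def learn(self, documents):
        # 1. Propagate the learn operation to the source projects first.
        for source in self.sources:
            source.learn(documents)
        # 2. Reanalyze each document with the updated source projects.
        rescored = [
            (text, subjects, [source.analyze(text) for source in self.sources])
            for text, subjects in documents
        ]
        # 3. Only then update the ensemble's own model.
        self._update_model(rescored)

    def _update_model(self, rescored):
        self.log.append(("update", len(rescored)))


a, b = Project("a"), Project("b")
ensemble = EnsembleProject("ens", [a, b])
ensemble.learn([("some document text", {"subject-1"})])
```

Reanalyzing after the sources have learned means the ensemble is trained on the scores the updated sources will actually produce, rather than on stale ones.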