NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Data Set Management in Annif #635

Open mo-fu opened 1 year ago

mo-fu commented 1 year ago

When extending Annif with more hyperparameter optimization functionality or with training via the API, it may be useful to have data set management.

Possible functionalities:

osma commented 1 year ago

Can you provide an example of how this could look from the user's perspective? For example, CLI commands or REST API calls?

mo-fu commented 1 year ago

I added some examples of CLI usage.

osma commented 1 year ago

Ah, now I understand what you mean by this, thanks!

How would this be implemented? Where would the managed data sets be stored? Somewhere under the data directory? Would these be copies of the originals or something else?

This would expand the scope of Annif quite a lot. I'm not sure it would be worth the additional complexity. But it's an interesting idea.