Open mo-fu opened 1 year ago
Can you provide an example how this could look from the user perspective? For example CLI commands or REST API calls?
I added some examples on CLI usage.
Ah, now I understand what you mean by this, thanks!
How would this be implemented? Where would the managed data sets be stored? Somewhere under the data
directory? Would these be copies of the originals or something else?
This would expand the scope of Annif quite a lot. I'm not sure it would be worth the additional complexity. But it's an interesting idea.
When extending Annif with more hyperparamter optimization functionality or training via API it may be useful to have data set management.
Possible functionalities:
annif data add $DATA_SET_NAME $PATH
annif data split $DATE_SET_NAME 0.7:train 0.2:test 0.1:validate
, could be adressed usingannif train ${DATA_SET_NAME}#train
annif data remove $DATA_SET_NAME