NatLibFi / Annif-tutorial

Instructions, exercises and example data sets for Annif hands-on tutorial
Creative Commons Attribution 4.0 International

Exercise about sufficient amount of training data (learning curves) #15

Open juhoinkinen opened 2 years ago

juhoinkinen commented 2 years ago

A common question in the tutorial sessions has been "How many documents do I need to train a model?" We could add an optional exercise showing how increasing the --docs-limit value when training a model affects its evaluation results. A simple way to plot the results as a learning curve would also be nice.
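One possible shape for such an exercise, as a rough sketch rather than a finished script: retrain the project with increasing --docs-limit values, evaluate after each run, and plot one metric against the number of training documents. The helper names (`train_cmd`, `eval_cmd`, `read_metric`, `learning_curve`, `plot_curve`) are made up for illustration. The sketch assumes that `annif eval` can write its metrics as JSON via `--metrics-file` and that the chosen metric key (for example `NDCG`) appears in that file; both should be checked against the installed Annif version. Plotting assumes matplotlib is available.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def train_cmd(project_id, train_path, docs_limit):
    """Build an `annif train` command that uses at most `docs_limit` documents."""
    return ["annif", "train", project_id, str(train_path),
            "--docs-limit", str(docs_limit)]


def eval_cmd(project_id, eval_path, metrics_file):
    """Build an `annif eval` command.

    Assumption: --metrics-file makes Annif write the metrics as JSON.
    """
    return ["annif", "eval", project_id, str(eval_path),
            "--metrics-file", str(metrics_file)]


def read_metric(metrics_file, metric_key):
    """Read a single metric value from a JSON metrics file."""
    with open(metrics_file, encoding="utf-8") as f:
        return json.load(f)[metric_key]


def learning_curve(project_id, train_path, eval_path, docs_limits, metric_key):
    """Retrain and evaluate once per limit; return (limit, score) pairs."""
    points = []
    with tempfile.TemporaryDirectory() as tmp:
        for limit in docs_limits:
            metrics_file = Path(tmp) / f"metrics-{limit}.json"
            subprocess.run(train_cmd(project_id, train_path, limit), check=True)
            subprocess.run(eval_cmd(project_id, eval_path, metrics_file), check=True)
            points.append((limit, read_metric(metrics_file, metric_key)))
    return points


def plot_curve(points, metric_key, out_file="learning_curve.png"):
    """Save the learning curve as an image file."""
    # Imported lazily so the helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    limits, scores = zip(*points)
    fig, ax = plt.subplots()
    ax.plot(limits, scores, marker="o")
    ax.set_xlabel("Training documents (--docs-limit)")
    ax.set_ylabel(metric_key)
    fig.savefig(out_file)
```

A participant could then call `learning_curve(...)` with their own project ID, training and evaluation paths, and a list of limits such as 50, 100, 200 and 400 (placeholder values), and pass the result to `plot_curve`. If the curve flattens out, adding more training documents is unlikely to help much for that backend.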

osma commented 2 years ago

As a first step I added an extra section to the MLLM exercise: https://github.com/NatLibFi/Annif-tutorial/blob/master/exercises/05_mllm_project.md#extra-experiment-with-different-amounts-of-training-data