Improve model performance

oliviadargel commented 2 years ago

User story

As an NLP component developer
I need to improve the performance of my NER model
So that I can achieve better results

Acceptance criteria

number of iterations for model training is optimized
more sentences are added to the data set

Definition of done

Code was reviewed by another person
GitHub CI runs successfully
Feature is merged into main branch

oliviadargel commented 2 years ago

more sentences are added to the data set

This will be done through #189

oliviadargel commented 2 years ago

The results are not as expected: we achieve very high F1 scores (and precision) even after only a few iterations, which generally means that either the model overfits or we do not provide enough test data. On the other hand, performance is very similar across the different labels, which simplifies the decision because there is no need to prioritize one label's performance over another's.
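To make the per-label comparison above concrete, here is a minimal sketch of how precision, recall, and F1 could be computed per label from exact-match entity spans. The tuple layout `(sentence_id, start, end, label)`, the function name `per_label_scores`, and the labels `CITY` and `ROAD` are assumptions for illustration; they are not taken from the project code.

```python
def per_label_scores(gold, predicted):
    """Compute exact-match precision, recall and F1 for each label.

    gold / predicted: sets of (sentence_id, start, end, label) tuples
    (hypothetical layout, not the project's actual data format).
    """
    labels = {entity[3] for entity in gold | predicted}
    scores = {}
    for label in sorted(labels):
        gold_l = {e for e in gold if e[3] == label}
        pred_l = {e for e in predicted if e[3] == label}
        true_pos = len(gold_l & pred_l)  # spans that match exactly
        precision = true_pos / len(pred_l) if pred_l else 0.0
        recall = true_pos / len(gold_l) if gold_l else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy data with made-up labels and offsets.
gold = {(0, 0, 6, "CITY"), (0, 10, 14, "CITY"), (1, 0, 4, "ROAD")}
predicted = {(0, 0, 6, "CITY"), (1, 0, 4, "ROAD"), (1, 5, 9, "ROAD")}
scores = per_label_scores(gold, predicted)
```

With a very small test set, a handful of matching spans is enough to push these numbers close to 1.0, which is one reason high scores alone do not rule out overfitting.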

This means that more variance has to be added to the training data set (and to the test data set as well); cross-validation may also be worth considering. Without further major changes to the data sets, a small number of iterations (>10) should be used so that the model overfits as little as possible.
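The cross-validation idea mentioned above could look roughly like the following sketch, which uses only the standard library. The names `k_fold_splits` and `cross_validate` and the `train_and_score` callback are hypothetical. The callback stands in for whatever trains the NER model on one split and returns a single score (for example F1), and it is not part of the project code.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists so that each item is in exactly one test fold."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducible folds
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validate(items, train_and_score, k=5, seed=0):
    """Run the hypothetical train_and_score(train, test) callback on every fold."""
    scores = [train_and_score(train, test)
              for train, test in k_fold_splits(items, k, seed)]
    return sum(scores) / len(scores), scores


# Usage with placeholder data and a dummy scoring function.
sentences = list(range(10))  # stand-in for annotated sentences
mean_score, fold_scores = cross_validate(
    sentences, lambda train, test: len(test) / len(sentences), k=5
)
```

A large spread between the fold scores would suggest that the result depends strongly on which sentences end up in the test set, which is the situation described above.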

The results, along with the code that differed from main during performance testing, can be found in src/nlp/src/visualization.

amosproj / amos2021ws01-geo-data-search