amosproj / amos2021ws01-geo-data-search

Natural language and buzzword search on routing and places
MIT License

Improve model performance #181

Closed oliviadargel closed 2 years ago

oliviadargel commented 2 years ago

User story

  1. As an NLP component developer
  2. I need to improve the performance of my NER model
  3. So that I can achieve better results

Acceptance criteria

Definition of done

oliviadargel commented 2 years ago

More sentences are added to the data set.

This will be done through #189

oliviadargel commented 2 years ago

The results are not as expected: we reach very high F1-scores (and precision) after only a few iterations, which generally means that either the model overfits or the test data set is too small. On the other hand, the different labels perform very similarly, which makes things easier, since there is no need to decide which label's performance should be prioritized over another's.
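For reference, a minimal sketch of how per-label precision, recall and F1 can be computed from exact entity matches. It is not the evaluation code used in this repository. The function name `per_label_scores`, the `(start, end, label)` tuple format and the labels `LOC` and `DIST` are assumptions made for illustration.

```python
from collections import Counter


def per_label_scores(gold, predicted):
    """Compute precision, recall and F1 per entity label.

    gold, predicted: one set per sentence, each containing
    (start, end, label) tuples. This format is an assumption.
    Only exact span-and-label matches count as true positives.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_ents, pred_ents in zip(gold, predicted):
        for ent in pred_ents:
            if ent in gold_ents:
                tp[ent[2]] += 1
            else:
                fp[ent[2]] += 1
        for ent in gold_ents:
            if ent not in pred_ents:
                fn[ent[2]] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical example: one sentence, two gold entities.
gold = [{(0, 5, "LOC"), (10, 15, "DIST")}]
predicted = [{(0, 5, "LOC"), (20, 25, "DIST")}]
scores = per_label_scores(gold, predicted)
```

With a very small test set, a handful of matches like these can push every label close to 1.0, which is one reason the scores above may look better than they really are.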

This means that more variance has to be added to the training data set (and to the test data set as well), and cross-validation should perhaps be considered. Without further major changes to the data sets, a small number of iterations (>10) should be used, so that the model overfits as little as possible.
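A rough sketch of what k-fold cross-validation could look like for the annotated sentences, using only the standard library. The helper `k_fold_splits` and its parameters are hypothetical and not part of the current code base. Training and scoring the NER model on each split is left out.

```python
import random


def k_fold_splits(examples, k=5, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.

    Every example lands in exactly one test fold. The data is
    shuffled once with a fixed seed so the splits are reproducible.
    """
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Striding distributes the examples as evenly as possible.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


# Placeholder data: integers stand in for annotated sentences.
examples = list(range(10))
splits = list(k_fold_splits(examples, k=5))
```

Each `train` list would be used to train a fresh model and each `test` list to score it. Averaging the per-fold F1-scores then gives a less optimistic estimate than a single small test set.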

The results, along with the code that diverged from main during performance testing, can be found in src/nlp/src/visualization.