Closed wblacoe closed 5 years ago
Thanks for reporting this - I was able to reproduce your results. I'll take a closer look at what's happening here and report back!
Hello @wblacoe, the model was indeed not working correctly, but when I trained a new one with the current version I got good results. I think the germeval model was very old and trained many Flair versions ago, so it's possible that changes since then have impacted its accuracy. I've been planning for a while to retrain everything for the new version and will do this at the latest for the next major release.
In the meantime, I've just pushed a PR that updates the germeval model. The new model now reaches an F1 score of 84.85 using the standard learning parameters. I did not do any experimentation or try any of the new embeddings, so better results are likely possible.
If you want to try the model, you can either update your Flair version to the current master, or download the model from here and load it in the SequenceTagger. Let me know if this works!
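Loading either variant can be sketched as follows (assumes Flair is installed; `'de-ner-germeval'` is the published model identifier, and the local-path variant assumes you saved the downloaded file as `germeval-model.pt`):

```python
# Sketch: load the updated germeval model and tag a sentence with Flair.
from flair.data import Sentence
from flair.models import SequenceTagger

# Either load by model name (downloads the current released model) ...
tagger = SequenceTagger.load('de-ner-germeval')
# ... or load a manually downloaded checkpoint from disk:
# tagger = SequenceTagger.load('germeval-model.pt')

sentence = Sentence('George Washington ging nach Washington.')
tagger.predict(sentence)
print(sentence.to_tagged_string())
```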
Thanks a bunch for checking this @alanakbik! Yes, it works for me too now:
MICRO_AVG: acc 0.7368 - f1-score 0.8485
MACRO_AVG: acc 0.4965 - f1-score 0.6249833333333333
LOC tp: 1554 - fp: 192 - fn: 152 - tn: 1554 - precision: 0.8900 - recall: 0.9109 - accuracy: 0.8188 - f1-score: 0.9003
LOCderiv tp: 525 - fp: 65 - fn: 36 - tn: 525 - precision: 0.8898 - recall: 0.9358 - accuracy: 0.8387 - f1-score: 0.9122
LOCpart tp: 60 - fp: 6 - fn: 49 - tn: 60 - precision: 0.9091 - recall: 0.5505 - accuracy: 0.5217 - f1-score: 0.6857
ORG tp: 884 - fp: 196 - fn: 266 - tn: 884 - precision: 0.8185 - recall: 0.7687 - accuracy: 0.6568 - f1-score: 0.7928
ORGderiv tp: 2 - fp: 1 - fn: 6 - tn: 2 - precision: 0.6667 - recall: 0.2500 - accuracy: 0.2222 - f1-score: 0.3636
ORGpart tp: 107 - fp: 44 - fn: 65 - tn: 107 - precision: 0.7086 - recall: 0.6221 - accuracy: 0.4954 - f1-score: 0.6625
OTH tp: 444 - fp: 134 - fn: 253 - tn: 444 - precision: 0.7682 - recall: 0.6370 - accuracy: 0.5343 - f1-score: 0.6965
OTHderiv tp: 23 - fp: 13 - fn: 16 - tn: 23 - precision: 0.6389 - recall: 0.5897 - accuracy: 0.4423 - f1-score: 0.6133
OTHpart tp: 11 - fp: 4 - fn: 31 - tn: 11 - precision: 0.7333 - recall: 0.2619 - accuracy: 0.2391 - f1-score: 0.3860
PER tp: 1528 - fp: 140 - fn: 111 - tn: 1528 - precision: 0.9161 - recall: 0.9323 - accuracy: 0.8589 - f1-score: 0.9241
PERderiv tp: 3 - fp: 4 - fn: 8 - tn: 3 - precision: 0.4286 - recall: 0.2727 - accuracy: 0.2000 - f1-score: 0.3333
PERpart tp: 7 - fp: 10 - fn: 37 - tn: 7 - precision: 0.4118 - recall: 0.1591 - accuracy: 0.1296 - f1-score: 0.2295
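As a sanity check, the `MICRO_AVG` F1 can be recomputed from the per-class tp/fp/fn counts in the output above. Micro-averaging pools the counts over all classes before computing precision and recall, so frequent classes like LOC and PER dominate the score:

```python
# Recompute micro-averaged F1 from the per-class counts printed above.
counts = {
    # class: (tp, fp, fn)
    "LOC":      (1554, 192, 152),
    "LOCderiv": (525,   65,  36),
    "LOCpart":  (60,     6,  49),
    "ORG":      (884,  196, 266),
    "ORGderiv": (2,      1,   6),
    "ORGpart":  (107,   44,  65),
    "OTH":      (444,  134, 253),
    "OTHderiv": (23,    13,  16),
    "OTHpart":  (11,     4,  31),
    "PER":      (1528, 140, 111),
    "PERderiv": (3,      4,   8),
    "PERpart":  (7,     10,  37),
}

tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())

# Micro F1 simplifies to 2*tp / (2*tp + fp + fn).
micro_f1 = 2 * tp / (2 * tp + fp + fn)
print(f"micro F1: {micro_f1:.4f}")  # -> micro F1: 0.8485
```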
Is the metric evaluation script "strict" for the GermEval14 set? For instance, if LOCderiv is predicted as LOC, will this be counted as a (truly) false prediction?
@alanakbik
Yes, it does exact matching, so it takes the strict interpretation and counts this as false.
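The effect of strict matching can be sketched like this (a simplified illustration, not Flair's actual evaluation code): a predicted span counts as correct only if both its boundaries and its label match the gold annotation exactly, so a gold LOCderiv span predicted as LOC is penalized twice, once as a false negative and once as a false positive.

```python
# Sketch of strict span evaluation: a span is a ((start, end), label) pair
# and only exact matches count as true positives.
gold = {((3, 5), "LOCderiv")}   # gold annotation
pred = {((3, 5), "LOC")}        # prediction with the right span, wrong label

tp = len(gold & pred)   # exact (span, label) matches
fp = len(pred - gold)   # predicted but not in gold: the spurious LOC
fn = len(gold - pred)   # in gold but not predicted: the missed LOCderiv

print(tp, fp, fn)  # -> 0 1 1
```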
Hi. I can't reproduce your F1 score of 84.65 for GermEval 2014. I used everything out of the box:
But my results are these:
Did you use an F1 measure other than the micro-average? Or a different set of classes? Thanks