bootphon / wordseg

A Python toolbox for text-based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

Token evaluation: positional or count? #46

Closed by mmmaat 6 years ago

mmmaat commented 6 years ago

The “token” evaluation metric needs clarification. Does it only compare token counts within a sentence, or does it also check each token's position? For example, take the gold segmentation “ice ice cream is icecream” and the system output “ice icecream is ice cream”. If evaluation only compares token counts, the output is 100% correct (ice: 2, cream: 1, icecream: 1, is: 1 in both). If the scorer also checks position (e.g., that the final word is “icecream”), the output is not fully correct. I would expect the “token” metric to compute the latter, but the documentation should state explicitly how type and token scores are computed.
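To make the count-based reading concrete, here is a minimal sketch of a bag-of-words comparison. The function `count_based_precision_recall` is a hypothetical helper written for this illustration and is not part of wordseg; it assumes tokens are separated by single spaces and that neither sentence is empty.

```python
from collections import Counter


def count_based_precision_recall(system, gold):
    """Compare token counts only, ignoring where tokens occur.

    Hypothetical helper for illustration, not part of wordseg.
    """
    sys_counts = Counter(system.split())
    gold_counts = Counter(gold.split())
    # Counter intersection keeps the minimum count of each token.
    overlap = sum((sys_counts & gold_counts).values())
    precision = overlap / sum(sys_counts.values())
    recall = overlap / sum(gold_counts.values())
    return precision, recall


gold = "ice ice cream is icecream"
system = "ice icecream is ice cream"
print(count_based_precision_recall(system, gold))  # (1.0, 1.0)
```

Under this reading the system output scores perfectly, because both sentences contain the same five tokens with the same counts.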

mmmaat commented 6 years ago

Token evaluation is positional:

>>> from wordseg.evaluate import evaluate
>>> gold = ['ice ice cream is icecream']
>>> text = ['ice icecream is ice cream']
>>> evaluate(text, gold)
OrderedDict([('token_precision', 0.4),
             ('token_recall', 0.4),
             ('token_fscore', 0.4),
             ('type_precision', 1.0),
             ('type_recall', 1.0),
             ('type_fscore', 1.0),
             ...
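The 0.4 token scores are consistent with a positional comparison: only two of the five tokens (the first “ice” and “is”) cover the same characters in both segmentations. The sketch below reproduces that arithmetic by comparing character spans. It is an illustration under stated assumptions, not wordseg's actual implementation; `token_spans` and `positional_token_scores` are hypothetical names, and offsets are counted over the text with spaces removed.

```python
def token_spans(sentence):
    """Return the set of (start, end) character spans of the tokens.

    Offsets are counted over the unsegmented text (spaces removed).
    Hypothetical helper for illustration, not part of wordseg.
    """
    spans = set()
    start = 0
    for word in sentence.split():
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def positional_token_scores(system, gold):
    """Token precision, recall and F-score based on matching spans."""
    sys_spans = token_spans(system)
    gold_spans = token_spans(gold)
    # A token is correct only if both of its boundaries match the gold.
    correct = len(sys_spans & gold_spans)
    precision = correct / len(sys_spans)
    recall = correct / len(gold_spans)
    fscore = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, fscore


gold = "ice ice cream is icecream"
system = "ice icecream is ice cream"
scores = positional_token_scores(system, gold)
print([round(s, 3) for s in scores])  # [0.4, 0.4, 0.4]
```

The type scores stay at 1.0 because both segmentations use the same set of word types (ice, cream, icecream, is), regardless of position.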