chakki-works / seqeval

A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
MIT License

How to disable seqeval label formatting for POS-tagging #88

Closed WilliamAboucaya closed 2 years ago

WilliamAboucaya commented 2 years ago

I am trying to evaluate my POS-tagger using seqeval but, since my tags are not made for NER, they are not formatted the way the library expects them. Consequently, when I try to read the results of my classification report, the labels for class-specific results consistently lack the first character (the last if I pass suffix=True).

Is there a way to disable entity recognition in labels or do I have to pass all my labels with a starting space to solve this issue?

How to reproduce the behaviour

SSCCE:

from seqeval.metrics import accuracy_score
from seqeval.metrics import classification_report
from seqeval.metrics import f1_score

y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

print(classification_report(y_true, y_pred))

Output:

              precision    recall  f1-score   support

          DV       1.00      1.00      1.00         2
     ER:pres       1.00      1.00      1.00         1
          NT       1.00      1.00      1.00         1
          RO       1.00      1.00      1.00         1
          RP       1.00      1.00      1.00         1

   micro avg       1.00      1.00      1.00         6
   macro avg       1.00      1.00      1.00         6
weighted avg       1.00      1.00      1.00         6
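
The truncated labels are what you would expect if the first character of each tag (or the last, with suffix=True) is read as an IOB-style chunk prefix and stripped from the reported label. The sketch below only illustrates that reading: split_tag is a hypothetical helper written for this example, not seqeval's own code, and the "-" delimiter is an assumption.

```python
def split_tag(tag, suffix=False, delimiter="-"):
    """Hypothetical helper: split a tag into (prefix, entity type) the way
    an IOB-style parser might. Illustration only, not seqeval's code."""
    if suffix:
        # Prefix is the last character, e.g. "PER-B" -> ("B", "PER")
        prefix, rest = tag[-1], tag[:-1]
        entity_type = rest.rsplit(delimiter, maxsplit=1)[0]
    else:
        # Prefix is the first character, e.g. "B-PER" -> ("B", "PER")
        prefix, rest = tag[0], tag[1:]
        entity_type = rest.split(delimiter, maxsplit=1)[-1]
    return prefix, entity_type


# Plain POS tags lose a character; a prefixed tag keeps its full label.
for tag in ["INT", "VER:pres", "B-INT"]:
    print(tag, "->", split_tag(tag))
```

Under this reading, 'INT' is split into prefix 'I' and type 'NT', which matches the NT row in the report above.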

mattdeeperinsights commented 2 years ago

You can just convert all of your tokens to a begin type:

from seqeval.metrics import classification_report

def convert_to_b(y):
    """Prepend "B-" to every tag."""
    return [
        [f'B-{tag}' for tag in tags]
        for tags in y
    ]

y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

y_true = convert_to_b(y_true)
y_pred = convert_to_b(y_pred)

print(classification_report(y_true, y_pred))
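
If token-level scores are all you need for POS tagging, another option is to skip entity parsing entirely and count matches per tag yourself. The following is a minimal sketch using only the standard library; token_level_report is a hypothetical helper name, and it assumes y_true and y_pred are aligned lists of tag sequences.

```python
from collections import Counter


def token_level_report(y_true, y_pred):
    """Per-tag precision, recall, F1 and support, computed token by token.

    Hypothetical helper for illustration; tags are used verbatim, so no
    prefix is needed.
    """
    tp, pred_count, true_count = Counter(), Counter(), Counter()
    for true_tags, pred_tags in zip(y_true, y_pred):
        for t, p in zip(true_tags, pred_tags):
            true_count[t] += 1
            pred_count[p] += 1
            if t == p:
                tp[t] += 1

    report = {}
    for tag in sorted(set(true_count) | set(pred_count)):
        precision = tp[tag] / pred_count[tag] if pred_count[tag] else 0.0
        recall = tp[tag] / true_count[tag] if true_count[tag] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[tag] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": true_count[tag],
        }
    return report


y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

for tag, scores in token_level_report(y_true, y_pred).items():
    print(tag, scores)
```

Here support counts individual tokens, so repeated tags such as PRO are counted once per occurrence.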

WilliamAboucaya commented 2 years ago

Thanks, I was hoping there was a built-in feature but this solution seems to work!