How to disable seqeval label formatting for POS-tagging

WilliamAboucaya commented 2 years ago

I am trying to evaluate my POS-tagger using seqeval but, since my tags are not made for NER, they are not formatted the way the library expects them. Consequently, when I try to read the results of my classification report, the labels for class-specific results consistently lack the first character (the last if I pass suffix=True).

Is there a way to disable entity recognition in labels or do I have to pass all my labels with a starting space to solve this issue?

How to reproduce the behaviour

SSCCE:

from seqeval.metrics import accuracy_score
from seqeval.metrics import classification_report
from seqeval.metrics import f1_score

y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

print(classification_report(y_true, y_pred))

Output:

              precision    recall  f1-score   support

          DV       1.00      1.00      1.00         2
     ER:pres       1.00      1.00      1.00         1
          NT       1.00      1.00      1.00         1
          RO       1.00      1.00      1.00         1
          RP       1.00      1.00      1.00         1

   micro avg       1.00      1.00      1.00         6
   macro avg       1.00      1.00      1.00         6
weighted avg       1.00      1.00      1.00         6

Your Environment

Operating System: Windows 10
Python Version: 3.9.6
Package Version: 1.2.2

mattdeeperinsights commented 2 years ago

You can just convert all of your tokens to a begin type:

from seqeval.metrics import classification_report

def convert_to_b(y):
    """Prepend "B-" to every tag so each token is treated as its own entity."""
    return [
        [f'B-{tag}' for tag in tags]
        for tags in y
    ]

y_true = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]
y_pred = [['INT', 'PRO', 'PRO', 'VER:pres'], ['ADV', 'PRP', 'PRP', 'ADV']]

y_true = convert_to_b(y_true)
y_pred = convert_to_b(y_pred)

print(classification_report(y_true, y_pred))
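For anyone wondering why the prefix helps, here is a rough pure-Python sketch of the tag splitting that seems to cause the truncation. It is an assumption based on the observed output, not seqeval's actual code: `split_tag` is a hypothetical helper that treats the first character as the chunk marker and whatever follows the optional "-" as the label.

```python
def split_tag(tag):
    """Hypothetical helper: split a tag into (marker, label), IOB-style.

    Assumption: this mirrors the behaviour seen in the report above,
    where the first character of each tag is consumed as the marker.
    """
    marker = tag[0]
    # Drop the marker, then drop everything up to the first "-" if there is one.
    label = tag[1:].split('-', maxsplit=1)[-1] or '_'
    return marker, label


for tag in ['INT', 'PRO', 'VER:pres', 'ADV']:
    # Compare the raw POS tag with the same tag after adding "B-".
    print(tag, '->', split_tag(tag), '| B-' + tag, '->', split_tag('B-' + tag))
```

With the "B-" prefix the marker is always "B" and the full POS tag survives as the label. Each token then counts as its own single-token entity, so the scores should be per-token, which is what you would normally want for POS tagging.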

WilliamAboucaya commented 2 years ago

Thanks, I was hoping there was a built-in feature but this solution seems to work!

chakki-works / seqeval