IBM / zshot

Zero and Few shot named entity & relationships recognition
https://ibm.github.io/zshot
MIT License

Improve evaluation #62

Closed by marmg 1 year ago

marmg commented 1 year ago
| Status | Type | ⚠️ Core Change | Issue |
| --- | --- | --- | --- |
| Ready | Feature | No | |

Summary

span-based: Span-based evaluation (the default) considers the full BIO tag of every token. An entity counts as correct only if every token in its span is recognized with its corresponding BIO tag.

token-based: Token-based evaluation considers only B- tags. Every token is treated as the beginning of its own entity, so performance is measured at the token level.

In the example below, the SMXM linker extracts "York" as LOC but not the full span "New New York". The span-based F1 is therefore 0.0, because the entity span was not fully recognized. The token-based F1, on the other hand, is 0.5: precision is 1.0 and recall is 0.3333 (1 of the 3 entity tokens is extracted correctly).
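Worked out with the standard F1 definition (the harmonic mean of precision P and recall R), the two scores follow directly from the counts above. In span mode there is one gold span and one predicted span, and they do not match. In token mode there are three gold tokens and one predicted token, and that token is correct:

```latex
\begin{aligned}
\text{span-based:}\quad & P = \tfrac{0}{1} = 0,\quad R = \tfrac{0}{1} = 0 \;\Rightarrow\; F_1 = 0 \\
\text{token-based:}\quad & P = \tfrac{1}{1} = 1,\quad R = \tfrac{1}{3},\quad
F_1 = \frac{2PR}{P+R} = \frac{2 \cdot 1 \cdot \tfrac{1}{3}}{1 + \tfrac{1}{3}} = \frac{2/3}{4/3} = 0.5
\end{aligned}
```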

Example

```python
import spacy
from zshot import PipelineConfig
from zshot.linker import LinkerSMXM

from zshot.evaluation.metrics.seqeval.seqeval import Seqeval
from zshot.evaluation.dataset.dataset import create_dataset
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report
from zshot.utils.data_models import Entity

# Entity types the zero-shot linker can choose from
ENTITIES = [
    Entity(name="FAC", description="A facility"),
    Entity(name="LOC", description="A location"),
]

# One sentence with BIO ground truth, one tag per token:
# "New New York" is a single LOC entity spanning three tokens
sentences = ["New New York is beautiful"]
gt = [["B-LOC", "I-LOC", "I-LOC", "O", "O"]]

dataset = create_dataset(gt, sentences, ENTITIES)

# Blank English pipeline with the zshot component using the SMXM linker
nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(),
    entities=ENTITIES,
)
nlp.add_pipe("zshot", config=nlp_config, last=True)

# Span-based evaluation (default): an entity counts only if its whole span is correct
evaluation = evaluate(nlp, dataset, metric=Seqeval())
tables = prettify_evaluate_report(evaluation, show_full_report=False)  # human-readable report
print("F1-Macro span-based:", evaluation['linker']['overall_f1_macro'])

# Token-based evaluation: each token is scored individually
evaluation = evaluate(nlp, dataset, metric=Seqeval(), mode='token')
tables = prettify_evaluate_report(evaluation, show_full_report=False)
print("F1-Macro token-based:", evaluation['linker']['overall_f1_macro'])
```

> Map:   0%|          | 0/1 [00:00<?, ? examples/s]
> F1-Macro span-based: 0.0
> Map:   0%|          | 0/1 [00:00<?, ? examples/s]
> F1-Macro token-based: 0.5
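The two scores above can be reproduced without running SMXM. Below is a minimal, dependency-free sketch that makes two assumptions. First, the linker's output corresponds to the tags O O B-LOC O O. Second, token mode rewrites every I- tag as B-, which is consistent with the 0.5 reported above. The helpers `extract_spans`, `to_token_mode` and `entity_f1` are hypothetical illustrations. They are not zshot or seqeval APIs.

```python
from typing import List, Optional, Set, Tuple

# (label, first_token, last_token); token indices are inclusive
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (simplified reading:
    an I- tag that does not continue an open span of the same label is ignored)."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current span grows by one token
        if label is not None:
            spans.add((label, start, i - 1))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def to_token_mode(tags: List[str]) -> List[str]:
    """Assumed token mode: rewrite every I- tag as B-, so each token is its own entity."""
    return ["B-" + t[2:] if t.startswith("I-") else t for t in tags]


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F1 over BIO spans: a predicted span counts only if label and bounds match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-LOC", "I-LOC", "I-LOC", "O", "O"]  # "New New York is beautiful"
pred = ["O", "O", "B-LOC", "O", "O"]          # assumed: only "York" recognized as LOC

print("span-based F1:", entity_f1(gold, pred))                                 # 0.0
print("token-based F1:", entity_f1(to_token_mode(gold), to_token_mode(pred)))  # 0.5
```

Only LOC occurs in this example, so a per-label (macro) average reduces to this single LOC score. This is why the sketch matches the `overall_f1_macro` values printed above.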