Function for creating a DatasetWithEntities from sentences and ground truth.
Add token-based evaluation
Use span-based evaluation or token-based evaluation
span-based
Span-based evaluation (by default) consider each BIO tag for each token, and it's only correct if all the tokens in the entity span are recognized with their corresponding BIO tag.
token-based
Token-based evaluation consider only the B- tag for each token, and it's measured at token level.
In the next example, the SMXM Linker will extract York as LOC, but not New New York. Thus, the span-based f1 is 0.0 as the span was not fully recognized. On the other hand, the token-based f1 is 0.5, as the precision is 1.0 and the recall is 0.3333 (1 token correctly extracted out of three).
Summary
Added
create_dataset
:Add token-based evaluation
span-based Span-based evaluation (by default) consider each BIO tag for each token, and it's only correct if all the tokens in the entity span are recognized with their corresponding BIO tag.
token-based Token-based evaluation consider only the B- tag for each token, and it's measured at token level.
In the next example, the SMXM Linker will extract
York
asLOC
, but notNew New York
. Thus, the span-based f1 is 0.0 as the span was not fully recognized. On the other hand, the token-based f1 is 0.5, as the precision is 1.0 and the recall is 0.3333 (1 token correctly extracted out of three).Example