In the notebook notebooks/Analyze_Model_Outputs.ipynb (see here), some of the terminology used may be unfamiliar to a newcomer to NLP. In particular, this paragraph could use a gentler introduction to the concepts of named entity recognition and token-level error rate:
IOB2 format is a convenient way to represent a corpus, but it is a less useful representation for analyzing the result quality of named entity recognition models. Most tokens in a typical NER corpus will be tagged O, so any measure of error rate in terms of tokens will under-emphasize the tokens that are part of entities. Token-level error rate also implicitly assigns higher weight to named entity mentions that consist of multiple tokens, further unbalancing error metrics. And most crucially, a naive comparison of IOB tags can mark an incorrect answer as correct. Consider a case where the correct sequence of labels is B, B, I but the model has output B, I, I; here, the last two tokens of model output are both incorrect (the model has assigned them to the same entity as the first token), but a naive token-level comparison will consider the last token to be correct.
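The pitfall in that last example can be demonstrated in a few lines of Python (a minimal sketch; the tag sequences and the `iob_to_spans` helper are invented here for illustration):

```python
# Gold and predicted IOB2 tags for three tokens, e.g. a name sequence
# where the first token is one PER mention and the next two are another.
gold = ["B-PER", "B-PER", "I-PER"]
pred = ["B-PER", "I-PER", "I-PER"]

# Naive token-level comparison: marks the last token as "correct"
# even though the model merged two distinct entity mentions.
token_matches = [g == p for g, p in zip(gold, pred)]
print(token_matches)  # [True, False, True]

def iob_to_spans(tags):
    """Convert IOB2 tags to (start, end, type) entity spans (end exclusive)."""
    spans, start = [], None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes last span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, tags[start].split("-", 1)[1]))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans

# Entity-level comparison: neither predicted span exactly matches a gold
# span, so both gold mentions count as errors.
print(iob_to_spans(gold))  # [(0, 1, 'PER'), (1, 3, 'PER')]
print(iob_to_spans(pred))  # [(0, 3, 'PER')]
print(set(iob_to_spans(gold)) & set(iob_to_spans(pred)))  # set()
```

At the token level the model appears to get 2 of 3 tokens right; at the entity level it got both mentions wrong.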
We should add more Markdown text to this notebook in two places:

1. At the beginning, there should be a more detailed explanation of named entity recognition models, ideally with a visual illustration of NER model outputs (perhaps drawn by some Python code using displaCy).
2. The paragraph above should be expanded with a more detailed explanation of what happens when you use token classification (instead of entity extraction) as the basis for computing model quality.
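For the first addition, displaCy can render NER output without running a trained model by using its manual mode, which takes precomputed character-offset spans. A sketch (the sentence and entity offsets below are invented; the code falls back to a plain dump if spaCy is not installed):

```python
# Hand-made NER output in the dict format displaCy's manual mode expects.
example = {
    "text": "Ada Lovelace worked with Charles Babbage in London.",
    "ents": [
        {"start": 0, "end": 12, "label": "PER"},
        {"start": 25, "end": 40, "label": "PER"},
        {"start": 44, "end": 50, "label": "LOC"},
    ],
    "title": None,
}

try:
    from spacy import displacy
    # jupyter=False returns the markup as a string; inside a notebook cell,
    # displacy.render(example, style="ent", manual=True) displays inline.
    html = displacy.render(example, style="ent", manual=True, jupyter=False)
except ImportError:  # spaCy not installed; print the spans instead
    html = None
    for ent in example["ents"]:
        print(example["text"][ent["start"]:ent["end"]], "->", ent["label"])
```

This keeps the introductory cell self-contained: the illustration does not depend on downloading a spaCy model, only on the `spacy` package itself.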
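For the second addition, the expanded explanation could contrast token-level accuracy with entity-level precision and recall computed over extracted spans. A sketch of that contrast (the tag sequences are invented; `iob_to_spans` stands in for whatever IOB-to-span conversion the notebook uses):

```python
from collections import namedtuple

Span = namedtuple("Span", ["start", "end", "label"])

def iob_to_spans(tags):
    """Convert IOB2 tags to entity spans (token offsets, exclusive end)."""
    spans, start = [], None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes last span
        if start is not None and not tag.startswith("I-"):
            spans.append(Span(start, i, tags[start].split("-", 1)[1]))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans

# Invented gold and predicted tag sequences for a 7-token sentence.
gold = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "B-LOC", "O"]

# Token classification view: only 1 of 7 tokens is wrong.
token_acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Entity extraction view: each mention counts once, and a predicted span
# is correct only if it matches a gold span exactly (boundaries and type).
gold_spans = set(iob_to_spans(gold))
pred_spans = set(iob_to_spans(pred))
correct = gold_spans & pred_spans
precision = len(correct) / len(pred_spans)
recall = len(correct) / len(gold_spans)
print(f"token accuracy {token_acc:.2f}, "
      f"precision {precision:.2f}, recall {recall:.2f}")
# token accuracy 0.86, precision 0.67, recall 0.67
```

One stray I-ORG tag costs a single token at the token level, but at the entity level it destroys the whole ORG mention, which is what a user of the model actually experiences.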