frreiss closed this issue 4 years ago
@frreiss The issue seems to be that some "Missing" corrections put the span in the "corpus_span" column, while others put it in the "correct_span" column. The script currently uses only the "corpus_span" column. Do you want me to use whichever of the two is populated? (This seems hacky, though.)
It seems that "correct_span" is used overwhelmingly. I'll change the script to use that column.
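For reference, the "use whichever one is populated" option could look roughly like the sketch below. This is not the code in download_and_correct_corpus.py; the helper name `pick_span`, the dict-per-row representation, and the example span strings are all hypothetical, and only the two column names come from the discussion above. It prefers "correct_span" and falls back to "corpus_span" when the former is empty.

```python
from typing import Optional


def pick_span(row: dict) -> Optional[str]:
    """Return the span to apply for one correction row (hypothetical helper).

    Prefers "correct_span", which most corrections use, and falls back to
    "corpus_span" when "correct_span" is not filled in.
    """
    for column in ("correct_span", "corpus_span"):
        value = row.get(column)
        # Only non-blank strings count as filled in. This also skips None
        # and the float NaN that a CSV reader may produce for empty cells.
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None


# Made-up rows for illustration; the span notation is an assumption.
rows = [
    {"error_type": "Missing", "corpus_span": "", "correct_span": "[10, 21): 'Example Org'"},
    {"error_type": "Missing", "corpus_span": "[40, 48): 'Acme Ltd'", "correct_span": None},
]
spans = [pick_span(r) for r in rows]
```

With a fallback like this, a "Missing" correction is applied regardless of which of the two columns the annotator filled in, at the cost of hiding the inconsistency in the corrections file.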
In document 42 of the "dev" fold, the lines:
have the following three corrections applied to them:
After these corrections, the lines should be tagged as follows:
Instead, download_and_correct_corpus.py produces this output:
The tokens that should be tagged I-ORG as a result of the two "Missing" type corrections are instead tagged "O".