CODAIT / Identifying-Incorrect-Labels-In-CoNLL-2003

Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.
Apache License 2.0

Span marked incorrect in dev fold document 7 may actually be correct #12

Open frreiss opened 4 years ago

frreiss commented 4 years ago

In document 7 of the dev fold, the corpus contains the entity

[1004, 1022): 'Boxing Association' / ORG

We currently mark this as a "Span" type error, with the corrected span being

[993, 1022): 'Panamanian Boxing Association'

This correction appears to be incorrect, based on looking through Wikipedia. There does not appear to be any organization called the "Panamanian Boxing Association". The organization that the document refers to looks to be the World Boxing Association, which is based in Panama; see https://en.wikipedia.org/wiki/World_Boxing_Association
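
For reference, the two spans can be sanity-checked against each other. This is a minimal sketch, assuming the offsets are half-open `[begin, end)` character ranges as the bracket notation suggests. The helper names `length_matches` and `contains` are hypothetical and not part of this repository's code.

```python
# Spans quoted in this issue, as (begin, end, covered_text) tuples.
# Assumption: offsets are half-open [begin, end) character positions.
corpus_span = (1004, 1022, "Boxing Association")
corrected_span = (993, 1022, "Panamanian Boxing Association")


def length_matches(span):
    """Check that end - begin equals the length of the covered text."""
    begin, end, text = span
    return end - begin == len(text)


def contains(outer, inner):
    """Check that the outer span fully covers the inner span."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# The extra text that the proposed correction adds in front of the corpus span.
added_prefix = corrected_span[2][: corpus_span[0] - corrected_span[0]]

print(length_matches(corpus_span), length_matches(corrected_span))
print(contains(corrected_span, corpus_span))
print(repr(added_prefix))
```

Both spans are internally consistent under that assumption, so the open question is only whether "Panamanian" belongs inside the ORG entity, not whether the offsets are wrong.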

frreiss commented 4 years ago

There is some doubt within the team as to whether we should adjust this entry.

We'll leave this one in place for the first release of the data set at least.