alanakbik closed this 3 years ago
Thanks for catching those problems in the first document of the test fold! I must confess that I went over that document by hand at least twice and managed to miss those additional corrections :-\
The instance of CHINA in the title being tagged LOC was due to there being conflicting manual corrections. The original tag for that token was PER. One file corrected it to LOC, and another corrected it to ORG. I've changed the audited files so that both change that tag to ORG.
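A conflict like this one could in principle be caught automatically by comparing the correction files against each other. The following is only a minimal sketch, not the tooling actually used for this dataset: the function name `find_conflicts`, the dictionary layout, and the token keys are all assumptions made up for illustration.

```python
from collections import defaultdict


def find_conflicts(correction_sets):
    """Return {token_key: set_of_tags} for tokens that were corrected
    to more than one distinct tag across the given correction sets."""
    proposed = defaultdict(set)
    for corrections in correction_sets:
        for key, new_tag in corrections.items():
            proposed[key].add(new_tag)
    # Keep only tokens where the correction sets disagree.
    return {key: tags for key, tags in proposed.items() if len(tags) > 1}


# Hypothetical corrections, keyed by (fold, document index, token index).
file_a = {("test", 0, 2): "LOC", ("test", 0, 7): "ORG"}
file_b = {("test", 0, 2): "ORG"}

conflicts = find_conflicts([file_a, file_b])
print(conflicts)
```

Any token reported here would need a manual decision about which tag is right, as was done for CHINA above.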
The remaining corrections just weren't caught by our ensembles of models, and I guess our eyes skated over them while we were looking at the problems that were flagged by the models. Go figure.
I've fixed all of these in #39.
Here are the corrections for the first document in the test fold BEFORE those additional changes:
And here is the same file AFTER, with new corrections in red and the single modified tag in blue:
Please let us know if you see any additional problems we missed!
Fixed in #39
@frreiss awesome, thanks for correcting this!
Again thanks for sharing this work. The annotations look much improved over the original CoNLL. Especially sports teams that originally were not well labeled are now much better.
However, it seems that some sports teams in the test split are still labeled as LOC, but I think they should be ORG. For example, in the first few sentences we see a couple of instances like this:
In these examples "CHINA", "Uzbekistan" and "Japan" should be ORG (the other teams "JAPAN" and "China" in this example are labeled as ORG).