frreiss closed this issue 3 years ago
Hi @frreiss ,
would it be possible to get some kind of "pre-access" to the corrected error lists and scripts? :thinking:
I would really like to run experiments with Flair on the corrected version :hugs:
Thanks in advance :heart:
Stefan
Thanks for your interest, @stefan-it ! The repository with the list of corrections should go live tomorrow. It will be at https://github.com/CODAIT/Identifying-Incorrect-Labels-In-CoNLL-2003
@stefan-it the repository at https://github.com/CODAIT/Identifying-Incorrect-Labels-In-CoNLL-2003 is now live.
Note that we are working on some additional cleanup and will tag a second release soon, so you may want to wait a day or two.
:+1: thanks for that :hugs:
I currently see some label mismatches:
0
B-LOC
B-MISC
B-ORG
I-LOC
I-LOC.
I-LOCMinn
I-MISC
I-MISC.
I-MISC12
I-MISCBAY
I-MISCCUP
I-MISCdiplomats
I-MISCFOOTBALL-RANDALL
I-MISCleader
I-MISCLouis-based
I-MISCMAKE
I-MISCopen
I-MISCPILOTS
I-MISCquits
I-MISCRETIRES
I-MISCRULES-AFL
I-MISCSEES
I-MISCspokesman
I-MISCSTATE
I-MISCstill
I-MISCTrade
I-MISCWINS
I-ORG
I-ORGAthens
I-ORGFe
I-ORGgiven
I-ORGv
I-PER
I-PER.
I-PP
O
O)
Orebels-Interfax
So I'm really excited for the release :heart:
@stefan-it thanks for finding that regression. We are tracking down the cause.
BTW, the problem is that the output file is missing some newlines. As a short-term workaround, it looks like you should be able to just add a newline after each of the garbled tags. For example, `I-LOCMinn` becomes `I-LOC\nMinn`.
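A rough sketch of how that workaround could be scripted is below. It is not the fix that went into the repository. It assumes space-separated CoNLL-2003 lines with the entity tag in the fourth column and the standard IOB tag set; `VALID_TAGS`, `split_fused_tag`, and `repair_line` are hypothetical names made up for this example.

```python
import re

# Assumption: the standard CoNLL-2003 IOB entity tags.
VALID_TAGS = ("B-LOC", "B-MISC", "B-ORG", "B-PER",
              "I-LOC", "I-MISC", "I-ORG", "I-PER", "O")

# A valid tag followed by at least one extra character (the fused token).
_FUSED = re.compile(
    "(" + "|".join(re.escape(t) for t in VALID_TAGS) + r")(.+)"
)


def split_fused_tag(field):
    """Return (tag, remainder) if `field` is a valid tag fused with the
    start of the next line, otherwise None."""
    if field in VALID_TAGS:
        return None
    match = _FUSED.fullmatch(field)
    return (match.group(1), match.group(2)) if match else None


def repair_line(line, tag_column=3):
    """Split one garbled line into the lines it should have been.

    Only the field at `tag_column` is inspected, so ordinary tokens that
    happen to start with "O" in other columns are left alone.
    """
    fields = line.split(" ")
    if len(fields) <= tag_column:
        return [line]
    fused = split_fused_tag(fields[tag_column])
    if fused is None:
        return [line]
    tag, remainder = fused
    first = " ".join(fields[:tag_column] + [tag])
    rest = " ".join([remainder] + fields[tag_column + 1:])
    # The remainder may itself contain another fused tag.
    return [first] + repair_line(rest, tag_column)


# Made-up input line for illustration, not copied from the corpus files.
print(repair_line("St. NNP I-NP I-LOCMinn NNP I-NP I-LOC"))
```

Entries such as `0` and `I-PP` in the list above do not start with a valid tag, so this sketch leaves them untouched; they would need separate handling.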
I think we've added enough descriptive text to the tutorial to be able to close this issue.
We've checked in the experiment code from our CoNLL-2020 paper under `tutorials/corpus`, as a collection of 4 notebooks. Our intent is to turn these notebooks into a detailed tutorial on analyzing model outputs and corpus labels using Text Extensions for Pandas. To complete the tutorial, we need to add explanatory text to the notebooks by adding Markdown cells in between the current set of code cells.
This issue covers the task of adding this explanatory text. We anticipate that there will be several pull requests associated with this issue, as there is quite a bit of code to document.