Closed xuhdev closed 3 years ago
@xuhdev Is this the cause of https://github.com/CODAIT/text-extensions-for-pandas/issues/148#issuecomment-730224771?
0
B-LOC
B-MISC
B-ORG
I-LOC
I-LOC.
I-LOCMinn
I-MISC
I-MISC.
I-MISC12
I-MISCBAY
I-MISCCUP
I-MISCdiplomats
I-MISCFOOTBALL-RANDALL
...
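For anyone who wants to reproduce the check, here is a minimal sketch that flags tags like the ones listed above. It assumes the corpus uses the standard CoNLL-2003 IOB tag set (`O` plus `B-`/`I-` prefixes on `LOC`, `MISC`, `ORG`, `PER`). The helper name `find_malformed_tags` and the sample list are hypothetical and are not part of the repository's code.

```python
import re

# Assumption: valid tags are "O" or a B-/I- prefix followed by one of the
# four CoNLL-2003 entity types. Anything else is treated as malformed.
VALID_TAG = re.compile(r"O|[BI]-(?:LOC|MISC|ORG|PER)")


def find_malformed_tags(tags):
    """Return the sorted, de-duplicated tags that are not valid IOB tags."""
    return sorted({t for t in tags if VALID_TAG.fullmatch(t) is None})


# Hypothetical sample: a few valid tags mixed with tags that have token
# text fused onto them, as in the listing above.
sample = ["O", "B-LOC", "B-MISC", "B-ORG", "I-LOC", "I-LOC.",
          "I-LOCMinn", "I-MISC", "I-MISC12"]
malformed = find_malformed_tags(sample)
print(malformed)
```

Running a check like this over the tag column of the generated corpus should return an empty list once the token correction code is fixed.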
@BryanCutler Yes
Maybe I'm missing something here. If this change goes through, won't the download_and_correct_corpus.py
script generate a version of the corrected corpus with zero token corrections applied?
@frreiss We had token corrections manually applied, I believe. @kmh4321
Eventually we have to figure out what went wrong in the token corrections code.
@xuhdev if we were to switch to applying token corrections manually, we would need to provide users with detailed instructions on how to apply token corrections manually. I don't think users would appreciate that.
We need to fix the bug in the automated correction code.
Closed in favor of the real fix #33
introduced in e740f09b0549151fce19d13f5f00e6e58d52ccb7
Relevant comment: https://github.com/CODAIT/text-extensions-for-pandas/issues/148#issuecomment-730224771