CODAIT / Identifying-Incorrect-Labels-In-CoNLL-2003

Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.
Apache License 2.0

Update sentence boundaries file #15

Closed by frreiss 3 years ago

frreiss commented 4 years ago

After submitting the camera-ready, we found some minor omissions in the file of sentence boundary corrections that we used for the paper.

Once we've finished cleaning up a version of the corrected data set to go with the paper, we should update the sentence boundary corrections file with some additional corrections.

The notebook that regenerates the sentence corrections is scripts/sentence_correction_preprocessing.ipynb.

frreiss commented 3 years ago

#36 should cover this.