Closed 646e62 closed 1 year ago
The solution we discussed this morning worked, but at the price of removing the word from the sentence entirely. Instead, I implemented a solution that removes only the period at the end of each abbreviated word. It seems to be working okay so far, and should keep working going forward, as none of these abbreviations tends to appear at the end of a sentence.
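For anyone following along, the period-stripping idea can be sketched roughly as below. This is a minimal illustration, not the code actually committed: the function name `strip_abbreviation_periods` and the `ABBREVIATIONS` tuple are hypothetical, and only "para." and "Cst." come from this issue ("paras" is an assumed extra entry).

```python
import re

# Hypothetical abbreviation list. Only "para" and "Cst" are named in the issue;
# "paras" is an assumed addition for illustration.
ABBREVIATIONS = ("para", "paras", "Cst")

# Match a listed abbreviation as a whole word followed by a literal period.
# The match is case-sensitive, so "Para." would need its own entry.
_ABBREV_RE = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")\."
)


def strip_abbreviation_periods(text: str) -> str:
    """Drop the trailing period from each listed abbreviation, keeping the word."""
    return _ABBREV_RE.sub(r"\1", text)


if __name__ == "__main__":
    sample = "See para. 12, where Cst. Smith testified."
    print(strip_abbreviation_periods(sample))
```

Running this as a preprocessing step before sentence segmentation keeps the abbreviated word in the text while removing the period that triggers the false break. The trade-off is the one noted above: if a listed abbreviation ever does end a sentence, its real sentence-final period is removed as well.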
SpaCy treats the periods in some commonly abbreviated terms in written decisions (e.g., "para." for "paragraph", "Cst." for "Constable") as sentence breaks. This both adds time to manual data cleaning and makes loading input more difficult. The data cleaning functions need to be updated to correct these problems so that text files can be automatically and reliably loaded into the program.