Closed: benfoley closed this 4 years ago.
The latest commit #08a6ffb adds a condition so that the punctuation-matching regex is only tried if there is a string of punctuation marks to build the pattern from. It seems that if that string is empty, the pattern is empty too, and so the match fails. We seem to have lost the default set of punctuation to strip; it might have to be declared in the UI part of the engine selector. I will look at that separately. This PR will at least prevent it from breaking.
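A minimal sketch of the guard described above. It assumes the pattern is a regex character class built from the punctuation string. The helper name `strip_punctuation` is hypothetical and is not taken from the Elpis code. The substitution is skipped entirely when there is nothing to strip.

```python
import re


def strip_punctuation(text: str, punctuation: str) -> str:
    """Hypothetical helper: remove every character in `punctuation` from `text`."""
    # Guard: with an empty punctuation string the class would be "[]",
    # which re cannot compile, so skip the substitution and return text unchanged.
    if not punctuation:
        return text
    # Assumption: the pattern is a character class. re.escape protects
    # marks such as "-", "^" and "]" that are special inside a class.
    pattern = "[" + re.escape(punctuation) + "]"
    return re.sub(pattern, "", text)


print(strip_punctuation("hello, world!", ",!"))  # hello world
print(strip_punctuation("a.b", ""))  # a.b
```

Escaping the marks before building the class keeps a user-supplied punctuation list from accidentally forming a range or a negated class.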
This is failing because the punctuation isn't being cleaned. I'm about to submit another PR that deals with that.
Adding extra corpus text files in the dataset stage broke Elpis with the error

sre_constants.error: unterminated character set at position 0

from:

This commit only attempts the re.sub if there is a string to build the match pattern from.
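The error above can be reproduced in isolation. This sketch assumes the pattern is built as "[" + punctuation + "]"; the actual Elpis construction is not shown in this thread, and `pattern_error` is a hypothetical helper. With an empty string the pattern becomes "[]". Python treats a "]" immediately after "[" as a literal character, so the class is never closed. `sre_constants.error` is the same exception class as `re.error`.

```python
import re


def pattern_error(punctuation: str):
    """Hypothetical helper: return the compile error message, or None if the pattern is valid."""
    # Assumed construction of the punctuation character class.
    pattern = "[" + punctuation + "]"
    try:
        re.compile(pattern)
    except re.error as err:
        return str(err)
    return None


print(pattern_error(""))    # unterminated character set at position 0
print(pattern_error(",."))  # None
```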