Fix regex warnings - Githubissues

forslund commented 3 years ago

When testing on a new computer I received a series of deprecation warnings similar to

adapt/tools/text/tokenizer.py:58
  /home/ake/projects/python/adapt/adapt/tools/text/tokenizer.py:58: DeprecationWarning: invalid escape sequence \g
    s = re.sub("(" + regex_separator + ")", " \g<1> ", s)

This PR converts the offending strings (as well as some related strings for consistency) to raw string literals.

forslund commented 3 years ago

Should I go ahead and add some tokenizer tests or is this well covered by the other tests do you think?

clusterfudge commented 3 years ago

That's a good question, I haven't thought about the coverage of the tokenizer much in the last several years; it's a port of a java tokenizer that I've long since lost track of, and I'm not even sure what we should be testing.

As for "is there enough coverage", we don't appear to have any explicit tests for the tokenizer :(

I think this PR is probably fine (fixes linter warnings), and if the existing tests pass we can leave it. We should also open an issue to track documentation of and test coverage for the concept of the tokenizer (and define an interface?), and then tests explicitly for the english version. This has been one of the big asks for localization, as I've said "we need a german tokenizer", but haven't explained how it should work.

clusterfudge commented 3 years ago

Let's also see if we can get a linter hooked up to the build and fail on warning; that should help us keep this cleaner in the short term.

MycroftAI / adapt

Fix regex warnings #123