aichaos / rivescript-python

A RiveScript interpreter for Python. RiveScript is a scripting language for chatterbots.
https://www.rivescript.com
MIT License
157 stars 71 forks source link

Revisit regular expression substitutions for Python 3.6 #29

Open kirsle opened 8 years ago

kirsle commented 8 years ago

Python 3.6 is creating a breaking change to the way re.sub() works, in that any unknown escape sequence in the repl parameter (a backslash followed by any ASCII letter) becomes a runtime exception.

This commit fixes the particular case that was causing the unit tests in Travis CI to fail under Python 3.6.0a2 (nightly build).

In normal (non-UTF8) operation of RiveScript it should be difficult to trigger a similar exception, but with UTF-8 mode enabled and specially crafted RiveScript source files it could become possible to crash the library.

Every instance of re.sub() should be visited to make sure that no escape sequence will ever appear as the repl parameter that will cause a runtime crash of the program. Even if the user disables strict mode.

See also: https://docs.python.org/3.6/library/re.html#re.sub

Changed in version 3.6: Unknown escapes consisting of '\' and an ASCII letter now are errors.

Python 3.6 is set for release on December 12, 2016 and RiveScript should be ready for it before then.

kirsle commented 8 years ago

Possible problematic cases:

In _reply_regexp near line 2299 when handling optionals in triggers:

            regexp = re.sub(r'\s*\[' + re.escape(match) + '\]\s*',
                '(?:' + pipes + r'|(?:\\s|\\b))', regexp)

If the contents of pipes can contain any special characters this could break. Interestingly, the \\b doesn't raise an exception, and according to the new (3.6) docs, \\s should get expanded into a literal space character which probably isn't what we want either (tabs could break it).

Further down near line 2311 when it handles arrays:

regexp = re.sub(r'\@' + re.escape(array) + r'\b', rep, regexp)

To be safe we could replace this with a standard string replacement.

Possible things to test for breakage with optionals in triggers:

All the other instances of re.sub replace a pattern with an empty string or a hard-coded string that contains no escape sequences.