SFUAnime / Ren

As a mascot for the SFU Anime Club, Ren is our in-house Discord bot. Includes first/third party modules that are in use.
GNU General Public License v3.0
19 stars 21 forks source link

[WordFilter] Better validate and handle regex #646

Open Injabie3 opened 1 year ago

Injabie3 commented 1 year ago

In other cogs that hook into WordFilter, we hit a multiple repeat error on the following regex string: bl[o0]*{0,3}w j[o0]*{0,3}bs. Of particular interest is the * operator, which was what it was complaining about, because {0,3} is trying to operate on * which doesn't make sense.

We should find a way to better handle this on insertion and within the _filterWord function.

Logs below:

ren  | Traceback (most recent call last):
ren  |   File "/data/venv/lib/python3.11/site-packages/discord/client.py", line 441, in _run_event
ren  |     await coro(*args, **kwargs)
ren  |   File "/data/cogs/CogManager/cogs/highlight/highlight.py", line 825, in onMessage
ren  |     await self.checkHighlights(msg)
ren  |   File "/data/cogs/CogManager/cogs/highlight/highlight.py", line 665, in checkHighlights
ren  |     elif await self.wordFilter.containsFilterableWords(msg):
ren  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/data/cogs/CogManager/cogs/wordfilter/wordfilter.py", line 452, in containsFilterableWords
ren  |     filteredMsg = _filterWord(filters, filteredMsg)
ren  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/data/cogs/CogManager/cogs/wordfilter/wordfilter.py", line 640, in _filterWord
ren        return re.sub(regex, _censorMatch, string, flags=re.IGNORECASE)
ren  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/__init__.py", line 185, in sub
ren  |     return _compile(pattern, flags).sub(repl, string, count)
ren  |            ^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/__init__.py", line 294, in _compile
ren  |     p = _compiler.compile(pattern, flags)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_compiler.py", line 743, in compile
ren  |     p = _parser.parse(p, flags)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 980, in parse
ren  |     p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 455, in _parse_sub
ren  |     itemsappend(_parse(source, state, verbose, nested + 1,
ren  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 863, in _parse
ren  |     p = _parse_sub(source, state, sub_verbose, nested + 1)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 455, in _parse_sub
ren  |     itemsappend(_parse(source, state, verbose, nested + 1,
ren  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 685, in _parse
ren  |     raise source.error("multiple repeat",
ren  | re.error: multiple repeat at position 2848
quachtridat commented 1 year ago

Regarding the particular RegEx pattern concerned, is it a pattern that we inserted into the cog? If the problem doesn't take place programmatically, then I'd see it as a human error and we need to fix the problematic patterns. Am I misunderstanding the issue post here?

Injabie3 commented 1 year ago

This pattern was restored from a backup, so it was entered in by someone, but we should validate the pattern before adding it in.

Injabie3 commented 1 year ago

Might be better to handle this without raising an exception; could log it in the console instead so as to keep it working

quachtridat commented 1 year ago

This pattern was restored from a backup, so it was entered in by someone, but we should validate the pattern before adding it in.

Would constructing a re.Pattern object work? I'm not sure how we go about pattern validation. If anything, we should only put in good, valid patterns.

Might be better to handle this without raising an exception; could log it in the console instead so as to keep it working

Right. Some error log lines should definitely help.