areebbeigh / profanityfilter

A universal Python library for detecting and filtering profanity
https://pypi.python.org/pypi/profanityfilter
BSD 3-Clause "New" or "Revised" License
73 stars, 25 forks

Regex characters in "bad words" are not escaped #25

Open ThrawnCA opened 10 months ago

ThrawnCA commented 10 months ago

"Bad words" containing characters that are significant in regular expressions are not escaped, so they are not detected correctly. For example, 13i+ch is compiled as a pattern in which + quantifies the preceding i, so it matches strings containing "13ich", "13iich", "13iiiiiiich", etc., but never the literal text "13i+ch".
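
To illustrate the escaping problem, here is a minimal sketch using only the standard `re` module. It does not reproduce the library's internals; the variable names are made up for this example, and `re.IGNORECASE` is just an assumption about how a filter would typically compile its patterns.

```python
import re

bad_word = "13i+ch"

# Unescaped: "+" is a quantifier applied to the preceding "i".
unescaped = re.compile(bad_word, re.IGNORECASE)

# Escaped: re.escape() makes every character match literally.
escaped = re.compile(re.escape(bad_word), re.IGNORECASE)

print(bool(unescaped.search("13iiich")))  # True  (false positive)
print(bool(unescaped.search("13i+ch")))   # False (the real term is missed)
print(bool(escaped.search("13i+ch")))     # True
print(bool(escaped.search("13iiich")))    # False
```

Passing each configured word through `re.escape()` before building the pattern is the usual way to treat user-supplied terms as literal text.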

The logic that adds word boundaries also ignores the possibility of non-"word" characters at the start or end of the string. For example, @$$ begins and ends with non-word characters, so even if it were escaped, \b@\$\$\b requires a word character immediately before the @ and immediately after the final $. It therefore matches only when the term is embedded inside another word (e.g. "a@$$b"), never when it stands alone.
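
One possible way around the boundary problem, shown here only as a sketch and not necessarily what the linked commit does, is to replace `\b` with lookarounds that assert "no word character on this side". The helper name `build_pattern` is hypothetical.

```python
import re

def build_pattern(word):
    """Hypothetical helper: match `word` literally as a standalone token."""
    # (?<!\w) and (?!\w) assert that no word character sits directly
    # before or after the match, whatever characters the term starts
    # or ends with.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)

# Escaped term wrapped in \b, as described above.
naive = re.compile(r"\b" + re.escape("@$$") + r"\b")
lookaround = build_pattern("@$$")

print(bool(naive.search("you @$$ hat")))       # False (standalone term missed)
print(bool(naive.search("a@$$b")))             # True  (only inside a word)
print(bool(lookaround.search("you @$$ hat")))  # True
print(bool(lookaround.search("a@$$b")))        # False
```

For terms made up entirely of word characters the lookarounds behave like `\b`: `build_pattern("ass")` matches in "an ass." but not in "class". Note that this sketch still matches a term that is adjacent to other punctuation (for example "@$$" inside "@@$$"); whether that is desirable is a design choice.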

We have fixed this in https://github.com/qld-gov-au/profanityfilter/pull/1/commits/96324f8821f02487e5859fb39f52505b8c3b8f60