MrDogeBro / content_filter

A basic but robust content filter for python.
MIT License

Indexes Offset By Non-word Characters #14

Open HeyITGuyFixIt opened 2 years ago

HeyITGuyFixIt commented 2 years ago

I noticed that when I check a string and it returns a match, the indexes don't line up with the original string. They look like indexes into a copy of the string with all whitespace (or at least all spaces) removed.

For example, say I have the string "hello world censor hello world" and I am trying to match the word "censor". as_list tells me that the match is at (10, 16). However, if I use those indexes to slice the original string, I get "d cens". So the indexes appear to be offset by 2 in this case, which corresponds to the number of spaces before the word "censor" in the string.
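A minimal reproduction of the arithmetic above in plain Python, without calling content_filter itself. It assumes, as the report suggests, that the filter searches a copy of the input with the spaces removed and reports indexes into that copy.

```python
original = "hello world censor hello world"

# Assumption: the filter matches against a copy with the spaces stripped out.
stripped = original.replace(" ", "")   # "helloworldcensorhelloworld"

start = stripped.find("censor")
end = start + len("censor")

print((start, end))              # (10, 16), the span the report describes
print(original[start:end])       # d cens, the wrong slice of the original
print(original.find("censor"))   # 12, the real start; 2 spaces precede it
```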

HeyITGuyFixIt commented 2 years ago

Did some more testing, and it seems the indexes are offset by any non-word character, not only spaces.
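A quick way to check this observation, again in plain Python rather than through the library: assuming the filter drops every non-word character (regex `\W`) before matching, the reported start should fall short of the true start by exactly the number of non-word characters that precede the match.

```python
import re

text = "hello, world! censor... hello world"

# Assumption: the filter matches against a copy with every non-word
# character (\W) removed.
stripped = re.sub(r"\W", "", text)
reported = stripped.find("censor")   # index the filter would report
actual = text.find("censor")         # index in the original string

# Count the non-word characters that precede the real match.
skipped = len(re.findall(r"\W", text[:actual]))

print(reported, actual, skipped)     # 10 14 4
```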

MrDogeBro commented 2 years ago

That could very well be true, as the module does strip out things like spaces and repeated letters as part of its filtering. I can't test this right now, but I will try to shortly and get back to you. Thanks.
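One possible way to report spans in the original string is to record, during normalization, where each kept character came from. The sketch below is hypothetical: `normalize_with_spans` and `find_in_original` are illustrative names, not part of content_filter, and it assumes the normalization is "drop non-word characters, then collapse runs of the same character", which is only an approximation of what the maintainer describes.

```python
import re
from typing import List, Optional, Tuple


def normalize_with_spans(text: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Hypothetical normalizer: drops non-word characters and collapses runs
    of the same character, keeping the original (start, end) span of each
    kept character so matches can be mapped back."""
    chars: List[str] = []
    spans: List[Tuple[int, int]] = []
    for i, ch in enumerate(text):
        if not re.match(r"\w", ch):
            continue  # spaces and punctuation are skipped entirely
        if chars and chars[-1] == ch:
            # Repeat of the previous kept character: widen its span instead.
            spans[-1] = (spans[-1][0], i + 1)
            continue
        chars.append(ch)
        spans.append((i, i + 1))
    return "".join(chars), spans


def find_in_original(text: str, word: str) -> Optional[Tuple[int, int]]:
    """Find `word` in the normalized text and return its span in `text`."""
    normalized, spans = normalize_with_spans(text)
    target, _ = normalize_with_spans(word)  # normalize the word the same way
    start = normalized.find(target)
    if start == -1:
        return None
    end = start + len(target)
    # The end is exclusive, so take the end of the last matched char's span.
    return spans[start][0], spans[end - 1][1]


text = "hello world ceensssor hello world"
span = find_in_original(text, "censor")
print(span, text[span[0]:span[1]])  # (12, 21) ceensssor
```

Because runs are collapsed after stripping, a match whose first letter also ends the preceding word (for example "abc censor") gets its span widened to include that earlier letter; a real fix would need to decide how to handle that case.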